BUG: metadata extraction for superdataset reports results for subdataset #123
Comments
Ok it looks like the issue results from this functionality in the
So according to the code it is possible for the root BIDS directory to be further down in the tree of the specified dataset for which extraction is to be done. This causes the problems observed above. I'm not sure if we should actually support this use case implicitly, since it causes these problems. IMO the ideal way of running an extractor is on the specified dataset, and if it doesn't contain the required files/metadata, then the extractor returns something like an impossible result and the result handling continues on with whatever is next in line. We might want to support the use case explicitly e.g. with an extraction argument (like Alternatively, there could be a check whether the relative root directory (if found) is actually a subdataset, in which case extraction should not continue. (TODO: if a version of the code remains, the file that is searched for should be changed to Interested in what others would regard as sensible behaviour here.
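A minimal sketch of the subdataset check suggested above, not the extractor's actual code. It assumes the list of subdataset paths is already known to the caller, and it assumes the BIDS root is identified by `dataset_description.json` (required by the BIDS specification; the file the extractor really searches for is not shown in this thread). All function names are hypothetical.

```python
from pathlib import Path


def find_marker_dir(dataset_root, marker="dataset_description.json"):
    """Return the shallowest directory under dataset_root that contains marker, or None."""
    root = Path(dataset_root).resolve()
    # Sort by depth first so the dataset's own root wins over nested matches.
    candidates = sorted(root.rglob(marker), key=lambda p: (len(p.parts), str(p)))
    return candidates[0].parent if candidates else None


def inside_subdataset(path, subdataset_paths):
    """True if path is a subdataset root or lies below one."""
    path = Path(path).resolve()
    for sub in subdataset_paths:
        sub = Path(sub).resolve()
        if path == sub or sub in path.parents:
            return True
    return False


def bids_root_for_extraction(dataset_root, subdataset_paths,
                             marker="dataset_description.json"):
    """Return the BIDS root to extract from, or None if extraction should not continue.

    None is returned when no marker file is found, or when the only match
    belongs to a subdataset rather than to the dataset being extracted.
    """
    found = find_marker_dir(dataset_root, marker)
    if found is None or inside_subdataset(found, subdataset_paths):
        return None
    return found
```

With the layout from this issue (a superdataset whose only BIDS content sits in the subdataset `data/ds001499`), `bids_root_for_extraction` returns `None` for the superdataset, so the extractor could report that extraction is not possible instead of reporting the subdataset's metadata.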
FTR, the BIDS specification does support BIDS-compliant directories that are further down in the tree of the BIDS dataset root directory: https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#source-vs-raw-vs-derived-data. So this might be the case for some BIDS datasets in the wild.
Thank you @jsheunis for digging! Unrelated to the issue: I would also like to check how far we can get with https://github.com/ANCPLabOldenburg/ancp-bids, which should be more lightweight, represent/use the current BIDS schema, and be co-installable on modern systems (unlike pybids, which has an upper bound on sqlalchemy ATM).
As for nested BIDS datasets -- yeah, we need to find the best way to decouple the notion of a BIDS dataset from a DataLad dataset, since
Thanks for the input
Exactly. Ideally an extractor would be able to figure out what and where to extract automatically, but with the combination of (1) DataLad dataset nesting and (2) BIDS allowing flexibility in where the dataset directory is located, I think we cannot leave it up to the extractor to decide. This would need some extra user input via extraction parameters.
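One way this explicit user input could look, as a rough sketch only. The parameter name `bids_root` and the function `resolve_bids_root` are hypothetical and are not taken from metalad or datalad-neuroimaging. The idea is that the extractor defaults to the dataset root and only looks elsewhere when the user says so.

```python
from pathlib import Path


def resolve_bids_root(dataset_root, bids_root=None):
    """Resolve the directory to treat as the BIDS root.

    dataset_root: path of the dataset the extractor was asked to run on.
    bids_root:    optional, user-supplied path relative to dataset_root
                  (hypothetical extraction parameter).
    """
    root = Path(dataset_root).resolve()
    # Without explicit input, never search below the dataset root.
    target = root if bids_root is None else (root / bids_root).resolve()
    # Reject paths that escape the dataset, e.g. "../elsewhere".
    if target != root and root not in target.parents:
        raise ValueError(f"bids_root {bids_root!r} is outside {root}")
    return target
```

Because nothing is searched implicitly, running on the superdataset without the parameter would only ever look at the superdataset's own root directory.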
Good point, I didn't consider this before. What could perhaps be useful here is to look at the updated
Closed by #124
The context
I'm running into a weird issue. I have a superdataset (https://github.com/jsheunis/datalad-catalog-demo-super) which has several subdatasets, including the one at `data/ds001499`, which is a BIDS dataset. I am running metadata extraction on the superdataset using multiple extractors.
The problem
When I run the `bids_dataset` extractor (from `datalad-neuroimaging`), `meta-extract` goes into the subdataset and extracts BIDS metadata, and then reports that for the superdataset.
Here you can see the call and the full debug output:
Command output with level set to debug:
More info:
The superdataset ID and VERSION are shown in the json output:
The datalad ID of the subdataset:
Relevant comments
Comment 1
This same problem occurs when I run `meta-conduct` on the superdataset with `traverser.traverse_sub_datasets=True` (I actually came across the issue the first time when using `meta-conduct`):
Note in the output that there are two extraction results containing BIDS metadata, one for the superdataset and one for the subdataset. Note also that these objects differ in their content, specifically that the superdataset object has field `description` equal to `null`, and the subdataset object has field `description` equal to a json string:
Comment 2
The problem seems to only occur for `bids_dataset` and not other extractors. I created an analogous test with `metalad_studyminimeta` (i.e. superdataset with no metadata, subdataset with a `.studyminimeta.yaml` file). And this only reported that extraction was not possible since there is no required metadata file:
Meta-extract output:
Comment 3
The above comment suggests the problem lies in the extractor code. But something that confuses me from the initial `meta-extract` debug logs is when the process dives into the subdatasets:
I'm not sure why/how this happens.