read_raw_bids() cannot infer recording kind, filename extension, leading to awkward workarounds #408

hoechenberger · 2020-05-13T11:36:22Z

Describe the problem

`make_bids_folders()` accepts a `kind` kwarg:

Lines 200 to 201 in 1ce1717

    
           def make_bids_folders(subject, session=None, kind=None, bids_root=None, 
        
                                 make_dir=True, overwrite=False, verbose=False):

`write_raw_bids()`infers the `kind` from the `Raw` object:

mne-bids/mne_bids/write.py

Line 1152 in 1ce1717

kind = _handle_kind(raw)

`write_raw_bids()` infers the filename extension from the `Raw` object:

mne-bids/mne_bids/write.py

Lines 1132 to 1138 in 1ce1717

    
           raw_fname = raw.filenames[0] 
        
           if '.ds' in op.dirname(raw.filenames[0]): 
        
               raw_fname = op.dirname(raw.filenames[0]) 
        
           # point to file containing header info for multifile systems 
        
           raw_fname = raw_fname.replace('.eeg', '.vhdr') 
        
           raw_fname = raw_fname.replace('.fdt', '.set') 
        
           _, ext = _parse_ext(raw_fname, verbose=verbose)

`read_raw_bids()`, in contrast, requires the "full name" of the input file

mne-bids/mne_bids/read.py

Lines 331 to 341 in 1ce1717

    
           def read_raw_bids(bids_fname, bids_root, extra_params=None, 
        
                             verbose=True): 
        
               """Read BIDS compatible data. 
        
               Will attempt to read associated events.tsv and channels.tsv files to 
        
               populate the returned raw object with raw.annotations and raw.info['bads']. 
        
               Parameters 
        
               ---------- 
        
               bids_fname : str 
        
                   Full name of the data file

This makes using `read_raw_bids()` awkward

This is demonstrated in the example gallery:

mne-bids/examples/convert_mne_sample.py

Lines 78 to 79 in 1ce1717

    
           bids_fname = bids_basename + '_meg.fif' 
        
           raw = read_raw_bids(bids_fname, output_path)

Since read_raw_bids() doesn't know about the kind nor extension of a file, both have to be manually appended to the bids_basename. This leads to code that isn't data-format agnostic.

Describe your solution

We could add a kind kwarg to read_raw_bids() to allow specification of the requested recoding modality.
However, this would still leave the problem that we wouldn't know about the correct filename extension to use. Currently our way to automatically find the data file without prior knowledge of the extension in the mne-study-template is to concat bids_basename + '_' + kind, and then glob the (hopefully correctly assembled!) directory, excluding the .json sidecar files. This doesn't seem like a good solution.
The approach taken in the example gallery quoted above -- appending a constant string to the filename -- is not portable.If I were to convert data from FIFF to BranVision, I would expect all my mne-bids-based pipelines to continue working without any modifications.

Additional context

I think this touches the discussion we've had in #407 (comment); namely, that it would not only be nice, but it seems increasingly necessary to have a central registry of all the files in a dataset. In the concrete example, I should be able to tell read_raw_bids(): Please load raw data from: sub-01, ses-test, meg and shouldn't be bothered with the filename extension or folder hierarchy.

The text was updated successfully, but these errors were encountered:

agramfort · 2020-05-13T11:41:24Z

we should have a video session to discuss this but my first reaction is that read_raw_bids should just work with basename and use glob internally to find a matching extension for raw data.

…

hoechenberger · 2020-05-13T11:44:23Z

we should have a video session to discuss this but my first reaction is that

My schedule is flexible. Let me know when you would be available – who else would like to chime in? @jasmainak @sappelhoff (just randomly pinging people here :D)

I can also set up a Doodle to find a date that would suit us all.

Thanks!

sappelhoff · 2020-05-13T12:58:26Z

I don't have a particular opinion about this but I am happy to listen and chime in when I think I have something important to say. My schedule is also flexible (except mondays)

jasmainak · 2020-05-13T20:32:02Z

I would suggest going a little back and forth on github first so we can think about the problem first. I like the idea of globbing.

But one problem with globbing is that you might have two files which match the basename. For example, if someone placed a derivative file in the same folder that has the same basename, then how do you choose the right file? It would be nice if there are guarantees in the validator from this happening. Could you check?

agramfort · 2020-05-13T20:36:08Z

But one problem with globbing is that you might have two files which match the basename. For example, if someone placed a derivative file in the same folder that has the same basename, then how do you choose the right file? It would be nice if there are guarantees in the validator from this happening. Could you check?

would it valid to do this? if not I would say we detect this and raise an exception

jasmainak · 2020-05-13T20:42:20Z

I think nothing stops you from adding non-compliant extra files in the BIDS folders. But I seem to recall that the validator throws warnings/errors if you do that. Maybe @sappelhoff will know. I'm fine with detecting this case and raising an exception until there is a compelling reason to do otherwise.

sappelhoff · 2020-05-14T07:18:48Z

I think nothing stops you from adding non-compliant extra files in the BIDS folders

The validator should (and to my knowledge DOES) stop you if you do that. However:

we know that the validator is not as perfect as we'd like
people can add a .bidsignore to cheat the validator and still include all files they want to

That said, the validator does not go into the /derivatives folder of a BIDS dataset, because we do not yet impose any structure on derivatives. So whatever happens in /derivatives is out of our control. -> globbing seems to be not very robust to me, but would probably work for most cases.

Perhaps we can use a check whether globbing is ambiguous (several matches returned) ... and if it is, we warn and demand extra params ... or a check by the user of their BIDS data?

if it's not ambiguous (just a single match), we just proceed?

hoechenberger · 2020-05-14T09:40:11Z

@agramfort and I had a brief phone call yesterday to discuss the issue. Here's an idea we came up with:

Scenario

Assume a dataset with simultaneous MEG and EEG recordings. MEG data is in FIFF format, and EEG data is stored as BrainVision.

Current solution

bids_root = '/bids'
bids_basename = make_bids_basename(subject='foo', session='bar', task='baz')

# Load MEG data.
kind = 'meg'
fname_ext = 'fif'
bids_fname = f'{bids_basename}_{kind}.{fname_ext}'
raw_meg = read_raw_bids(bids_fname=bids_fname, bids_root=bids_root)

# Load EEG data.
kind = 'eeg'
fname_ext = 'vhdr'
bids_fname = f'{bids_basename}_{kind}.{fname_ext}'
raw_eeg = read_raw_bids(bids_fname=bids_fname, bids_root=bids_root)

How it could be better

bids_root = '/bids'
bids_basename = make_bids_basename(subject='foo', session='bar', task='baz')

# Load MEG data.
kind = 'meg'
raw_meg = read_raw_bids(bids_basename=bids_basename, bids_root=bids_root, kind=kind)

# Load EEG data.
kind = 'eeg'
raw_eeg = read_raw_bids(bids_basename=bids_basename, bids_root=bids_root, kind=kind)

Notice how

we can simply pass bids_basename and kind directly to read_raw_bids
the filename extension will be automatically inferred (based on kind and / or globbing)

➔ We don't require users to create a bids_fname string manually anymore, leading to less user code that's also more portable (filename-extension agnostic.)

Issues with this approach

It's obviously not backward-compatible, at least once we remove the bids_fname parameter from read_raw_bids().
Since the filename extension isn't specified by the user, there can be ambiguities in edge cases.

@sappelhoff

Perhaps we can use a check whether globbing is ambiguous (several matches returned) ... and if it is, we warn and demand extra params ... or a check by the user of their BIDS data?

if it's not ambiguous (just a single match), we just proceed?

I'd be in favor of raising an exception in an ambiguous situation :)

jasmainak · 2020-05-14T15:38:44Z

people can add a .bidsignore to cheat the validator and still include all files they want to

I think you raise an interesting point though. mne-bids should also be bidsignore-aware when reading the file with globbing.

Assume a dataset with simultaneous MEG and EEG recordings.

This is by the way an unlikely scenario in the sense that there are long discussions going on around this. So it's not fully fleshed out in BIDS yet. There might be other mechanisms to disambiguate in this scenario. This is just to say that the kind argument may not be necessary.

agramfort · 2020-05-14T19:41:36Z

An alternative is to do: bids_basename = make_bids_basename(subject=subject, kind='meg') raw = read_raw_bids(bids_basename=bids_basename, bids_root=bids_root) and then to glob internally in read_raw_bids

…

hoechenberger · 2020-05-15T05:59:45Z

@agramfort

bids_basename = make_bids_basename(subject=subject, kind='meg')
raw = read_raw_bids(bids_basename=bids_basename, bids_root=bids_root)

Yes so the difference between this proposal and the one I had made is whether the kind shall be considered a part of the basename or not.

Briefly skimming over the discussion that @jasmainak linked to, it seems possible that at one point in the future we'll have a piece of metadata in our BIDS datasets that tells us whether multiple modalities were recorded simultaneously, e.g. MEG & EEG: something of the kind "SimultaneousRecording": "EEG, MEG". So considering that, I would say the bids_basename should stay free of the kind, and one should only specify the kind when actually intending to read a certain piece of data. Because then the basename could be easily reused for reading multiple kinds. So pretty much like I proposed above:

bids_basename = make_bids_basename(subject='foo')
raw_meg = read_raw_bids(bids_basename=bids_basename, bids_root=bids_root, kind='meg')
raw_eeg = read_raw_bids(bids_basename=bids_basename, bids_root=bids_root, kind='eeg')

Thoughts?

hoechenberger · 2020-05-15T07:56:00Z

I'm working on this now and will propose a PR shortly.

hoechenberger added enhancement API bids labels May 13, 2020

hoechenberger mentioned this issue May 15, 2020

[MRG] RF: Accept bids_basename in read_raw_bids(), get_matched_empty_room(), get_head_mri_trans(), and drop their bids_fname param #410

Merged

7 tasks

jasmainak closed this as completed in #410 May 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

read_raw_bids() cannot infer recording kind, filename extension, leading to awkward workarounds #408

read_raw_bids() cannot infer recording kind, filename extension, leading to awkward workarounds #408

hoechenberger commented May 13, 2020 •

edited

Loading

agramfort commented May 13, 2020 via email

hoechenberger commented May 13, 2020

sappelhoff commented May 13, 2020

jasmainak commented May 13, 2020

agramfort commented May 13, 2020 via email

jasmainak commented May 13, 2020

sappelhoff commented May 14, 2020

hoechenberger commented May 14, 2020 •

edited

Loading

jasmainak commented May 14, 2020

agramfort commented May 14, 2020 via email

hoechenberger commented May 15, 2020 •

edited

Loading

hoechenberger commented May 15, 2020

read_raw_bids() cannot infer recording kind, filename extension, leading to awkward workarounds #408

read_raw_bids() cannot infer recording kind, filename extension, leading to awkward workarounds #408

Comments

hoechenberger commented May 13, 2020 • edited Loading

Describe the problem

make_bids_folders() accepts a kind kwarg:

write_raw_bids()infers the kind from the Raw object:

write_raw_bids() infers the filename extension from the Raw object:

read_raw_bids(), in contrast, requires the "full name" of the input file

This makes using read_raw_bids() awkward

Describe your solution

Additional context

agramfort commented May 13, 2020 via email

hoechenberger commented May 13, 2020

sappelhoff commented May 13, 2020

jasmainak commented May 13, 2020

agramfort commented May 13, 2020 via email

jasmainak commented May 13, 2020

sappelhoff commented May 14, 2020

hoechenberger commented May 14, 2020 • edited Loading

Scenario

Current solution

How it could be better

Issues with this approach

jasmainak commented May 14, 2020

agramfort commented May 14, 2020 via email

hoechenberger commented May 15, 2020 • edited Loading

hoechenberger commented May 15, 2020

hoechenberger commented May 13, 2020 •

edited

Loading

`make_bids_folders()` accepts a `kind` kwarg:

`write_raw_bids()`infers the `kind` from the `Raw` object:

`write_raw_bids()` infers the filename extension from the `Raw` object:

`read_raw_bids()`, in contrast, requires the "full name" of the input file

This makes using `read_raw_bids()` awkward

hoechenberger commented May 14, 2020 •

edited

Loading

hoechenberger commented May 15, 2020 •

edited

Loading