Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update MEG regexps and testsuite to comply with father level specification #789

Merged
merged 7 commits into from
Jun 25, 2019

Conversation

sappelhoff
Copy link
Member

@sappelhoff sappelhoff commented Jun 21, 2019

EDIT --> I now take over all MEG data formats, not just KIT


In this PR I add to the MEG regexp so that _markers.<sqd, mrk>) can be accepted, as well as general MEG data files in the KIT format ending either with .sqd or .con. see the specification

This is BIDS since these PRs:

cc @monkeyman192

I furthermore update the yarn.lock file to include the hed validator, which has recently been added (but no to the lock file yet)

This is based on a use case, where the validator throws an error for a correct KIT BIDS dataset, see: mne-tools/mne-bids#209 (comment)

Interesting observations (i.e., I don't know what's happening and why)

--> when I added my addition: "\\.(?:sqd|con)" to the END of the line ... it did not work (i.e., did not resolve the use case error) ... however, adding it to the MIDDLE, or the START of the line did work - without breaking any tests.

  • works: ["\\.(?:sqd|con)", "\\.fif", "\\.ds"]
  • works: ["\\.fif", "\\.(?:sqd|con)", "\\.ds"]
  • does NOT work: ["\\.fif", "\\.ds", "\\.(?:sqd|con)"]

Can somebody shed some light on this?

@codecov
Copy link

codecov bot commented Jun 21, 2019

Codecov Report

Merging #789 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #789   +/-   ##
=======================================
  Coverage   89.02%   89.02%           
=======================================
  Files          79       79           
  Lines        2296     2296           
  Branches      422      422           
=======================================
  Hits         2044     2044           
  Misses        216      216           
  Partials       36       36

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e3c6e46...561fb2c. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 21, 2019

Codecov Report

Merging #789 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #789   +/-   ##
=======================================
  Coverage   89.02%   89.02%           
=======================================
  Files          79       79           
  Lines        2296     2296           
  Branches      422      422           
=======================================
  Hits         2044     2044           
  Misses        216      216           
  Partials       36       36

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e3c6e46...8cd2462. Read the comment docs.

@jasmainak
Copy link
Collaborator

Yet, I cannot find any commit leading up to 1.2.4 that should have changed the way the validator looks at MEG files ...

you can try "git bisect" on a local version of the validator to figure out what leads to the problem

@jasmainak
Copy link
Collaborator

Can somebody shed some light on this?

I usually use https://regex101.com/ for these kinds of things ... :-)

Copy link
Collaborator

@jasmainak jasmainak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my usual comment. Can you add a test? ;-)

@sappelhoff
Copy link
Member Author

I usually use https://regex101.com/ for these kinds of things ... :-)

me too ... but for that specific case, it's a JS array with several strings that just get inserted into the regexp --> so the order of items in this array should not make a difference

@sappelhoff
Copy link
Member Author

you can try "git bisect" on a local version of the validator to figure out what leads to the problem

git bisect seems super cool! ... I couldn't use it properly though, for the following puzzling reason:

  1. Using the LOCAL validator version ... the use case of test_kit ALWAYS fails (regardless of version)
    • note, I was using git checkout 1.2.3 or git checkout 1.2.4
  2. However, using the GLOBAL (npm i -g bids-validator@1.2.4) validator version, test_kit fails only for version 1.2.4, but not when doing npm i -g bids-validator@1.2.4 right before ...

@jasmainak
Copy link
Collaborator

sounds like some packaging issue, but I'm not a javascript expert ...

'/sub-01/ses-001/meg/sub-01_ses-001_markers.sqd', // KIT with removed father level directory
'/sub-01/ses-001/meg/sub-01_ses-001_markers.mrk', // KIT with removed father level directory
'/sub-01/ses-001/meg/sub-01_ses-001_meg.sqd', // KIT with removed father level directory
'/sub-01/ses-001/meg/sub-01_ses-001_meg.con', // KIT with removed father level directory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an example to badFilenames? E.g., something with a father level directory should fail

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmh currently the validator provides a wildcard for everything that is within a father level directory, see: https://github.com/bids-standard/bids-validator/blob/e3c6e4635f077d4b01e7c13baee4a0cf44199c69/bids-validator/bids_validator/rules/file_level_rules.json#L201

the final part is the relevant one: ?(_meg(@@@_meg_type_@@@\\/.*|\\/.*)|

  • it will try to match either an meg_type
    • these are defined below: "@@@_meg_type_@@@": ["\\.fif", "\\.ds"],
  • or it will match a father level directory with arbitrary contents: |\\/.*

for a test that you suggest I would have to touch this logic ... which could have unforeseen consequences.

Two options:

  1. make an issue and let some regex/JS expert deal with this
  2. I try to do it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say if the tests pass, you should be fairly confident of your change. Any unforeseen consequences can be dealt with in the future by adding more tests (and thus increasing your confidence even more in subsequent changes!)

What file formats still allow father level directories? It's only CTF with the *.ds format, right? There are a bunch of tests which need to be updated to reflect this, no? I'll comment on the relevant part.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or it will match a father level directory with arbitrary contents:

I seem to remember there was a reason for allowing arbitrary contents. See the discussion in my old PR here: https://github.com/bids-standard/bids-validator/pull/499

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEG file formats with father level according to the latest spec:

  • CTF
  • BTi/4D (why does this one have two names, BTW?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial company in San Diego that commercially developed MEG was called "BTi", which is probably an abbreviation . Later they were renamed into "4-D Neuroimaging", which is usually abbreviated (by me) as "4D".

@@ -176,6 +176,10 @@ describe('utils.type.file.isMEG', function() {
'/sub-01/ses-001/meg/sub-01_ses-001_task-rest_run-01_meg/sub-01_ses-001_task-rest_run-01_meg.raw.mhd',
'/sub-01/ses-001/meg/sub-01_ses-001_task-rest_run-01_meg/xyz', // for e.g., BTi files
'/sub-01/ses-001/meg/sub-01_ses-001_task-rest_run-01_meg/sub-01_ses-001_markers.sqd',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still a good file name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that

/sub-01/ses-001/meg/sub-01_ses-001_task-rest_run-01_meg/abc.raw
/sub-01/ses-001/meg/sub-01_ses-001_task-rest_run-01_meg/abc.raw.mhd
/sub-01/ses-001/meg/sub-01_ses-001_task-rest_run-02_meg/def.sqd
/sub-01/ses-001/meg/sub-01_ses-001_task-rest_run-03_meg/ghi

are all good names for directory+file. The reason for the father directory sub-01_ses-001_task-rest_run-01_meg is to not require changes to the actual file names. So the files underneath can be called whatever, as long as those are known MEG files and can be processed in the software of the original MEG system manufacturer (i.e. BTi, KIT, ITAB).

It would be good to determine the allowed file names and/or extensions for the different systems. I suspect that https://github.com/fieldtrip/fieldtrip/blob/master/fileio/ft_filetype.m has quite a few of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @robertoostenveld :

So the files underneath can be called whatever, as long as those are known MEG files and can be processed in the software of the original MEG system manufacturer [emphasis added by @sappelhoff]

--> I think what @jasmainak wants is that we add tests for exactly that: I.e., do NOT accept whatever file underneath ... but instead only accept those files underneath from which we know that they are correctly there. E.g., CTF and BTi/4D (these are the only MEG data formats that have a father format, see my comment)

I think I generally agree with what @jasmainak says, but I could also imagine this in a separate PR, because it goes beyond the scope of this PR, which was the following:

  • Allow that KIT data is specified without father level, because it does not need one anymore (according to latest spec)
  • Add tests that pass with KIT data specified without father level

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed that we should keep the PR focussed on one aspect, which is KIT.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to do it in another PR that's fine but my comment above was for the KIT .sqd file, not the others :) Let's raise an issue so we don't forget.

@sappelhoff sappelhoff changed the title Update MEG regexp Update MEG regexp to allow KIT data format without father-level Jun 24, 2019
@sappelhoff sappelhoff changed the title Update MEG regexp to allow KIT data format without father-level Update MEG regexps and testsuite to comply with father level specification Jun 24, 2019
@sappelhoff
Copy link
Member Author

@robertoostenveld @jasmainak

I ended up revamping the test suite for all MEG data. Please check if you agree with this.

Father levels are only present for

  • BTI
  • CTF

The other 4 file formats do NOT have a father level anymore:

  • FIF
  • KRISS
  • KIT
  • ITAB

@jasmainak
Copy link
Collaborator

I did not check if the file formats with or without father level are correct or not. But the tests look good to me

'/sub-01/ses-001/meg/sub-01_ses-001_task-rest_run-01_channels.tsv',
// Father directory files are fine for some file formats:
// Father dir: CTF data with a .ds ... the contents within .ds are not checked
'/sub-01/ses-001/meg/sub-01_ses-001_task-rest_run-01_meg.ds/catch-alp-good-f.meg4',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the contents within the CTF .ds must be checked, otherwise it is not a valid dataset and standard software (e..g that of CTF itself) will not be able to read it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but this is not something I break in this PR ... it's never been there :-)

Also, I have opened an issue on this several months ago: https://github.com/bids-standard/bids-validator/issues/692

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

@sappelhoff
Copy link
Member Author

merging after approval by Mainak and Robert. Thanks for the reviews!

@sappelhoff sappelhoff merged commit 731b8ae into bids-standard:master Jun 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants