Mutual exclusion and conditional relationships in the schema #620

Closed
tsalo opened this issue Sep 21, 2020 · 4 comments
Labels
schema Issues related to the YAML schema representation of the specification. Patch version release.

Comments

Member

tsalo commented Sep 21, 2020

There are a number of cases of mutually exclusive items or conditional relationships between items in the specification, and we'll need to support them in the schema.

One prime example is timing metadata for fMRI. In that situation, there are five metadata fields (RepetitionTime, SliceTiming, AcquisitionDuration, DelayTime, and VolumeTiming) with specific relationships. There are five supported patterns for these five fields.

Another, less idiosyncratic case is how entities work for the dual-purpose suffixes (physio, stim, beh, and events). In all cases, there are two options for entities: (1) place the files under beh, in which case there is a set of applicable entities, or (2) place the files under the associated imaging data type, in which case they must "match" the entities of the associated imaging file, plus a reduced set of applicable entities.

To handle the latter case, I was thinking that the schema could include a match "entity" (basically a special stand-in for a set of entities that may vary based on the associated imaging data). We would need to reserve this term so that the main specification never defines a real entity with the same name, even though it would be used almost exclusively internally.

To handle mutual exclusion, I think we could start using nested lists or something, so if we originally had:

- suffixes:
    - stim
    - physio
  extensions:
    - .tsv.gz
    - .json
  entities:
    sub: required
    ses: optional
    task: required
    acq: optional
    run: optional
    recording: optional

we could instead have:

- suffixes:
    - stim
    - physio
  extensions:
    - .tsv.gz
    - .json
  entities:
    - matches: required
      recording: optional
    - sub: required
      ses: optional
      task: required
      acq: optional
      run: optional
      recording: optional

In that case there would be two options for entities for stim/physio data, represented as a list of dictionaries.

An alternative that might be more flexible would be to use keywords like OR, AND, and NOT as keys to dictionaries, like so:

  entities:
    - OR:
        - matches: required
        - sub: required
          ses: optional
          task: required
          acq: optional
          run: optional
    - recording: optional

When applied to the timing information for fMRI, this option would look like:

  metadata:
    - OR:
        - AND:
            RepetitionTime: required
            NOT:
                - AcquisitionDuration
                - VolumeTiming
        - AND:
            SliceTiming: required
            VolumeTiming: required
            NOT:
                - RepetitionTime
                - DelayTime
        - AND:
            AcquisitionDuration: required
            VolumeTiming: required
            NOT:
                - RepetitionTime
                - DelayTime
        - AND:
            RepetitionTime: required
            SliceTiming: required
            NOT:
                - AcquisitionDuration
                - VolumeTiming
        - AND:
            RepetitionTime: required
            DelayTime: required
            NOT:
                - AcquisitionDuration
                - VolumeTiming
    - EchoTime: required
    ...
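
For illustration, a keyword-based structure like this could be evaluated recursively. The sketch below is an assumption about how such a checker might look, not an existing implementation: the `check` function is hypothetical, a bare list is treated as an implicit AND, and only two of the five timing patterns are reproduced to keep it short.

```python
# Hypothetical evaluator for the OR / AND / NOT keywords sketched above.
# Only two of the five timing patterns are reproduced here.
TIMING_RULES = [
    {"OR": [
        {"AND": {"RepetitionTime": "required",
                 "NOT": ["AcquisitionDuration", "VolumeTiming"]}},
        {"AND": {"SliceTiming": "required",
                 "VolumeTiming": "required",
                 "NOT": ["RepetitionTime", "DelayTime"]}},
    ]},
    {"EchoTime": "required"},
]


def check(rule, fields):
    """Return True if the set of metadata `fields` satisfies `rule`."""
    if isinstance(rule, list):  # a bare list is an implicit AND
        return all(check(r, fields) for r in rule)
    for key, value in rule.items():
        if key == "OR":
            ok = any(check(r, fields) for r in value)
        elif key == "AND":
            ok = check(value, fields)
        elif key == "NOT":
            ok = not any(name in fields for name in value)
        else:  # plain "Field: level" entry
            ok = value != "required" or key in fields
        if not ok:
            return False
    return True
```

Under these assumptions, `check(TIMING_RULES, {"RepetitionTime", "EchoTime"})` passes, and adding `VolumeTiming` to that set makes it fail, because no branch of the OR accepts that combination.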

tsalo added the schema label on Sep 23, 2020
Member Author

tsalo commented Sep 29, 2020

We also need to be able to say when the presence of one file requires another file (e.g., func/[x]_events.tsv requires func/[x]_[cbv|bold].nii[.gz]).
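
As a rough sketch of what such a dependency check could look like, the snippet below flags events files that have no bold or cbv image with the same filename prefix. The function name and the exact-prefix matching are assumptions for illustration only; a real rule would also have to account for the inheritance principle.

```python
import re

# Hypothetical rule: an events file needs a bold or cbv image
# that shares its filename prefix.
EVENTS = re.compile(r"^(?P<prefix>.+)_events\.tsv$")
IMAGE_SUFFIXES = ("bold", "cbv")
IMAGE_EXTENSIONS = (".nii", ".nii.gz")


def missing_companions(paths):
    """Return the events files that have no matching bold/cbv image."""
    present = set(paths)
    orphans = []
    for path in sorted(present):
        m = EVENTS.match(path)
        if m is None:
            continue  # not an events file
        candidates = {
            f"{m['prefix']}_{suffix}{ext}"
            for suffix in IMAGE_SUFFIXES
            for ext in IMAGE_EXTENSIONS
        }
        if not candidates & present:
            orphans.append(path)
    return orphans
```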

Member Author

tsalo commented Oct 2, 2020

To consolidate the mechanisms I think we need to represent:

  1. Existence of A requires the existence of B (conditional requirement).
    • For example, if a bottom-level JSON file exists, the corresponding content (e.g., imaging) file must exist as well.
  2. Existence of A prevents the existence of B (mutual exclusion).
    • For example, if RepetitionTime is provided, then AcquisitionDuration cannot be provided as well.
  3. Rules that define a "match" for associated file types (e.g., events, stim, physio).
  4. Preferably, some formalization of the inheritance principle.
  5. Ability to inherit items (e.g., metadata terms and their requirement levels) across context/layout files.
    • For example, we want to set the massive number of common optional/recommended metadata fields for MRI data in one file, and then inherit that list in other files.
  6. Ability to "unset" or override inherited items.
    • For example, PhaseEncodingDirection is RECOMMENDED for most MRI files, but is REQUIRED for fmap files with the _epi suffix.
  7. Ability to override attributes across classes.
    • For example, BOLD data are arbitrarily scaled by default. However, if the part entity is provided, with a value of phase, then the data may have either arbitrary units or radians. So when we check the Units field of a _bold file, we need to also check against the rules of its entities.
  8. The concept of deprecation.
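
Items 5 and 6 could be sketched as a simple merge of requirement levels, where a context-specific file lists only what differs from a shared base. This is a hypothetical illustration: the `resolve` function, the use of `None` to unset a field, and the `ExampleField` placeholder are all assumptions, not existing schema features.

```python
# Hypothetical shared requirement levels for MRI metadata.
# "ExampleField" is a placeholder, not a real metadata term.
MRI_COMMON = {
    "PhaseEncodingDirection": "recommended",
    "ExampleField": "optional",
}

# A context-specific file lists only what differs from the base.
FMAP_EPI_OVERRIDES = {"PhaseEncodingDirection": "required"}


def resolve(base, overrides):
    """Merge inherited levels with overrides; None unsets a field."""
    merged = dict(base)  # copy so the shared base is never mutated
    for field, level in overrides.items():
        if level is None:
            merged.pop(field, None)
        else:
            merged[field] = level
    return merged
```

Here `resolve(MRI_COMMON, FMAP_EPI_OVERRIDES)` raises PhaseEncodingDirection to required for the _epi case while leaving the shared base unchanged.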

Also, here's one that I originally had, but I've lost track of what it means or why it's necessary:

  1. The rules applied to A also apply to B, but both can exist.
    • This is only relevant for the way we have things structured now (i.e., composite files with equivalent rules summarized together). Technically, we could avoid this rule with massive duplication, which I think we want to avoid.
    • For example, if T1w exists, T2w can exist too, or vice versa, but the rules that apply to T1w also apply to T2w.

Member Author

tsalo commented Oct 6, 2020

Based on this SO post, I think using agreed-upon keywords is the way to go. Thanks to @satra for sharing the post.

Member Author

tsalo commented Aug 15, 2022

I believe that our recent work on the schema rules sufficiently covers this.
