Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SCHEMA] Add metadata term files #762

Merged
merged 92 commits into from
Apr 13, 2021

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented Mar 17, 2021

References #699.

This is a draft that might be useful for determining how to structure metadata YAML files for the schema.

This approach does not incorporate any interactions between metadata terms, or requirement levels, which are both dependent on the suffix or data type to which the metadata term is being applied. This fits well with #603.

Changes proposed:

  • Take the metadata JSON files from the bids-validator and convert them to YAML format.
  • Add "description" and "key_name" fields to the metadata YAML files.
  • Add functions to render the metadata information in tables within the schema. These functions require the specification text to include the requirement levels.
  • Create files for a number of metadata terms that aren't present in the validator, but are present in specification tables.

TODO:

  • Different units dependent on different data types?
    • Yes
  • Include "unit" for string/integer data types?
    • No
  • Limits, like minimum and maximum values
    • Taken from validator
  • Should we retain references to other info (e.g., lists of valid coordinate spaces for iEEG, EEG, and MEG)?
    • Currently, I have the references still in the files, with the referenced info in "private" yaml files (i.e., files with leading underscore).
  • Links that have to be broken up aren't looking great.
    • This is because the flattening step adds a space between separate lines.
  • Would be great to replace relative link paths with absolute ones.
  • Type determination for objects.
    • additionalProperties defines type.
    • properties should create additional rows. How should individual sub-object-level requirement levels be annotated then?? Take Genetics.Dataset, Genetics.Database, and Genetics.Descriptors as an example.
      • SOLUTION: Instead of trying to expand nested dictionaries into new rows, I've replaced the internal definitions with $refs and added new files for the properties that can be referenced on their own. So now Genetics' properties fields is a dict with three $refs and there are separate files for Genetics.Dataset.yaml etc.
  • Make type determination function recursive to deal with nested objects
    • See SoftwareFilters as an example.
  • Document new structure once [MISC] move schema documentation into the schema directory #740 is merged.
  • Add "relative" and "absolute"(?) path formats for strings?
    • I decided to go with dataset_relative and participant_relative for now.
  • Add "unit" format for strings?
    • With PET now merged in, we have a bunch of new metadata fields that just define units of other fields instead of committing to a standard like we have for most other fields with units. Maybe we could define a "unit" format that will let the validator check for valid units?
  • Identify and incorporate section-specific definition modifications.

@tsalo tsalo mentioned this pull request Mar 17, 2021
@tsalo tsalo added the schema Issues related to the YAML schema representation of the specification. Patch version release. label Mar 17, 2021
@effigies
Copy link
Collaborator

Is there an advantage to a single file per term? Grouping terms feels more natural to me.

@tsalo
Copy link
Member Author

tsalo commented Mar 17, 2021

I don't want to group terms into multiple files, since relationships between terms (including their categories) probably vary across the specification, along with any logic that needs to be applied. I figured one file per term would be easier to search and edit than one file for all terms.

@tsalo
Copy link
Member Author

tsalo commented Mar 17, 2021

Based on the current code, the following macro call in the specification produces the table at the bottom. Some are missing descriptions right now and others are just missing completely, but it's still pretty cool!

EDIT: Check out https://bids-specification--762.org.readthedocs.build/en/762/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#scanner-hardware

{{ MACROS___make_metadata_table(
   {
      "Manufacturer": "RECOMMENDED",
      "ManufacturersModelName": "RECOMMENDED",
      "DeviceSerialNumber": "RECOMMENDED",
      "StationName": "RECOMMENDED",
      "SoftwareVersions": "RECOMMENDED",
      "HardcopyDeviceSoftwareVersion": "DEPRECATED",
      "MagneticFieldStrength": "RECOMMENDED, but REQUIRED for Arterial Spin Labeling",
      "ReceiveCoilName": "RECOMMENDED",
      "ReceiveCoilActiveElements": "RECOMMENDED",
      "GradientSetType": "RECOMMENDED",
      "MRTransmitCoilSequence": "RECOMMENDED",
      "MatrixCoilMode": "RECOMMENDED",
      "CoilCombinationMethod": "RECOMMENDED",
   }
) }}

image

Comment on lines +13 to +29
{{ MACROS___make_metadata_table(
{
"Manufacturer": "RECOMMENDED",
"ManufacturersModelName": "RECOMMENDED",
"DeviceSerialNumber": "RECOMMENDED",
"StationName": "RECOMMENDED",
"SoftwareVersions": "RECOMMENDED",
"HardcopyDeviceSoftwareVersion": "DEPRECATED",
"MagneticFieldStrength": "RECOMMENDED, but REQUIRED for Arterial Spin Labeling",
"ReceiveCoilName": "RECOMMENDED",
"ReceiveCoilActiveElements": "RECOMMENDED",
"GradientSetType": "RECOMMENDED",
"MRTransmitCoilSequence": "RECOMMENDED",
"MatrixCoilMode": "RECOMMENDED",
"CoilCombinationMethod": "RECOMMENDED",
}
) }}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we add conditional logic to the schema, this would be how we define requirement levels in the specification.

type: array
items:
type: string
minLength: 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just assume that strings have a minimum length of 1? If the field is provided with a string, then I think it should have at least one character to be valid, from a BIDS standpoint. Missing data should be provided with "n/a", so there's not really a case for "" appearing in jsons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed! And then users can still abuse it by providing " ", but it might be sufficient to make them think twice :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same can apply to minItems when the type is array, so I'll remove all cases of minLength: 1 and minItems: 1 from the YAML files.

DOIs SHOULD be expressed as a valid
[URI](/02-common-principles.html#uniform-resource-indicator);
bare DOIs such as `10.0.2.3/dfjj.10` are
[DEPRECATED](/02-common-principles.html#definitions).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started replacing internal relative links (e.g., ../02-common-principles.html#definitions) with absolute versions (/02-common-principles.html#definitions). How does everyone feel about this change? I think it would be good to apply throughout the whole specification.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you doing that? I thought it's an advantage to point to the relative files and have Mkdocs generate the suitable links for us? Pointing to html directly feels error prone but I can't put my finger on it right now 🤔

Copy link
Member Author

@tsalo tsalo Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason is that relative links depend on where the metadata field is rendered, so what might work for one file might not work for another. It should also make it easier to resolve links in external applications (e.g., NIDM-Terms).

EDIT: To clarify- I'm not proposing actual links. Just "absolute" relative to the specification root, if that makes sense. So /02-common-principles.html instead of https://bids-specification.readthedocs.io/en/stable/02-common-principles.html.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see 🤔 I couldn't find info on whether that's good or bad: https://www.mkdocs.org/user-guide/writing-your-docs/#internal-links

it's not really mentioned in the docs over there. So maybe we should carefully try it, and if it doesn't break anything it's fine.

Comment on lines +410 to +414
"IntendedFor": (
"OPTIONAL",
"This identifies the MRI or CT scan associated with the electrodes, "
"landmarks, and fiducials.",
)
Copy link
Member Author

@tsalo tsalo Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does everyone think of this approach for extending descriptions for modality- or table-specific purposes? The first entry in the tuple is just the requirement level, like normal, and the second entry is whatever needs to be added to the description to make it work for that modality/table.

This requires that there be a canonical "base" definition of each metadata field, but I think that it's a reasonable requirement.

@tsalo tsalo changed the base branch from master to metadata-schema April 13, 2021 19:24
@tsalo tsalo merged commit acc3b9a into bids-standard:metadata-schema Apr 13, 2021
@tsalo tsalo mentioned this pull request Apr 13, 2021
27 tasks
@tsalo tsalo deleted the metadata-schema branch April 13, 2021 19:38
tsalo added a commit that referenced this pull request Jul 13, 2021
* [SCHEMA] Add metadata term files (#762)

* Draft a handful of metadata term files.

* Add example with specific possible values.

* Match the validator schemas better.

* Fix formatting.

* Fix formatting again!

* Draft semi-functional rendering functions.

* Use unit abbreviations.

* Add more fields.

* Get macro working.

* Add terms from first table.

* Add AnatomicalLandmarkCoordinateSystem.

* fMRI task information table.

* More terms.

* More terms.

* More terms.

* EchoTime and FlipAngle

* Add tables.

* More tables.

* More terms.

* More terms.

* More terms.

* Fix spacing.

* Clean things up.

* More terms.

* More terms!

* More terms.

* Some iEEG terms.

* More iEEG terms.

* Add ASL labeling terms.

* Next batch.

* More terms.

* More terms.

* More terms.

* Fix mistakes.

* Last terms.

* Reference yamls.

* Fix mistakes.

* Change format of coordinate system files.

* Use degree in associated files.

* Some of the missing terms.

* A few more terms.

* Fix typos.

* More terms.

* More terms.

* More terms.

* More terms.

* More terms.

* More terms.

* Last terms.

* Fix link.

* Fix internal links.

* Fix links for real.

* Derivative terms.

* Fix up code link.

* Use backslashes for continued strings.

* Replace $ref with file contents.

Also support plural datatype strings and all manner of newlines in descriptions.

* Fix genetics.

* Describe the structure of metadata YAML files.

* Make metadatatype function recursive.

* Improve search function.

* Start adding PET fields.

* Add some fields.

* More terms.

* More terms.

* More terms.

* Fix mistakes.

* More terms.

* Replace InstitutionDepartmentName with existing InstitutionalDepartmentName.

* More terms.

* More terms.

* More terms.

* More terms.

* More terms.

* More terms.

* More terms.

* Last terms.

* Add unit format for strings.

Unused for now but could be useful later.

* Add dataset_relative and participant_relative string formats.

* Update READMEs.

* Fix formats in README.

* Support table-specific metadata description extensions.

* Employ description extensions with IntendedFor.

* Remove explicit defaults from YAML files.

* Replace Minimum with minimum.

* Replace inclusiveMaximum with maximum.

* Replace implicit links with explicit ones.

* Rename key_name to name.

* Rename "Unit" to "Units"

* Improve make_metadata_table docstring.

* Start addressing inconsistencies between rendered and hardcoded tables.

* Fix typos in PET metadata

From #786.

* Add metadata fields from qMRI appendix.

* Fix.

* Address duplicate datatypes.

Should address the "string or string or string or string" issue.

* Wrap example strings in code.

* Use enum for n/a instead of pattern.

It's easier to identify as a special case.

* Replace "string" with "n/a" when appropriate.

* Address some inconsistencies.

* Take a crack as SpatialReference.

The type is complicated, but I _think_ I've got it figured out.

* Apply suggestions from code review

Thanks @effigies!

Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu>

* Update tools/schemacode/schema.py

* search_structure --> dereference_yaml

* Use faster loading approach.

* Fix deprecation link.

* Apply suggestions from code review

Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu>

* Update 01-magnetic-resonance-imaging-data.md

* Add B0FieldIdentifier and B0FieldSource.

* Revert type changes and add TODOs to check them.

* Update tools/schemacode/schema.py

* Replace remaining relative links.

* Apply suggestions from code review

Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu>

* Add new coordinate systems from #775.

* Grab hack from #781 (which wasn't merged).

* Create HED.yaml

* boldify table headers

* add device info metadata

* fix table fences

* fix cell padding

* fix cell padding

this is getting old VERY quickly

* Fix up DICOM tags in metadata.

* Leverage "name" field for section-specific metadata definitions.

* composite instances --> measurements

Changes from #813.

* Fix name of HED field.

* Fix string formatting in coordinate system fields.

* Move "preferably same as" to section-specific text.

For #774 (comment)

* Standardize DICOM Tag format.

Still need to move the references out of the generic definitions.

* Move mentions of DICOM Tags out of definitions.

Only for fields that *also* appear in modalities that *don't* use DICOM.

* Apply suggestions from code review

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

* Generalize SoftwareFilters example.

* Distinguish AnatomicalLandmarkCoordinates definitions.

* Rename fmapEchoTime to match new format.

* Apply suggestions from code review

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

* Apply suggestions from code review

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

* Address review.

* Fix example manufacturer names.

* TEMPORARY: fix osipi URL (revert when osipi.org is back)

* Fix regex for identifying macros.

* Partially address review.

* Do not assume minItems is 1 for array terms.

* composite instances --> measurements (again)

* Update src/schema/metadata/MagneticFieldStrength.yaml

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

* Update description to PharmaceuticalDoseTime

#774 (comment)

* Update 09-positron-emission-tomography.md

* fix table pipe alignment

* add pharmaceuticaldosetime fix to schema

includes a bugfix to convert "should" (not casing, this was not intended
as a SHOULD) to a MUST.

* fix str examples

* Apply suggestions from code review

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

* Update src/schema/metadata/RepetitionTimeExcitation.yaml

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

* Add DoseCalibrationFactor.

* Update ScanDate definition and deprecate it.

* Remove hardcoded tables.

* Remove unused links.

* Update tools/schemacode/schema.py

Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu>

Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu>
Co-authored-by: Remi Gau <remi_gau@hotmail.com>
Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
Co-authored-by: mnoergaard <martin.noergaard@nru.dk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
schema Issues related to the YAML schema representation of the specification. Patch version release.
Projects
Development

Successfully merging this pull request may close these issues.

5 participants