Commit

Merge pull request #1050 from dandi/bids_update
Updating BIDS validator and schema to contemporary upstream equivalent
yarikoptic committed Jul 28, 2022
2 parents 1c94736 + 468e39c commit 3655aa1
Showing 70 changed files with 5,318 additions and 601 deletions.
2 changes: 1 addition & 1 deletion dandi/bids_utils.py
@@ -11,7 +11,7 @@ def is_valid(
     Parameters
     ----------
     validation_result: dict
-        Dictionary as returned by `dandi.bids_validator_xs.validate_bids()`.
+        Dictionary as returned by `dandi.support.bids.validator.validate_bids()`.
     allow_missing_files: bool, optional
         Whether to consider the dataset invalid if any mandatory files are not present.
     allow_invalid_filenames: bool, optional
31 changes: 21 additions & 10 deletions dandi/cli/cmd_validate.py
@@ -7,33 +7,44 @@
 
 
 @click.command()
-@devel_option(
+@click.option(
     "--schema", help="Validate against new BIDS schema version", metavar="VERSION"
 )
-@click.option("--report", help="Specify path to write a report under.")
 @click.option(
-    "--report-flag",
+    "--report-path",
+    help="Write report under path, this option implies `--report/-r`.",
+)
+@click.option(
+    "--report",
     "-r",
     is_flag=True,
-    help="Whether to write a report under a unique path in the current directory. "
-    "Only usable if `--report` is not already used.",
+    help="Whether to write a report under a unique path in the DANDI log directory.",
 )
 @click.argument("paths", nargs=-1, type=click.Path(exists=True, dir_okay=True))
 @devel_debug_option()
 @map_to_click_exceptions
 def validate_bids(
-    paths, schema=None, devel_debug=False, report=False, report_flag=False
+    paths,
+    schema,
+    report,
+    report_path,
+    devel_debug=False,
 ):
-    """Validate BIDS paths."""
+    """Validate BIDS paths.
+    Notes
+    -----
+    Used from bash, eg:
+        dandi validate-bids /my/path
+    """
 
     from ..bids_utils import is_valid, report_errors
     from ..validate import validate_bids as validate_bids_
 
-    if report_flag and not report:
-        report = report_flag
-
     validator_result = validate_bids_(
         *paths,
         report=report,
+        report_path=report_path,
         schema_version=schema,
         devel_debug=devel_debug,
     )
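
The help texts in this hunk state that `--report-path` implies `--report/-r`, and that `--report` alone writes to a unique path in the DANDI log directory. A minimal sketch of one way that relationship could be resolved is shown below. The helper `resolve_report_target`, its `log_dir` and `stamp` parameters, and the file name pattern are all hypothetical; the actual logic lives inside `validate_bids_` and is not shown in this diff.

```python
from pathlib import Path
from typing import Optional


def resolve_report_target(
    report: bool,
    report_path: Optional[str],
    log_dir: str,
    stamp: str,
) -> Optional[Path]:
    """Return where a report would be written, or None for no report.

    Hypothetical helper: `log_dir`, `stamp`, and the file name pattern
    are assumptions made for illustration only.
    """
    if report_path:
        # An explicit path implies that a report is wanted.
        return Path(report_path)
    if report:
        # The bare flag falls back to a unique file in the log directory.
        return Path(log_dir) / f"bids-report_{stamp}.log"
    return None
```

With this reading, passing only `--report-path out.log` is enough to get a report, and the old "flag versus path" ambiguity between `--report` and `--report-flag` disappears.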
2 changes: 1 addition & 1 deletion dandi/metadata.py
@@ -119,7 +119,7 @@ def get_metadata(path: Union[str, Path]) -> Optional[dict]:
     # could still be augmented with `_is_nwb` to disambiguate both cases
     # at the detection level.
     if _path_in_bids(path):
-        from .bids_validator_xs import validate_bids
+        from .validate import validate_bids
 
         _meta = validate_bids(path)
         meta = _meta["match_listing"][0]
11 changes: 0 additions & 11 deletions dandi/support/bids/schemadata/1.7.0+012/rules/associated_data.yaml

This file was deleted.

This file was deleted.

@@ -1,9 +1,9 @@
# BIDS-schema

Portions of the BIDS specification are defined using YAML files, in order to
Portions of the BIDS specification are defined using YAML files in order to
make the specification machine-readable.

Currently, the portions of the specification that rely on this schema are
Currently the portions of the specification that rely on this schema are
the entity tables, entity definitions, filename templates, and metadata tables.
Any changes to the specification should be mirrored in the schema.

@@ -32,14 +32,14 @@ The types of objects currently supported in the schema are:
- suffixes,
- metadata,
- top-level files,
- and non-BIDS associated folders.
- and non-BIDS associated directories.

Each of these object types has a single file in the `objects/` folder.
Each of these object types has a single file in the `objects/` directory.

- `modalities.yaml`: The modalities, or types of technology, used to acquire data in a BIDS dataset.
These modalities are not reflected directly in the specification.
For example, while both fMRI and DWI data are acquired with an MRI,
in a BIDS dataset they are stored in different folders reflecting the two different `datatypes`.
in a BIDS dataset they are stored in different directories reflecting the two different `datatypes`.

- `datatypes.yaml`: Data types supported by the specification.
The only information provided in the file is:
@@ -48,7 +48,7 @@ Each of these object types has a single file in the `objects/` folder.
1. each datatype's full name
1. a free text description of the datatype.

- `entities.yaml`: Entities (key/value pairs in folder and filenames).
- `entities.yaml`: Entities (key-value pairs in directory and filenames).

- `metadata.yaml`: All valid metadata fields that are explicitly supported in BIDS sidecar JSON files.

@@ -58,7 +58,7 @@ Each of these object types has a single file in the `objects/` folder.

- `top_level_files.yaml`: Valid top-level files which may appear in a BIDS dataset.

- `associated_data.yaml`: Folders that may appear within a dataset folder without following BIDS rules.
- `associated_data.yaml`: Directories that may appear within a dataset directory without following BIDS rules.

### On re-used objects with different definitions

@@ -73,7 +73,7 @@ For objects with `snake_case` names, two underscores must be used.
There should also be a comment near the object definition in the YAML file describing the nature of the different objects.

For example, the TSV column `"reference"` means different things when used for EEG data, as compared to iEEG data.
As such, there are two definitions in `columns.yaml` for the `"reference"` column: `"reference__eeg"` and `"reference_ieeg"`.
As such, there are two definitions in `columns.yaml` for the `"reference"` column: `"reference__eeg"` and `"reference__ieeg"`.

```yaml
# reference column for channels.tsv files for EEG data
@@ -115,15 +115,15 @@ The `description` field is a freeform description of the modality.
### `datatypes.yaml`

This file contains a dictionary in which each datatype is defined.
Keys are the folder names associated with each datatype (for example, `anat` for anatomical MRI),
Keys are the directory names associated with each datatype (for example, `anat` for anatomical MRI),
and each associated value is a dictionary with two keys: `name` and `description`.

The `name` field is the full name of the datatype.
The `description` field is a freeform description of the datatype.

### `entities.yaml`

This file contains a dictionary in which each entity (key/value pair in filenames) is defined.
This file contains a dictionary in which each entity (key-value pair in filenames) is defined.
Keys are long-form versions of the entities, which are distinct from both the entities as
they appear in filenames _and_ their full names.
For example, the key for the "Contrast Enhancing Agent" entity, which appears in filenames as `ce-<label>`,
@@ -155,11 +155,11 @@ The `format` field defines the specific format the value should take.
Entities are broadly divided into either `label` or `index` types.

When `format` is `index`, then the entity's associated value should be a non-zero integer, optionally with leading zeros.
For example, `run` should have an index, so a valid key-value pair in a filename would be `run-01`.
For example, `run` should have an index, so a valid entity would be `run-01`.

When `format` is `label`, then the value should be an alphanumeric string.
Beyond limitations on which characters are allowed, labels have few restrictions.
For example, `acq` should have a label, so a valid key-value pair might be `acq-someLabel`.
For example, `acq` should have a label, so a valid entity might be `acq-someLabel`.
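
The two formats described above can be sketched as a small check. This is only an illustration under stated assumptions: the function name `entity_is_valid`, the `formats` mapping, and the regular expressions are hypothetical and are not taken from the BIDS validator.

```python
import re

# "index": a non-zero integer, optionally with leading zeros.
INDEX_RE = re.compile(r"0*[1-9][0-9]*")
# "label": an alphanumeric string.
LABEL_RE = re.compile(r"[a-zA-Z0-9]+")


def entity_is_valid(entity: str, formats: dict) -> bool:
    """Check a `key-value` entity against the format declared for its key.

    `formats` maps entity keys to "index" or "label"; it stands in for
    the `format` fields of `entities.yaml` (an assumption of this sketch).
    """
    key, sep, value = entity.partition("-")
    if not sep or key not in formats:
        return False
    pattern = INDEX_RE if formats[key] == "index" else LABEL_RE
    return pattern.fullmatch(value) is not None


formats = {"run": "index", "acq": "label"}
print(entity_is_valid("run-01", formats))
print(entity_is_valid("acq-someLabel", formats))
```

Note that `run-00` is rejected by this sketch because the index must be non-zero, while `run-010` is accepted.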

For a small number of entities, only certain labels are allowed.
In those cases, instead of a `format` field, there will be an `enum` field, which will provide a list of allowed values.
@@ -218,7 +218,7 @@ There are additional fields which may define rules that apply to a given type.

- `dataset_relative` (relative paths from dataset root),

- `participant_relative` (relative paths from participant folder).
- `participant_relative` (relative paths from participant directory).

- `enum` defines a list of valid values for the field.
The minimum string length (`minLength`) defaults to 1.
@@ -269,7 +269,7 @@ There are additional fields which may define rules that apply to a given type.
- `object`: If `type` is `object`, then there MAY be any of the following
fields at the same level as `type`: `additionalProperties`,
`properties`.
Objects are defined as sets of key/value pairs.
Objects are defined as sets of key-value pairs.
Keys MUST be strings, while values may have specific attributes,
which is what `additionalProperties` describes.
Here is an example of a field which MUST be an object,
@@ -388,29 +388,29 @@ The `description` field is a freeform description of the file.

### `associated_data.yaml`

This file contains a dictionary in which each non-BIDS folder is defined.
Keys are folder names, and each associated value is a dictionary with two keys: `name` and `description`.
This file contains a dictionary in which each non-BIDS directory is defined.
Keys are directory names, and each associated value is a dictionary with two keys: `name` and `description`.

The `name` field is the full name of the folder.
The `description` field is a freeform description of the folder.
The `name` field is the full name of the directory.
The `description` field is a freeform description of the directory.

## Rule files

The files in the `rules/` folder are less standardized than the files in `objects/`,
The files in the `rules/` directory are less standardized than the files in `objects/`,
because rules governing how different object types interact in a valid dataset are more variable
than the object definitions.

- `modalities.yaml`: This file simply groups `datatypes` under their associated modality.

- `datatypes/*.yaml`: Files in the `datatypes` folder contain information about valid filenames within a given datatype.
- `datatypes/*.yaml`: Files in the `datatypes` directory contain information about valid filenames within a given datatype.
Specifically, each datatype's YAML file contains a list of dictionaries.
Each dictionary contains a list of suffixes, entities, and file extensions which may constitute a valid BIDS filename.

- `entities.yaml`: This file simply defines the order in which entities, when present, MUST appear in filenames.

- `top_level_files.yaml`: Requirement levels and valid file extensions of top-level files.

- `associated_data.yaml`: Requirement levels of associated non-BIDS folders.
- `associated_data.yaml`: Requirement levels of associated non-BIDS directories.

### `modalities.yaml`

@@ -419,7 +419,7 @@ The `datatypes` dictionary contains a list of datatypes that fall under that mod

### `datatypes/*.yaml`

The files in this folder are currently the least standardized of any part of the schema.
The files in this directory are currently the least standardized of any part of the schema.

Each file corresponds to a single `datatype`.
Within the file is a list of dictionaries.
@@ -496,5 +496,24 @@ In cases where there is a data file and a metadata file, the `.json` extension f

### `associated_data.yaml`

This file contains a dictionary in which each key is a folder and the value is a dictionary with one key: `required`.
The `required` entry contains a boolean value to indicate if that folder is required for BIDS datasets or not.
This file contains a dictionary in which each key is a directory and the value is a dictionary with one key: `required`.
The `required` entry contains a boolean value to indicate if that directory is required for BIDS datasets or not.
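
As an illustration of how a tool might consume this structure, here is a minimal sketch that works on an already-loaded dictionary. The directory names in `rules` and the helper `missing_required` are hypothetical, chosen only to show the shape of the data.

```python
def missing_required(rules: dict, present: set) -> list:
    """List directories marked `required: true` that are absent from `present`."""
    return sorted(
        name
        for name, rule in rules.items()
        if rule.get("required", False) and name not in present
    )


# Hypothetical content mirroring the described layout:
# each key is a directory, each value has a single `required` boolean.
rules = {
    "example_optional_dir": {"required": False},
    "example_required_dir": {"required": True},
}

print(missing_required(rules, present={"example_optional_dir"}))
```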

## Version of the schema

File `SCHEMA_VERSION` in the top of the directory contains a semantic
version (`MAJOR.MINOR.PATCH`) for the schema (how it is organized).
Note that while in `0.` series, breaking changes are
permitted without changing the `MAJOR` (leading) component of the version.
Going forward, the 2nd, `MINOR` indicator should be
incremented whenever schema organization introduces "breaking changes":
changes which would cause existing tools reading schema to
adjust their code to be able to read it again.
Additions of new components to the schema should increment the last,
`PATCH`, component of the version so that tools could selectively
enable/disable loading specific components of the schema.
With the release of `1.0.0` version of the schema,
we expect that the `MAJOR` component
will be incremented whenever schema organization introduces "breaking changes",
`MINOR` - when adding new components to the schema,
and `PATCH` - when fixing errors in existing components.
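
The versioning policy above can be sketched as a compatibility check that a tool reading the schema might apply. This is an assumption-laden illustration: the function names `parse_version` and `is_breaking_change` are hypothetical, and treating any change of `MAJOR` (including `0.x` to `1.0.0`) as breaking is a choice made for this sketch.

```python
def parse_version(text: str) -> tuple:
    """Parse a `MAJOR.MINOR.PATCH` string, as stored in `SCHEMA_VERSION`."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return major, minor, patch


def is_breaking_change(old: str, new: str) -> bool:
    """Return True if moving from `old` to `new` may require code changes."""
    old_v, new_v = parse_version(old), parse_version(new)
    if old_v[0] != new_v[0]:
        # A different MAJOR component is treated as breaking in this sketch.
        return True
    if new_v[0] == 0:
        # In the 0. series, MINOR signals breaking changes.
        return old_v[1] != new_v[1]
    # From 1.0.0 on, MINOR and PATCH changes are non-breaking.
    return False


print(is_breaking_change("0.3.0", "0.4.0"))
print(is_breaking_change("0.3.0", "0.3.1"))
```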
1 change: 1 addition & 0 deletions dandi/support/bids/schemadata/1.7.0+369/SCHEMA_VERSION
@@ -0,0 +1 @@
0.3.0