diff --git a/src/schema/README.md b/src/schema/README.md index 68be4f2a9b..6d2225bf69 100644 --- a/src/schema/README.md +++ b/src/schema/README.md @@ -259,20 +259,36 @@ The following operators should be defined by an interpreter: The following functions should be defined by an interpreter: -| Function | Definition | Example | Note | -| ----------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------------------------------ | -| `count(arg: array, val: any)` | Number of elements in an array equal to `val` | `count(columns.type, "EEG")` | The number of times "EEG" appears in the column "type" of the current TSV file | -| `exists(arg: str \| array, rule: str) -> int` | Count of files in an array that exist in the dataset. String is array with length 1. Rules include `"bids-uri"`, `"dataset"`, `"subject"` and `"stimuli"`. | `exists(sidecar.IntendedFor, "subject")` | True if all files in `IntendedFor` exist, relative to the subject directory. | -| `index(arg: array, val: any)` | Index of first element in an array equal to `val`, `null` if not found | `index(["i", "j", "k"], axis)` | The number, from 0-2 corresponding to the string `axis` | -| `intersects(a: array, b: array) -> bool` | `true` if arguments contain any shared elements | `intersects(dataset.modalities, ["pet", "mri"])` | True if either PET or MRI data is found in dataset | -| `allequal(a: array, b: array) -> bool` | `true` if arrays have the same length and paired elements are equal | `intersects(dataset.modalities, ["pet", "mri"])` | True if either PET or MRI data is found in dataset | -| `length(arg: array) -> int` | Number of elements in an array | `length(columns.onset) > 0` | True if there is at least one value in the onset column | -| `match(arg: str, pattern: str) -> bool` | `true` if `arg` matches the regular expression `pattern` (anywhere in string) | `match(extension, ".gz$")` | True if the file extension ends with `.gz` | -| `max(arg: array) -> number` | The largest non-`n/a` value in an array | `max(columns.onset)` | The time of the last onset in an events.tsv file | -| `min(arg: array) -> number` | The smallest non-`n/a` value in an array | `min(sidecar.SliceTiming) == 0` | A check that the onset of the first slice is 0s | -| `sorted(arg: array, method: str) -> array` | The sorted values of the input array; defaults to type-determined sort. If method is "lexical", or "numeric" use lexical or numeric sort. | `sorted(sidecar.VolumeTiming) == sidecar.VolumeTiming` | True if `sidecar.VolumeTiming` is sorted | -| `substr(arg: str, start: int, end: int) -> str` | The portion of the input string spanning from start position to end position | `substr(path, 0, length(path) - 3)` | `path` with the last three characters dropped | -| `type(arg: Any) -> str` | The name of the type, including `"array"`, `"object"`, `"null"` | `type(datatypes)` | Returns `"array"` | +| Function | Definition | Example | Note | +| ----------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------------------------------ | +| `count(arg: array, val: any) -> int` | Number of elements in an array equal to `val` | `count(columns.type, "EEG")` | The number of times "EEG" appears in the column "type" of the current TSV file | +| `exists(arg: str \| array, rule: str) -> int` | Count of files in an array that exist in the dataset. String is array with length 1. See following section for the meanings of rules. | `exists(sidecar.IntendedFor, "subject")` | True if all files in `IntendedFor` exist, relative to the subject directory. | +| `index(arg: array, val: any) -> int` | Index of first element in an array equal to `val`, `null` if not found | `index(["i", "j", "k"], axis)` | The number, from 0-2 corresponding to the string `axis` | +| `intersects(a: array, b: array) -> bool` | `true` if arguments contain any shared elements | `intersects(dataset.modalities, ["pet", "mri"])` | True if either PET or MRI data is found in dataset | +| `allequal(a: array, b: array) -> bool` | `true` if arrays have the same length and paired elements are equal | `intersects(dataset.modalities, ["pet", "mri"])` | True if either PET or MRI data is found in dataset | +| `length(arg: array) -> int` | Number of elements in an array | `length(columns.onset) > 0` | True if there is at least one value in the onset column | +| `match(arg: str, pattern: str) -> bool` | `true` if `arg` matches the regular expression `pattern` (anywhere in string) | `match(extension, ".gz$")` | True if the file extension ends with `.gz` | +| `max(arg: array) -> number` | The largest non-`n/a` value in an array | `max(columns.onset)` | The time of the last onset in an events.tsv file | +| `min(arg: array) -> number` | The smallest non-`n/a` value in an array | `min(sidecar.SliceTiming) == 0` | A check that the onset of the first slice is 0s | +| `sorted(arg: array, method: str) -> array` | The sorted values of the input array; defaults to type-determined sort. If method is "lexical", or "numeric" use lexical or numeric sort. | `sorted(sidecar.VolumeTiming) == sidecar.VolumeTiming` | True if `sidecar.VolumeTiming` is sorted | +| `substr(arg: str, start: int, end: int) -> str` | The portion of the input string spanning from start position to end position | `substr(path, 0, length(path) - 3)` | `path` with the last three characters dropped | +| `type(arg: Any) -> str` | The name of the type, including `"array"`, `"object"`, `"null"` | `type(datatypes)` | Returns `"array"` | + +#### The `exists()` function + +In various places, BIDS datasets may declare links between files. +In order to validate these links, +the `exists()` function returns a count of files that can be found within the dataset. +To accommodate the various ways of declaring these links, +the following rules are defined: + +| `rule` | Definition | Example | +| ------------ | --------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- | +| `"dataset"` | A path relative to the root of the dataset. | `exists('participants.tsv', 'dataset')` | +| `"subject"` | A path relative to the current subject directory. | `exists('ses-1/anat/sub-01_ses-1_T1w.nii.gz', 'subject')` | +| `"stimuli"` | A path relative to the `/stimuli` directory. | For `events.tsv`: `exists(columns.stim_file, 'stimuli') == length(columns.stim_file)` | +| `"file"` | A path relative to the directory containing the current file. | For `scans.tsv`: `exists(columns.filename, 'file') == length(columns.stim_file)` | +| `"bids-uri"` | A URI of the form `bids::`. If `` is empty, the current dataset is used. | `exists('bids::participants.tsv', 'bids-uri')` | #### The special value `null` diff --git a/src/schema/rules/checks/dataset.yaml b/src/schema/rules/checks/dataset.yaml index 91704d32e4..031685d1fa 100644 --- a/src/schema/rules/checks/dataset.yaml +++ b/src/schema/rules/checks/dataset.yaml @@ -91,3 +91,15 @@ SingleSourceCitationFields: - '!("HowToAcknowledge" in dataset.dataset_description)' - '!("License" in dataset.dataset_description)' - '!("ReferencesAndLinks" in dataset.dataset_description)' + +ScansTSVScans: + issue: + code: SCANS_FILENAME_NOT_MATCH_DATASET + level: error + message: | + Filenames in scans.tsv file do not match what is present in the BIDS dataset. + selectors: + - suffix == 'scans' + - extension == '.tsv' + checks: + - exists(columns.filename, "file") == length(columns.filename)