From 656b10f7823b14a1ecb23700da6f132f66838384 Mon Sep 17 00:00:00 2001
From: Horea Christian
Date: Thu, 19 May 2022 13:21:00 -0400
Subject: [PATCH 01/10] NGFF format support

Co-authored-by: Satrajit Ghosh
---
 src/04-modality-specific-files/10-microscopy.md | 8 +++-----
 src/schema/objects/extensions.yaml              | 7 +++++++
 src/schema/rules/datatypes/micr.yaml            | 1 +
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/04-modality-specific-files/10-microscopy.md b/src/04-modality-specific-files/10-microscopy.md
index 5515559fdc..5ce0239f1e 100644
--- a/src/04-modality-specific-files/10-microscopy.md
+++ b/src/04-modality-specific-files/10-microscopy.md
@@ -54,12 +54,10 @@ Microscopy raw data MUST be stored in one of the following formats:
     (`.ome.tif` for standard TIFF files or `.ome.btf` for
     [BigTIFF](https://www.awaresystems.be/imaging/tiff/bigtiff.html) files)
 
-If different from PNG, TIFF or OME-TIFF, the original unprocessed data in the native format MAY be
-stored in the [`/sourcedata` directory](../02-common-principles.md#source-vs-raw-vs-derived-data).
+-   [NGFF/OME-ZARR](https://ngff.openmicroscopy.org/latest/) (`.ngff` - Note that these are directories.)
 
-Future versions may extend this list of supported file formats, for example with the
-Next-Generation File Formats currently developed by OME ([OME-NGFF](https://ngff.openmicroscopy.org/latest/))
-as a successor to OME-TIFF for better remote sharing of large datasets.
+If different from PNG, TIFF, OME-TIFF, or NGFF the original unprocessed data in the native format MAY be
+stored in the [`/sourcedata` directory](../02-common-principles.md#source-vs-raw-vs-derived-data).

 ### Modality suffixes
 Microscopy data currently support the following imaging modalities:
diff --git a/src/schema/objects/extensions.yaml b/src/schema/objects/extensions.yaml
index f3d4d0dcf8..0c1a2e7d97 100644
--- a/src/schema/objects/extensions.yaml
+++ b/src/schema/objects/extensions.yaml
@@ -140,6 +140,13 @@
 
     Used by KIT, Yokogawa, and Ricoh MEG systems.
     Successor to the `.sqd` extension for marker files.
+.ngff/:
+  name: OME Next Generation File Format
+  description: |
+    An OME-NGFF file.
+
+    OME-NGFF is a [Zarr](https://zarr.readthedocs.io)-based format, organizing data arrays in nested directories.
+    This format was developed by the Open Microscopy Environment to provide data stream access to very large data.
 .nii:
   name: NIfTI
   description: |
diff --git a/src/schema/rules/datatypes/micr.yaml b/src/schema/rules/datatypes/micr.yaml
index b5caf97b21..1b254a4f90 100644
--- a/src/schema/rules/datatypes/micr.yaml
+++ b/src/schema/rules/datatypes/micr.yaml
@@ -21,6 +21,7 @@ microscopy:
   extensions:
     - .ome.tif
     - .ome.btf
+    - .ngff/
     - .png
     - .tif
     - .json

From e22c916f8c77b36d6015f74e2ec6f0995cbe56c4 Mon Sep 17 00:00:00 2001
From: Satrajit Ghosh
Date: Wed, 25 May 2022 17:23:40 -0400
Subject: [PATCH 02/10] Update src/schema/rules/datatypes/micr.yaml

Co-authored-by: Yaroslav Halchenko
---
 src/schema/rules/datatypes/micr.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/schema/rules/datatypes/micr.yaml b/src/schema/rules/datatypes/micr.yaml
index 1b254a4f90..c08072a740 100644
--- a/src/schema/rules/datatypes/micr.yaml
+++ b/src/schema/rules/datatypes/micr.yaml
@@ -21,7 +21,7 @@ microscopy:
   extensions:
     - .ome.tif
     - .ome.btf
-    - .ngff/
+    - .ome.zarr/
     - .png
     - .tif
     - .json

From 266effbdb5c7f5c35e4352743ebcab1bee376d34 Mon Sep 17 00:00:00 2001
From: Satrajit Ghosh
Date: Wed, 25 May 2022 17:23:55 -0400
Subject: [PATCH 03/10] Update src/schema/objects/extensions.yaml

Co-authored-by: Yaroslav Halchenko
---
 src/schema/objects/extensions.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/schema/objects/extensions.yaml b/src/schema/objects/extensions.yaml
index 0c1a2e7d97..21aaa515ee 100644
--- a/src/schema/objects/extensions.yaml
+++ b/src/schema/objects/extensions.yaml
@@ -140,7 +140,7 @@
 
     Used by KIT, Yokogawa, and Ricoh MEG systems.
     Successor to the `.sqd` extension for marker files.
-.ngff/:
+.ome.zarr/:
   name: OME Next Generation File Format
   description: |
     An OME-NGFF file.

From a9403fc02c5502d5310c432bd2df423d664feb26 Mon Sep 17 00:00:00 2001
From: Satrajit Ghosh
Date: Wed, 25 May 2022 17:24:09 -0400
Subject: [PATCH 04/10] Update src/04-modality-specific-files/10-microscopy.md

Co-authored-by: Yaroslav Halchenko
---
 src/04-modality-specific-files/10-microscopy.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/04-modality-specific-files/10-microscopy.md b/src/04-modality-specific-files/10-microscopy.md
index 5ce0239f1e..82f94e02e6 100644
--- a/src/04-modality-specific-files/10-microscopy.md
+++ b/src/04-modality-specific-files/10-microscopy.md
@@ -54,7 +54,7 @@ Microscopy raw data MUST be stored in one of the following formats:
     (`.ome.tif` for standard TIFF files or `.ome.btf` for
     [BigTIFF](https://www.awaresystems.be/imaging/tiff/bigtiff.html) files)
 
--   [NGFF/OME-ZARR](https://ngff.openmicroscopy.org/latest/) (`.ngff` - Note that these are directories.)
+-   [NGFF/OME-ZARR](https://ngff.openmicroscopy.org/latest/) (`.ome.zarr` directories)
 
 If different from PNG, TIFF, OME-TIFF, or NGFF the original unprocessed data in the native format MAY be
 stored in the [`/sourcedata` directory](../02-common-principles.md#source-vs-raw-vs-derived-data).

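Review note on patches 01 to 04: the trailing slash in the `.ome.zarr/` schema entry is what marks the extension as a directory rather than a file. The following is a minimal sketch of that convention, not code from this series. `match_extension` and the example file names are hypothetical; the extension list is copied from `micr.yaml` as it stands after patch 02.

```python
import os
import tempfile

# Extension list from micr.yaml after patch 02; a trailing slash marks a directory.
MICR_EXTENSIONS = [".ome.tif", ".ome.btf", ".ome.zarr/", ".png", ".tif", ".json"]


def match_extension(path, extensions=MICR_EXTENSIONS):
    """Return the schema entry that `path` satisfies, or None (hypothetical helper)."""
    name = path.rstrip("/")
    # Try longer entries first so ".ome.tif" is preferred over ".tif".
    for ext in sorted(extensions, key=len, reverse=True):
        if ext.endswith("/"):
            # Directory-style entry: the name must match and the path must be a directory.
            if name.endswith(ext[:-1]) and os.path.isdir(path):
                return ext
        elif name.endswith(ext) and os.path.isfile(path):
            return ext
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        zarr_dir = os.path.join(tmp, "sub-01_sample-A_SEM.ome.zarr")
        os.mkdir(zarr_dir)
        print(match_extension(zarr_dir))
```

Under this reading, a regular file that merely ends in `.ome.zarr` would not satisfy the schema entry, which seems to be the intent of the "(`.ome.zarr` directories)" wording in patch 04.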
From 64d7b27f76ee3c14ece390fc8a33f59807b0b328 Mon Sep 17 00:00:00 2001
From: Horea Christian
Date: Sun, 29 May 2022 23:01:16 -0400
Subject: [PATCH 05/10] typo

---
 src/04-modality-specific-files/10-microscopy.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/04-modality-specific-files/10-microscopy.md b/src/04-modality-specific-files/10-microscopy.md
index 82f94e02e6..c1cf0c6262 100644
--- a/src/04-modality-specific-files/10-microscopy.md
+++ b/src/04-modality-specific-files/10-microscopy.md
@@ -56,7 +56,7 @@ Microscopy raw data MUST be stored in one of the following formats:
 
 -   [NGFF/OME-ZARR](https://ngff.openmicroscopy.org/latest/) (`.ome.zarr` directories)
 
-If different from PNG, TIFF, OME-TIFF, or NGFF the original unprocessed data in the native format MAY be
+If different from PNG, TIFF, OME-TIFF, or NGFF, the original unprocessed data in the native format MAY be
 stored in the [`/sourcedata` directory](../02-common-principles.md#source-vs-raw-vs-derived-data).
 
 ### Modality suffixes

From 4fb2574bf5502e697f46018058f49996ffd4a9fd Mon Sep 17 00:00:00 2001
From: Horea Christian
Date: Thu, 9 Jun 2022 05:02:38 -0400
Subject: [PATCH 06/10] Validator support for dynamic pseudofile suffix query.

---
 .../schemacode/tests/test_validator.py   |  2 +-
 tools/schemacode/schemacode/validator.py | 59 ++++++++++++++-----
 2 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/tools/schemacode/schemacode/tests/test_validator.py b/tools/schemacode/schemacode/tests/test_validator.py
index ac9273dcf2..e16e5d5858 100644
--- a/tools/schemacode/schemacode/tests/test_validator.py
+++ b/tools/schemacode/schemacode/tests/test_validator.py
@@ -229,7 +229,7 @@ def test_load_all():
         os.path.abspath(os.path.dirname(__file__)),
         "../data/schema",
     )
-    schema_all = load_all(schema_path)
+    schema_all, _ = load_all(schema_path)
 
     # Check if expected keys are present in all entries
     for entry in schema_all:
diff --git a/tools/schemacode/schemacode/validator.py b/tools/schemacode/schemacode/validator.py
index 59c19f90d9..4793461bd2 100644
--- a/tools/schemacode/schemacode/validator.py
+++ b/tools/schemacode/schemacode/validator.py
@@ -16,7 +16,9 @@
 DIR_ENTITIES = ["subject", "session"]
 
 
-def _get_paths(bids_paths):
+def _get_paths(bids_paths,
+        pseudofile_suffixes = [],
+        ):
     """
     Get all paths from a list of directories, excluding hidden subdirectories from distribution.
 
@@ -25,6 +27,8 @@ def _get_paths(bids_paths):
     bids_paths : list or str
         Directories from which to get paths, may also contain file paths, which will
         remain unchanged.
+    pseudofile_suffixes : list of str
+        Directory suffixes prompting the validation of the directory name and limiting further directory walk.
 
     Notes
     -----
@@ -47,9 +51,6 @@ def _get_paths(bids_paths):
         ".bidsignore",
         "dandiset.yaml",
     ]
-    # Inelegant hard-coded solution.
-    # Could be replaced by a maximum depth limit if BIDS root auto-detection is implemented.
-    treat_as_file_suffix = [".ngff"]
 
     path_list = []
     for bids_path in bids_paths:
@@ -57,13 +58,10 @@ def _get_paths(bids_paths):
         if os.path.isfile(bids_path):
             path_list.append(bids_path)
             continue
-        for root, dirs, file_names in os.walk(bids_path, topdown=False):
-            if any(root.endswith(i) for i in treat_as_file_suffix):
-                continue
-            if any(f"{i}/" in root for i in treat_as_file_suffix):
-                continue
-            if any(f"{i}\\" in root for i in treat_as_file_suffix):
-                continue
+        for root, dirs, file_names in os.walk(bids_path, topdown=True):
+            if any(root.endswith(i) for i in pseudofile_suffixes):
+                path_list.append(f"{root}/")
+                dirs[:] = []
             # will break if BIDS ever puts meaningful data under `/.{dandi,datalad,git}*/`
             if any(exclude_subdir in root for exclude_subdir in exclude_subdirs):
                 continue
@@ -335,6 +333,8 @@ def load_all(
     -------
     all_regex : list of dict
         A list of dictionaries, with keys including 'regex' and 'mandatory'.
+    my_schema : list of dict
+        Nested dictionaries representing the full schema.
     """
 
     my_schema = schema.load_schema(schema_dir)
@@ -346,13 +346,14 @@ def load_all(
     )
     all_regex.extend(top_level_regex)
 
-    return all_regex
+    return all_regex, my_schema
 
 
 def validate_all(
     bids_paths,
     regex_schema,
     debug=False,
+    pseudofile_suffixes=[],
 ):
     """
     Validate `bids_paths` based on a `regex_schema` dictionary list, including regexes.
@@ -366,6 +367,8 @@ def validate_all(
     debug : tuple, optional
         Whether to print itemwise notices for checks on the console, and include them in the
         validation result.
+    pseudofile_suffixes : list of str
+        Directory suffixes prompting the validation of the directory name and limiting further directory walk.

     Returns
     -------
@@ -384,7 +387,7 @@ def validate_all(
     """
 
     tracking_schema = deepcopy(regex_schema)
-    paths_list = _get_paths(bids_paths)
+    paths_list = _get_paths(bids_paths, pseudofile_suffixes=pseudofile_suffixes)
     tracking_paths = deepcopy(paths_list)
     if debug:
         itemwise_results = []
@@ -657,6 +660,32 @@ def log_errors(validation_result):
     for i in validation_result["path_tracking"]:
         lgr.warning("The `%s` file was not matched by any regex schema entry.", i)
 
+def _query_pseudofile_suffixes(my_schema):
+    """Query schema for suffixes which identify directory entities.
+
+    Paramaters
+    ----------
+    my_schema : dict
+        Nested direcotry as produced by `schemacode.schema.load_schema()`.
+
+    Returns
+    -------
+    list of str
+        Directory pseudofile suffixes excluding trailing slashes.
+
+    Notes
+    -----
+    * Yes this seems super-awkward to do explicitly, after all, the trailing slash is
+      already in so it should automagically work, but no:
+        - Subdirectory names need to be dynamically excluded from validation input.
+        - Backslash directory delimiters are still in use, which is regrettable.
+    """
+    pseudofile_suffixes = []
+    for i in my_schema["objects"]["extensions"]:
+        if i.endswith("/"):
+            if i != "/":
+                pseudofile_suffixes.append(i[:-1])
+    return pseudofile_suffixes
 
 def validate_bids(
     bids_paths,
@@ -716,11 +745,13 @@ def validate_bids(
         bids_paths = [bids_paths]
 
     bids_schema_dir = select_schema_dir(bids_paths, schema_reference_root, schema_version)
-    regex_schema = load_all(bids_schema_dir)
+    regex_schema, my_schema = load_all(bids_schema_dir)
+    pseudofile_suffixes = _query_pseudofile_suffixes(my_schema)
     validation_result = validate_all(
         bids_paths,
         regex_schema,
         debug=debug,
+        pseudofile_suffixes=pseudofile_suffixes,
     )
 
     log_errors(validation_result)

From 9252f3e063b69283218321198d6e47dbaba21bbb Mon Sep 17 00:00:00 2001
From: Horea Christian
Date: Thu, 9 Jun 2022 05:14:55 -0400
Subject: [PATCH 07/10] typos

---
 tools/schemacode/schemacode/validator.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/schemacode/schemacode/validator.py b/tools/schemacode/schemacode/validator.py
index 4793461bd2..a70878fa79 100644
--- a/tools/schemacode/schemacode/validator.py
+++ b/tools/schemacode/schemacode/validator.py
@@ -663,10 +663,10 @@ def log_errors(validation_result):
 def _query_pseudofile_suffixes(my_schema):
     """Query schema for suffixes which identify directory entities.
 
-    Paramaters
+    Parameters
     ----------
     my_schema : dict
-        Nested direcotry as produced by `schemacode.schema.load_schema()`.
+        Nested directory as produced by `schemacode.schema.load_schema()`.

     Returns
     -------

From 12d7deffef2975b7cb76c079cd9c5a4f664d90b0 Mon Sep 17 00:00:00 2001
From: Horea Christian
Date: Thu, 9 Jun 2022 05:17:38 -0400
Subject: [PATCH 08/10] Black and flake support

---
 tools/schemacode/schemacode/validator.py | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tools/schemacode/schemacode/validator.py b/tools/schemacode/schemacode/validator.py
index a70878fa79..50f58405cf 100644
--- a/tools/schemacode/schemacode/validator.py
+++ b/tools/schemacode/schemacode/validator.py
@@ -16,9 +16,10 @@
 DIR_ENTITIES = ["subject", "session"]
 
 
-def _get_paths(bids_paths,
-        pseudofile_suffixes = [],
-        ):
+def _get_paths(
+    bids_paths,
+    pseudofile_suffixes=[],
+):
     """
     Get all paths from a list of directories, excluding hidden subdirectories from distribution.
 
@@ -28,7 +29,8 @@ def _get_paths(bids_paths,
         Directories from which to get paths, may also contain file paths, which will
         remain unchanged.
     pseudofile_suffixes : list of str
-        Directory suffixes prompting the validation of the directory name and limiting further directory walk.
+        Directory suffixes prompting the validation of the directory name and limiting further
+        directory walk.
 
     Notes
     -----
@@ -368,7 +370,8 @@ def validate_all(
         Whether to print itemwise notices for checks on the console, and include them in the
         validation result.
     pseudofile_suffixes : list of str
-        Directory suffixes prompting the validation of the directory name and limiting further directory walk.
+        Directory suffixes prompting the validation of the directory name and limiting further
+        directory walk.
 
     Returns
     -------
@@ -660,6 +663,7 @@ def log_errors(validation_result):
     for i in validation_result["path_tracking"]:
         lgr.warning("The `%s` file was not matched by any regex schema entry.", i)
 
+
 def _query_pseudofile_suffixes(my_schema):
     """Query schema for suffixes which identify directory entities.

@@ -687,6 +691,7 @@ def _query_pseudofile_suffixes(my_schema):
                 pseudofile_suffixes.append(i[:-1])
     return pseudofile_suffixes
 
+
 def validate_bids(
     bids_paths,
     schema_reference_root="/usr/share/bids-schema/",

From 4c5521b8ce284ded97035ed65b4058edc7e2ae86 Mon Sep 17 00:00:00 2001
From: Horea Christian
Date: Fri, 10 Jun 2022 11:11:44 -0400
Subject: [PATCH 09/10] More comments and clarifications, suffix query function name change

---
 tools/schemacode/schemacode/validator.py | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/tools/schemacode/schemacode/validator.py b/tools/schemacode/schemacode/validator.py
index 50f58405cf..e98ee1a29c 100644
--- a/tools/schemacode/schemacode/validator.py
+++ b/tools/schemacode/schemacode/validator.py
@@ -62,7 +62,9 @@ def _get_paths(
             continue
         for root, dirs, file_names in os.walk(bids_path, topdown=True):
             if any(root.endswith(i) for i in pseudofile_suffixes):
+                # Add the directory name to the validation paths list.
                 path_list.append(f"{root}/")
+                # Do not index the contents of the directory.
                 dirs[:] = []
             # will break if BIDS ever puts meaningful data under `/.{dandi,datalad,git}*/`
             if any(exclude_subdir in root for exclude_subdir in exclude_subdirs):
@@ -369,9 +371,11 @@ def validate_all(
     debug : tuple, optional
         Whether to print itemwise notices for checks on the console, and include them in the
         validation result.
-    pseudofile_suffixes : list of str
-        Directory suffixes prompting the validation of the directory name and limiting further
-        directory walk.
+    pseudofile_suffixes : list of str, optional
+        Any suffixes which identify BIDS-valid directory data.
+        These pseudo-file suffixes will be validated based on the directory name, with the
+        directory contents not being indexed for validation.
+        By default, no pseudo-file suffixes are checked.

     Returns
     -------
@@ -664,7 +668,7 @@ def log_errors(validation_result):
         lgr.warning("The `%s` file was not matched by any regex schema entry.", i)
 
 
-def _query_pseudofile_suffixes(my_schema):
+def _get_directory_suffixes(my_schema):
     """Query schema for suffixes which identify directory entities.
 
     Parameters
@@ -751,7 +755,7 @@ def validate_bids(
 
     bids_schema_dir = select_schema_dir(bids_paths, schema_reference_root, schema_version)
     regex_schema, my_schema = load_all(bids_schema_dir)
-    pseudofile_suffixes = _query_pseudofile_suffixes(my_schema)
+    pseudofile_suffixes = _get_directory_suffixes(my_schema)
     validation_result = validate_all(
         bids_paths,
         regex_schema,

From 988bd2d44015599165448103ed275acdb075467b Mon Sep 17 00:00:00 2001
From: Horea Christian
Date: Thu, 30 Jun 2022 15:28:49 -0400
Subject: [PATCH 10/10] More specificity describing OME-ZARR

---
 src/04-modality-specific-files/10-microscopy.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/04-modality-specific-files/10-microscopy.md b/src/04-modality-specific-files/10-microscopy.md
index c1cf0c6262..015d44bb0c 100644
--- a/src/04-modality-specific-files/10-microscopy.md
+++ b/src/04-modality-specific-files/10-microscopy.md
@@ -37,7 +37,8 @@ by the [Open Microscopy Environment](https://www.openmicroscopy.org/) for whole-
 the [OME-TIFF file specifications](https://docs.openmicroscopy.org/ome-model/6.1.2/ome-tiff/file-structure.html).
 The OME-TIFF file allows for multi-page TIFF files to store multiple image planes and supports
 multi-resolution pyramidal tiled images. An OME-XML data block is also embedded inside the
-file’s header.
+file’s header. Further, OME-ZARR (sometimes referred to as OME-NGFF or NGFF) has been developed to provide improved
+access and storage for large data via chunked and compressed N-dimensional arrays.

 The BIDS standard accepts microscopy data in a number of file formats to accommodate datasets
 stored in 2D image formats and whole-slide imaging formats, to accommodate lossless and lossy
@@ -54,9 +55,9 @@ Microscopy raw data MUST be stored in one of the following formats:
     (`.ome.tif` for standard TIFF files or `.ome.btf` for
     [BigTIFF](https://www.awaresystems.be/imaging/tiff/bigtiff.html) files)
 
--   [NGFF/OME-ZARR](https://ngff.openmicroscopy.org/latest/) (`.ome.zarr` directories)
+-   [OME-ZARR/NGFF](https://ngff.openmicroscopy.org/latest/) (`.ome.zarr` directories)
 
-If different from PNG, TIFF, OME-TIFF, or NGFF, the original unprocessed data in the native format MAY be
+If different from PNG, TIFF, OME-TIFF, or OME-ZARR, the original unprocessed data in the native format MAY be
 stored in the [`/sourcedata` directory](../02-common-principles.md#source-vs-raw-vs-derived-data).
 
 ### Modality suffixes
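
Review note on patches 06 to 09: the validator change rests on two ideas, deriving directory suffixes from schema extension keys that end in a slash, and pruning `os.walk` in place (`dirs[:] = []`, which requires `topdown=True`) so that a matching directory is listed once and its contents are not indexed. The following is a self-contained sketch of that technique, not the code in `validator.py`. `get_directory_suffixes`, `collect_paths`, `demo` and the file names are hypothetical stand-ins, and the sketch makes two choices the series does not: it uses a `None` default instead of a mutable `[]`, and it adds a `continue` so files sitting directly inside a matched directory are skipped as well.

```python
import os
import tempfile


def get_directory_suffixes(extensions):
    """Return directory-style extensions without their trailing slash."""
    return [ext[:-1] for ext in extensions if ext.endswith("/") and ext != "/"]


def collect_paths(bids_root, directory_suffixes=None):
    """List files under `bids_root`, treating matching directories as single entries."""
    directory_suffixes = directory_suffixes or []
    paths = []
    for root, dirs, file_names in os.walk(bids_root, topdown=True):
        if any(root.endswith(suffix) for suffix in directory_suffixes):
            # Record the directory itself, marked with a trailing slash.
            paths.append(f"{root}/")
            # In-place pruning: os.walk will not descend any further.
            dirs[:] = []
            # Assumption of this sketch: also skip files directly inside.
            continue
        paths.extend(os.path.join(root, name) for name in file_names)
    return sorted(paths)


def demo():
    """Build a tiny hypothetical dataset and return (relative paths, suffixes)."""
    with tempfile.TemporaryDirectory() as tmp:
        micr = os.path.join(tmp, "sub-01", "micr")
        zarr_dir = os.path.join(micr, "sub-01_sample-A_SEM.ome.zarr")
        os.makedirs(os.path.join(zarr_dir, "0"))
        open(os.path.join(zarr_dir, ".zattrs"), "w").close()
        open(os.path.join(zarr_dir, "0", ".zarray"), "w").close()
        open(os.path.join(micr, "sub-01_sample-A_SEM.json"), "w").close()
        suffixes = get_directory_suffixes([".ome.tif", ".ome.zarr/", "/"])
        listing = [p[len(tmp) + 1:] for p in collect_paths(tmp, suffixes)]
        return listing, suffixes


if __name__ == "__main__":
    for entry in demo()[0]:
        print(entry)
```

With `topdown=False`, as in the code removed by patch 06, subdirectories are visited before their parents, so clearing `dirs` would have no effect on the traversal; that appears to be why the series switches to `topdown=True`.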