-
Notifications
You must be signed in to change notification settings - Fork 169
/
02-common-principles.md
887 lines (750 loc) · 40 KB
/
02-common-principles.md
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
# Common principles
## Definitions
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [[RFC2119](https://www.ietf.org/rfc/rfc2119.txt)].
Throughout this specification we use a list of terms and abbreviations. To avoid
misunderstanding we clarify them here.
1. **Dataset** - a set of neuroimaging and behavioral data acquired for a
purpose of a particular study. A dataset consists of data acquired from one
or more subjects, possibly from multiple sessions.
1. **Subject** - a person or animal participating in the study. Used
interchangeably with term **Participant**.
1. **Session** - a logical grouping of neuroimaging and behavioral data
consistent across subjects. Session can (but doesn't have to) be synonymous
to a visit in a longitudinal study. In general, subjects will stay in the
scanner during one session. However, for example, if a subject has to leave
the scanner room and then be re-positioned on the scanner bed, the set of
MRI acquisitions will still be considered as a session and match sessions
acquired in other subjects. Similarly, in situations where different data
types are obtained over several visits (for example fMRI on one day followed
by DWI the day after) those can be grouped in one session. Defining multiple
sessions is appropriate when several identical or similar data acquisitions
are planned and performed on all -or most- subjects, often in the case of
some intervention between sessions (for example, training).
In the [PET](04-modality-specific-files/09-positron-emission-tomography.md)
context, a session may also indicate a group of related scans,
taken in one or more visits.
1. **Sample** - a sample pertaining to a subject such as tissue, primary cell
or cell-free sample.
The `sample-<label>` key/value pair is used to distinguish between different
samples from the same subject.
The label MUST be unique per subject and is RECOMMENDED to be unique
throughout the dataset.
1. **Data acquisition** - a continuous uninterrupted block of time during which
a brain scanning instrument was acquiring data according to particular
scanning sequence/protocol.
1. **Data type** - a functional group of different types of data.
BIDS defines the following data types:
1. `func` (task based and resting state functional MRI)
1. `dwi` (diffusion weighted imaging)
1. `fmap` (field inhomogeneity mapping data such as field maps)
1. `anat` (structural imaging such as T1, T2, PD, and so on)
1. `perf` (perfusion)
1. `meg` (magnetoencephalography)
1. `eeg` (electroencephalography)
1. `ieeg` (intracranial electroencephalography)
1. `beh` (behavioral)
1. `pet` (positron emission tomography)
1. `micr` (microscopy)
Data files are contained in a directory named for the data type.
In raw datasets, the data type directory is nested inside subject and
(optionally) session directories.
1. **Task** - a set of structured activities performed by the participant.
Tasks are usually accompanied by stimuli and responses, and can greatly vary
in complexity. For the purpose of this specification we consider the so-called
"resting state" a task. In the context of brain scanning, a task is always
tied to one data acquisition. Therefore, even if during one acquisition the
subject performed multiple conceptually different behaviors (with different
sets of instructions) they will be considered one (combined) task.
1. **Event** - something that happens or may be perceived by a test subject as happening
at a particular instant during the recording.
Events are most commonly associated with on- or offset of stimulus presentations,
or with the distinct marker of on- or offset of a subject's response or motor action.
Other events may include unplanned incidents
(for example, sudden onset of noise and vibrations due to construction work,
laboratory device malfunction),
changes in task instructions (for example, switching the response hand),
or experiment control parameters (for example,
changing the stimulus presentation rate over experimental blocks),
and noted data feature occurrences (for example, a recording electrode producing noise).
In BIDS, each event has an onset time and duration.
Note that not all tasks will have recorded events (for example, "resting state").
1. **Run** - an uninterrupted repetition of data acquisition that has the same
acquisition parameters and task (however events can change from run to run
due to different subject response or randomized nature of the stimuli). Run
is a synonym of a data acquisition.
Note that "uninterrupted" may look different by modality due to the nature of the
recording.
For example, in [MRI](04-modality-specific-files/01-magnetic-resonance-imaging-data.md)
or [MEG] (04-modality-specific-files/02-magnetoencephalography.md),
if a subject leaves the scanner, the acquisition must be restarted.
For some types of [PET](04-modality-specific-files/09-positron-emission-tomography.md) acquisitions,
a subject may leave and re-enter the scanner without interrupting the scan.
1. **Modality** - the category of brain data recorded by a file.
For MRI data, different pulse sequences are considered distinct modalities,
such as `T1w`, `bold` or `dwi`.
For passive recording techniques, such as EEG, MEG or iEEG,
the technique is sufficiently uniform to define the modalities `eeg`,
`meg` and `ieeg`.
When applicable, the modality is indicated in the **suffix**.
The modality may overlap with, but should not be confused with
the **data type**.
1. **`<index>`** - a nonnegative integer, possibly prefixed with arbitrary number of
0s for consistent indentation, for example, it is `01` in `run-01` following
`run-<index>` specification.
1. **`<label>`** - an alphanumeric value, possibly prefixed with arbitrary
number of 0s for consistent indentation, for example, it is `rest` in `task-rest`
following `task-<label>` specification. Note that labels MUST not collide when
casing is ignored (see [Case collision intolerance](#case-collision-intolerance)).
1. **`suffix`** - an alphanumeric value, located after the `key-value_` pairs (thus after
the final `_`), right before the **File extension**, for example, it is `eeg` in
`sub-05_task-matchingpennies_eeg.vhdr`.
1. **File extension** - a portion of the filename after the left-most
period (`.`) preceded by any other alphanumeric. For example, `.gitignore` does
not have a file extension, but the file extension of `test.nii.gz` is `.nii.gz`.
Note that the left-most period is included in the file extension.
1. **DEPRECATED** - A "deprecated" entity or metadata field SHOULD NOT be used in the
generation of new datasets.
It remains in the standard in order to preserve the interpretability of existing datasets.
Validating software SHOULD warn when deprecated practices are detected and provide a
suggestion for updating the dataset to preserve the curator's intent.
## Compulsory, optional, and additional data and metadata
The following standard describes a way of arranging data and writing down
metadata for a subset of neuroimaging experiments. Some aspects of the standard
are compulsory. For example a particular filename format is required when
storing structural scans. Some aspects are regulated but optional. For example a
T2 volume does not need to be included, but when it is available it should be
saved under a particular filename specified in the standard. This standard
aspires to describe a majority of datasets, but acknowledges that there will be
cases that do not fit. In such cases one can include additional files and
subfolders to the existing folder structure following common sense. For example
one may want to include eye tracking data in a vendor specific format that is
not covered by this standard. The most sensible place to put it is next to the
continuous recording file with the same naming scheme but different extensions.
The solutions will change from case to case and publicly available datasets will
be reviewed to include common data types in the future releases of the BIDS
specification.
## File name structure
A filename consists of a chain of *entities*, or key-value pairs, a *suffix* and an
*extension*.
Two prominent examples of entities are `subject` and `session`.
For a data file that was collected in a given `session` from a given
`subject`, the filename MUST begin with the string `sub-<label>_ses-<label>`.
If the `session` level is omitted in the folder structure, the filename MUST begin
with the string `sub-<label>`, without `ses-<label>`.
Note that `sub-<label>` corresponds to the `subject` entity because it has
the `sub-` "key" and`<label>` "value", where `<label>` would in a real data file
correspond to a unique identifier of that subject, such as `01`.
The same holds for the `session` entity with its `ses-` key and its `<label>`
value.
The extra session layer (at least one `/ses-<label>` subfolder) SHOULD
be added for all subjects if at least one subject in the dataset has more than
one session.
If a `/ses-<label>` subfolder is included as part of the directory hierarchy,
then the same [`ses-<label>`](./99-appendices/09-entities.md#ses)
key/value pair MUST also be included as part of the filenames themselves.
Acquisition time of session can
be defined in the [sessions file](03-modality-agnostic-files.md#sessions-file).
A chain of entities, followed by a suffix, connected by underscores (`_`)
produces a human readable filename, such as `sub-01_task-rest_eeg.edf`.
It is evident from the filename alone that the file contains resting state
data from subject `01`.
The suffix `eeg` and the extension `.edf` depend on the imaging modality and
the data format and indicate further details of the file's contents.
Entities within a filename MUST be unique.
For example, the following filename is not valid because it uses the `acq`
entity twice:
`sub-01_acq-laser_acq-uneven_electrodes.tsv`
In cases where entities duplicate metadata,
the presence of an entity should not be used as a replacement for
the corresponding metadata field.
For instance, in echo-planar imaging MRI,
the [`dir-<label>`](./99-appendices/09-entities.md#dir) entity MAY be used
to distinguish files with different phase-encoding directions,
but the file's `PhaseEncodingDirection` can only be specified as metadata.
A summary of all entities in BIDS and the order in which they MUST be
specified is available in the [entity table](./99-appendices/04-entity-table.md)
in the appendix.
### Entity-linked file collections
An entity-linked file collection is a set of files that are related to each other
based on a repetitive acquisition of sequential data
by changing acquisition parameters one (or multiple) at a time
or by being inherent components of the same data.
Entity-linked collections are identified by a common suffix,
indicating the group of files that should be considered a logical unit.
Within each collection, files MUST be distinguished from each other by at least one
entity (for example, `echo`) that corresponds to an altered acquisition parameter
(`EchoTime`) or that defines a component relationship (for example, `part`).
Note that these entities MUST be described by the specification and
the parameter changes they declare MUST NOT invalidate the definition of the accompanying suffix.
For example, the use of the `echo` entity along with the `T1w` suffix casts doubt on
the validity of the identified contrast weighting.
Provided the conditions above are satisfied,
any suffix (such as `bold`) can identify an entity-linked file collection,
although certain suffixes are exclusive for this purpose (for example, `MP2RAGE`).
Use cases concerning this convention are compiled in the
[file collections](./99-appendices/10-file-collections.md) appendix.
This convention is mainly intended for but not limited to MRI modalities.
### Case collision intolerance
File name components are case sensitive,
but collisions MUST be avoided when casing is ignored.
For example, a dataset cannot contain both `sub-s1` and `sub-S1`,
as the labels would collide on a case-insensitive filesystem.
Additionally, because the suffix `eeg` is defined,
then the suffix `EEG` will not be added to future versions of the standard.
## Source vs. raw vs. derived data
BIDS was originally designed to describe and apply consistent naming conventions
to raw (unprocessed or minimally processed due to file format conversion) data.
During analysis such data will be transformed and partial as well as final results
will be saved.
Derivatives of the raw data (other than products of DICOM to NIfTI conversion)
MUST be kept separate from the raw data. This way one can protect the raw data
from accidental changes by file permissions. In addition it is easy to
distinguish partial results from the raw data and share the latter.
See [Storage of derived datasets](#storage-of-derived-datasets) for more on
organizing derivatives.
Similar rules apply to source data, which is defined as data before
harmonization, reconstruction, and/or file format conversion (for example, E-Prime event logs or DICOM files).
Storing actual source files with the data is preferred over links to
external source repositories to maximize long term preservation,
which would suffer if an external repository would not be available anymore.
This specification currently does not go into the details of
recommending a particular naming scheme for including different types of
source data (such as the raw event logs or parameter files, before conversion to BIDS).
However, in the case that these data are to be included:
1. These data MUST be kept in separate `sourcedata` folder with a similar
folder structure as presented below for the BIDS-managed data. For example:
`sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz` or
`sourcedata/sub-01/ses-pre/func/MyEvent.sce`.
1. A README file SHOULD be found at the root of the `sourcedata` folder or the
`derivatives` folder, or both.
This file should describe the nature of the raw data or the derived data.
We RECOMMEND including the PDF print-out with the actual sequence
parameters generated by the scanner in the `sourcedata` folder.
Alternatively one can organize their data in the following way
{{ MACROS___make_filetree_example(
{
"my_dataset-1": {
"sourcedata": "",
"...": "",
"sourcedata": {
"sub-01": {},
"sub-02": {},
"...": "",
},
"derivatives": {
"pipeline_1": {},
"pipeline_2": {},
"...": "",
},
}
}
) }}
In this example, where `sourcedata` and `derivatives` are not nested inside
`rawdata`, **only the `rawdata` subfolder** needs to be a BIDS-compliant
dataset.
The subfolders of `derivatives` MAY be BIDS-compliant derivatives datasets
(see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion).
This specification does not prescribe anything about the contents of `sourcedata`
folders in the above example - nor does it prescribe the `sourcedata`,
`derivatives`, or `rawdata` folder names.
The above example is just a convention that can be useful for organizing raw,
source, and derived data while maintaining BIDS compliance of the raw data
folder. When using this convention it is RECOMMENDED to set the `SourceDatasets`
field in `dataset_description.json` of each subfolder of `derivatives` to:
```JSON
{
"SourceDatasets": [ {"URL": "file://../../rawdata/"} ]
}
```
### Storage of derived datasets
Derivatives can be stored/distributed in two ways:
1. Under a `derivatives/` subfolder in the root of the source BIDS dataset
folder to make a clear distinction between raw data and results of data
processing.
A data processing pipeline will typically have a dedicated directory
under which it stores all of its outputs.
Different components of a pipeline can, however, also be stored under different
subfolders.
There are few restrictions on the directory names;
it is RECOMMENDED to use the format `<pipeline>-<variant>` in cases where
it is anticipated that the same pipeline will output more than one variant
(for example, `AFNI-blurring` and `AFNI-noblurring`).
For the sake of consistency, the subfolder name SHOULD be
the `GeneratedBy.Name` field in `data_description.json`,
optionally followed by a hyphen and a suffix (see
[Derived dataset and pipeline description][derived-dataset-description]).
Example of derivatives with one directory per pipeline:
```Plain
<dataset>/derivatives/fmriprep-v1.4.1/sub-0001
<dataset>/derivatives/spm/sub-0001
<dataset>/derivatives/vbm/sub-0001
```
Example of a pipeline with split derivative directories:
```Plain
<dataset>/derivatives/spm-preproc/sub-0001
<dataset>/derivatives/spm-stats/sub-0001
```
Example of a pipeline with nested derivative directories:
```Plain
<dataset>/derivatives/spm-preproc/sub-0001
<dataset>/derivatives/spm-preproc/derivatives/spm-stats/sub-0001
```
1. As a standalone dataset independent of the source (raw or derived) BIDS
dataset.
This way of specifying derivatives is particularly useful when the source
dataset is provided with read-only access, for publishing derivatives as
independent bodies of work, or for describing derivatives that were created
from more than one source dataset.
The `sourcedata/` subdirectory MAY be used to include the source dataset(s)
that were used to generate the derivatives.
Likewise, any code used to generate the derivatives from the source data
MAY be included in the `code/` subdirectory.
Example of a derivative dataset including the raw dataset as source:
{{ MACROS___make_filetree_example(
{
"my_processed_data": {
"code": {
"processing_pipeline-1.0.0.img": "",
"hpc_submitter.sh": "",
"...": "",
},
"sourcedata": {
"sub-01": {},
"sub-02": {},
"...": "",
},
"sub-01": {},
"sub-02": {},
"...": "",
}
}
) }}
Throughout this specification, if a section applies particularly to derivatives,
then Case 1 will be assumed for clarity in templates and examples, but removing
`/derivatives/<pipeline>` from the template name will provide the equivalent for
Case 2.
In both cases, every derivatives dataset is considered a BIDS dataset and must
include a `dataset_description.json` file at the root level (see
[Dataset description][dataset-description]).
Consequently, files should be organized to comply with BIDS to the full extent
possible (that is, unless explicitly contradicted for derivatives).
Any subject-specific derivatives should be housed within each subject’s directory;
if session-specific derivatives are generated, they should be deposited under a
session subdirectory within the corresponding subject directory; and so on.
### Non-compliant derivatives
Nothing in this specification should be interpreted to disallow the
storage/distribution of non-compliant derivatives of BIDS datasets.
In particular, if a BIDS dataset contains a `derivatives/` sub-directory,
the contents of that directory may be a heterogeneous mix of BIDS Derivatives
datasets and non-compliant derivatives.
## File Formation specification
### Imaging files
All imaging data MUST be stored using the NIfTI file format. We RECOMMEND using
compressed NIfTI files (.nii.gz), either version 1.0 or 2.0. Imaging data SHOULD
be converted to the NIfTI format using a tool that provides as much of the NIfTI
header information (such as orientation and slice timing information) as
possible. Since the NIfTI standard offers limited support for the various image
acquisition parameters available in DICOM files, we RECOMMEND that users provide
additional meta information extracted from DICOM files in a sidecar JSON file
(with the same filename as the `.nii[.gz]` file, but with a `.json` extension).
Extraction of BIDS compatible metadata can be performed using [dcm2niix](https://github.com/rordenlab/dcm2niix)
and [dicm2nii](https://www.mathworks.com/matlabcentral/fileexchange/42997-xiangruili-dicm2nii)
DICOM to NIfTI converters. The [BIDS-validator](https://github.com/bids-standard/bids-validator)
will check for conflicts between the JSON file and the data recorded in the
NIfTI header.
### Tabular files
Tabular data MUST be saved as tab delimited values (`.tsv`) files, that is, CSV
files where commas are replaced by tabs. Tabs MUST be true tab characters and
MUST NOT be a series of space characters. Each TSV file MUST start with a header
line listing the names of all columns (with the exception of
[physiological and other continuous recordings](04-modality-specific-files/06-physiological-and-other-continuous-recordings.md)).
Names MUST be separated with tabs.
It is RECOMMENDED that the column names in the header of the TSV file are
written in [`snake_case`](https://en.wikipedia.org/wiki/Snake_case) with the
first letter in lower case (for example, `variable_name`, not `Variable_name`).
String values containing tabs MUST be escaped using double
quotes. Missing and non-applicable values MUST be coded as `n/a`. Numerical
values MUST employ the dot (`.`) as decimal separator and MAY be specified
in scientific notation, using `e` or `E` to separate the significand from the
exponent. TSV files MUST be in UTF-8 encoding.
Example:
```Text
onset duration response_time correct stop_trial go_trial
200 200 0 n/a n/a n/a
```
**Note**: The TSV examples in this document (like the one above this note)
are occasionally formatted using space characters instead of tabs to improve
human readability.
Directly copying and then pasting these examples from the specification
for use in new BIDS datasets can lead to errors and is discouraged.
Tabular files MAY be optionally accompanied by a simple data dictionary
in the form of a JSON [object](https://www.json.org/json-en.html)
within a JSON file.
The JSON files containing the data dictionaries MUST have the same name as
their corresponding tabular files but with `.json` extensions.
If a data dictionary is provided,
it MAY contain one or more fields describing the columns found in the TSV file
(in addition to any other metadata one wishes to include that describe the file as a whole).
Note that if a field name included in the data dictionary matches a column name in the TSV file,
then that field MUST contain a description of the corresponding column,
using an object containing the following fields:
{{ MACROS___make_metadata_table(
{
"LongName": "OPTIONAL",
"Description": (
"RECOMMENDED",
"The description of the column.",
),
"Levels": "RECOMMENDED",
"Units": "RECOMMENDED",
"TermURL": "RECOMMENDED",
"HED": "OPTIONAL",
}
) }}
Please note that while both `Units` and `Levels` are RECOMMENDED, typically only one
of these two fields would be specified for describing a single TSV file column.
Example:
```JSON
{
"test": {
"LongName": "Education level",
"Description": "Education level, self-rated by participant",
"Levels": {
"1": "Finished primary school",
"2": "Finished secondary school",
"3": "Student at university",
"4": "Has degree from university"
}
},
"bmi": {
"LongName": "Body mass index",
"Units": "kg/m^2",
"TermURL": "https://purl.bioontology.org/ontology/SNOMEDCT/60621009"
}
}
```
### Key/value files (dictionaries)
JavaScript Object Notation (JSON) files MUST be used for storing key/value
pairs. JSON files MUST be in UTF-8 encoding. Extensive documentation of the
format can be found at [https://www.json.org/](https://www.json.org/),
and at [https://tools.ietf.org/html/std90](https://tools.ietf.org/html/std90).
Several editors have built-in support for JSON syntax highlighting that aids
manual creation of such files.
An online editor for JSON with built-in validation is available at
[https://jsoneditoronline.org](https://jsoneditoronline.org).
It is RECOMMENDED that keys in a JSON file are written in [CamelCase](https://en.wikipedia.org/wiki/Camel_case)
with the first letter in upper case (for example, `SamplingFrequency`, not
`samplingFrequency`). Note however, when a JSON file is used as an accompanying
sidecar file for a [TSV file](#tabular-files), the keys linking a TSV column
with their description in the JSON file need to follow the exact formatting
as in the TSV file.
Example of a hypothetical `*_bold.json` file, accompanying a `*_bold.nii` file:
```JSON
{
"RepetitionTime": 3,
"Instruction": "Lie still and keep your eyes open"
}
```
Example of a hypothetical `*_events.json` file, accompanying an
`*_events.tsv` file. Note that the JSON file contains a key describing an
*arbitrary* column `stim_presentation_side` in the TSV file it accompanies.
See [task events section](04-modality-specific-files/05-task-events.md)
for more information.
```JSON
{
"stim_presentation_side": {
"Levels": {
"1": "stimulus presented on LEFT side",
"2": "stimulus presented on RIGHT side"
}
}
}
```
## The Inheritance Principle
1. Any metadata file (such as `.json`, `.bvec` or `.tsv`) MAY be defined at any directory level.
1. For a given data file, any metadata file is applicable to that data file if:
1. It is stored at the same directory level or higher;
1. The metadata and the data filenames possess the same suffix;
1. The metadata filename does not include any entity absent from the data filename.
1. A metadata file MUST NOT have a filename that would be otherwise applicable
to some data file based on rules 2.b and 2.c but is made inapplicable based on its
location in the directory structure as per rule 2.a.
1. There MUST NOT be multiple metadata files applicable to a data file at one level
of the directory hierarchy.
1. If multiple metadata files satisfy criteria 2.a-c above:
1. For [tabular files](#tabular-files) and other simple metadata files
(for instance, [`bvec` / `bval` files for diffusion MRI](#04-modality-specific-files/01-magnetic-resonance-imaging#required-gradient-orientation-information)),
accessing metadata associated with a data file MUST consider only the
applicable file that is lowest in the filesystem hierarchy.
1. For [JSON files](#key-value-files-dictionaries), key-values are loaded
from files from the top of the directory hierarchy downwards, such that
key-values from the top level are inherited by all data files at lower
levels to which it is applicable unless overridden by a value for the
same key present in another metadata file at a lower level
(though it is RECOMMENDED to minimize the extent of such overrides).
Corollaries:
1. As per rule 3, metadata files applicable only to a specific participant / session
MUST be defined in or below the directory corresponding to that participant / session;
similarly, a metadata file that is applicable to multiple participants / sessions
MUST NOT be placed within a directory corresponding to only one such participant / session.
1. It is permissible for a single metadata file to be applicable to multiple data
files at that level of the hierarchy or below. Where such metadata content is consistent
across multiple data files, it is RECOMMENDED to store metadata in this
way, rather than duplicating that metadata content across multiple metadata files.
1. Where multiple applicable JSON files are loaded as per rule 5.b, key-values can
only be overwritten by files lower in the filesystem hierarchy; the absence of
a key-value in a later file does not imply the "unsetting" of that field
(indeed removal of existing fields is not possible).
Example 1: Demonstration of inheritance principle
{{ MACROS___make_filetree_example(
{
"sub-01": {
"func": {
"sub-01_task-rest_acq-default_bold.nii.gz": "",
"sub-01_task-rest_acq-longtr_bold.nii.gz": "",
"sub-01_task-rest_acq-longtr_bold.json": "",
}
},
"task-rest_bold.json": "",
}
) }}
Contents of file `task-rest_bold.json`:
```JSON
{
"EchoTime": 0.040,
"RepetitionTime": 1.0
}
```
Contents of file `sub-01/func/sub-01_task-rest_acq-longtr_bold.json`:
```JSON
{
"RepetitionTime": 3.0
}
```
When reading image `sub-01/func/sub-01_task-rest_acq-default_bold.nii.gz`, only
metadata file `task-rest_bold.json` is read; file
`sub-01/func/sub-01_task-rest_acq-longtr_bold.json` is inapplicable as it contains
entity "`acq-longtr`" that is absent from the image path (rule 2.c). When reading image
`sub-01/func/sub-01_task-rest_acq-longtr_bold.nii.gz`, metadata file
`task-rest_bold.json` at the top level is read first, followed by file
`sub-01/func/sub-01_task-rest_acq-longtr_bold.json` at the bottom level (rule 5.b);
the value for field "`RepetitionTime`" is therefore overridden to the value `3.0`.
The value for field "`EchoTime`" remains applicable to that image, and is not unset by its
absence in the metadata file at the lower level (rule 5.b; corollary 3).
Example 2: Impermissible use of multiple metadata files at one directory level (rule 4)
{{ MACROS___make_filetree_example(
{
"sub-01": {
"ses-test":{
"anat": {
"sub-01_ses-test_T1w.nii.gz": "",
},
"func": {
"sub-01_ses-test_task-overtverbgeneration_run-1_bold.nii.gz": "",
"sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz": "",
"sub-01_ses-test_task-overtverbgeneration_bold.json": "",
"sub-01_ses-test_task-overtverbgeneration_run-2_bold.json": "",
}
}
}
}
) }}
Example 3: Modification of filesystem structure from Example 2 to satisfy inheritance
principle requirements
{{ MACROS___make_filetree_example(
{
"sub-01": {
"ses-test":{
"sub-01_ses-test_task-overtverbgeneration_bold.json": "",
"anat": {
"sub-01_ses-test_T1w.nii.gz": "",
},
"func": {
"sub-01_ses-test_task-overtverbgeneration_run-1_bold.nii.gz": "",
"sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz": "",
"sub-01_ses-test_task-overtverbgeneration_run-2_bold.json": "",
}
}
}
}
) }}
Example 4: Single metadata file applying to multiple data files (corollary 2)
{{ MACROS___make_filetree_example(
{
"sub-01": {
"anat": {},
"func": {
"sub-01_task-xyz_acq-test1_run-1_bold.nii.gz": "",
"sub-01_task-xyz_acq-test1_run-2_bold.nii.gz": "",
"sub-01_task-xyz_acq-test1_bold.json": "",
}
}
}
) }}
## Participant names and other labels
BIDS allows for custom user-defined `<label>`s and `<index>`es for example,
for naming of participants, sessions, acquisition schemes.
Note that they MUST consist only of allowed characters as described in
[Definitions](02-common-principles.md#definitions) above.
In `<index>`es we RECOMMEND using zero padding (for example, `01` instead of `1`
if you have more than nine subjects) to make alphabetical sorting more intuitive.
Note that zero padding SHOULD NOT be used to merely maintain uniqueness
of `<index>`es.
Please note that a given label or index is distinct from the "prefix"
it refers to. For example `sub-01` refers to the `sub` entity (a
subject) with the label `01`. The `sub-` prefix is not part of the subject
label, but must be included in filenames (similarly to other key names).
## Specification of paths
Several metadata fields in BIDS require the specification of paths,
that is, a string of characters used to uniquely identify a location in a directory structure.
For example the `IntendedFor` or `AssociatedEmptyroom` metadata fields.
Throughout BIDS all such paths MUST be specified using the slash character (`/`),
regardless of the operating system that a particular dataset is curated on or used on.
Paths SHOULD NOT be absolute local paths,
because these might break when a dataset is used on a different machine.
It is RECOMMENDED that all paths specified in a BIDS dataset are relative paths,
as specified in the respective descriptions of metadata fields that require the use of paths.
## Uniform Resource Indicator
A Uniform Resource Indicator (URI) is a string referring to a resource and SHOULD
have the form `<scheme>:[//<authority>]<path>[?<query>][#<fragment>]`, as specified
in [RFC 3986](https://tools.ietf.org/html/rfc3986).
This applies to URLs and other common URIs, including Digital Object Identifiers (DOIs),
which may be fully specified as `doi:<path>`,
for example, [doi:10.5281/zenodo.3686061](https://doi.org/10.5281/zenodo.3686061).
A given resource may have multiple URIs.
When selecting URIs to add to dataset metadata, it is important to consider
specificity and persistence.
Several fields are designated for DOIs, for example, `DatasetDOI` in `dataset_description.json`.
DOI values SHOULD be fully specified URIs such as `doi:10.18112/openneuro.ds000001.v1.0.0`.
Bare DOIs such as `10.18112/openneuro.ds000001.v1.0.0` are [DEPRECATED][deprecated].
## Units
All units SHOULD be specified as per [International System of Units](https://en.wikipedia.org/wiki/International_System_of_Units)
(abbreviated as SI, from the French Système international (d'unités)) and can
be SI units or [SI derived units](https://en.wikipedia.org/wiki/SI_derived_unit).
In case there are valid reasons to deviate from SI units or SI derived units,
the units MUST be specified in the sidecar JSON file.
In case data is expressed in SI units or SI derived units, the units MAY be
specified in the sidecar JSON file.
In case non-standard prefixes are added to SI or non-SI units, these
non-standard prefixed units MUST be specified in the JSON file.
See [Appendix V](99-appendices/05-units.md) for a list of standard units and
prefixes.
Note also that for the *formatting* of SI units, the [CMIXF-12](https://people.csail.mit.edu/jaffer/MIXF/CMIXF-12)
convention for encoding units is RECOMMENDED.
CMIXF provides a consistent system for all units and prefix symbols with only basic
characters, avoiding symbols that can cause text encoding problems; for example the
CMIXF formatting for "micro volts" is `uV`, "degrees Celsius" is `oC` and "Ohm" is `Ohm`.
See [Appendix V](99-appendices/05-units.md) for more information.
For additional rules, see below:
- Elapsed time SHOULD be expressed in seconds. Please note that some DICOM
parameters have been traditionally expressed in milliseconds. Those need to
be converted to seconds.
- Frequency SHOULD be expressed in Hertz.
- Arbitrary units SHOULD be indicated with the string `"arbitrary"`.
Describing dates and timestamps:
- Date time information MUST be expressed in the following format
`YYYY-MM-DDThh:mm:ss[.000000][Z]` (year, month, day, hour (24h), minute,
second, optional fractional seconds, and optional UTC time indicator).
This is almost equivalent to the [RFC3339](https://tools.ietf.org/html/rfc3339)
"date-time" format, with the exception that UTC indicator `Z` is optional and
non-zero UTC offsets are not indicated.
If `Z` is not indicated, time zone is always assumed to be the local time of the
dataset viewer.
No specific precision is required for fractional seconds, but the precision
SHOULD be consistent across the dataset.
For example `2009-06-15T13:45:30`.
- Time stamp information MUST be expressed in the following format:
`hh:mm:ss[.000000]`
For example `13:45:30`.
- Note that, depending on local ethics board policy, date time information may not
need to be fully detailed.
For example, it is permissible to set the time to `00:00:00` if reporting the
exact recording time is undesirable.
However, for privacy protection reasons, it is RECOMMENDED to shift dates, as
described below, without completely removing time information, as time information
can be useful for research purposes.
- Dates can be shifted by a random number of days for privacy protection
reasons.
To distinguish real dates from shifted dates,
is is RECOMMENDED to set shifted dates to the year 1925 or earlier.
Note that some data formats do not support arbitrary recording dates.
For example, the [EDF](https://www.edfplus.info/)
data format can only contain recording dates after 1985.
For longitudinal studies dates MUST be shifted by the same number of days
within each subject to maintain the interval information.
For example: `1867-06-15T13:45:30`
- WARNING: The Neuromag/Elekta/MEGIN file format for MEG (`.fif`) does *not*
support recording dates earlier than `1902` roughly.
Some analysis software packages (for example, MNE-Python) handle their data as `.fif`
internally and will break if recording dates are specified prior to `1902`,
even if the original data format is not `.fif`.
See [MEG-file-formats](./99-appendices/06-meg-file-formats.md#recording-dates-in-fif-files)
for more information.
- Age SHOULD be given as the number of years since birth at the time of
scanning (or first scan in case of multi session datasets). Using higher
accuracy (weeks) should in general be avoided due to privacy protection,
unless when appropriate given the study goals, for example, when scanning babies.
## Directory structure
### Single session example
This is an example of the folder and file structure. Because there is only one
session, the session level is not required by the format. For details on
individual files see descriptions in the next section:
{{ MACROS___make_filetree_example(
{
"sub-control01": {
"anat":{
"sub-control01_T1w.nii.gz": "",
"sub-control01_T1w.json": "",
"sub-control01_T2w.nii.gz": "",
"sub-control01_T2w.json": "",
},
"func":{
"sub-control01_task-nback_bold.nii.gz": "",
"sub-control01_task-nback_bold.json": "",
"sub-control01_task-nback_events.tsv": "",
"sub-control01_task-nback_physio.tsv.gz": "",
"sub-control01_task-nback_physio.json": "",
"sub-control01_task-nback_sbref.nii.gz": "",
},
"dwi":{
"sub-control01_dwi.nii.gz": "",
"sub-control01_dwi.bval": "",
"sub-control01_dwi.bvec": "",
},
"fmap":{
"sub-control01_phasediff.nii.gz": "",
"sub-control01_phasediff.json": "",
"sub-control01_magnitude1.nii.gz": "",
}
},
"code": {
"deface.py": ""
},
"derivatives": {},
"README": "",
"participants.tsv": "",
"dataset_description.json": "",
"CHANGES": "",
}
) }}
## Unspecified data
Additional files and folders containing raw data MAY be added as needed for
special cases.
All non-standard file entities SHOULD conform to BIDS-style naming conventions, including
alphabetic entities and suffixes and alphanumeric labels/indices.
Non-standard suffixes SHOULD reflect the nature of the data, and existing
entities SHOULD be used when appropriate.
For example, an ASSET calibration scan might be named
`sub-01_acq-ASSET_calibration.nii.gz`.
Non-standard files and directories should be named with care.
Future BIDS efforts may standardize new entities and suffixes, changing the
meaning of filenames and setting requirements on their contents or metadata.
Validation and parsing tools MAY treat the presence of non-standard files and
directories as an error, so consult the details of these tools for mechanisms
to suppress warnings or provide interpretations of your filenames.
<!-- Link Definitions -->
[dataset-description]: 03-modality-agnostic-files.md#dataset-description
[derived-dataset-description]: 03-modality-agnostic-files.md#derived-dataset-and-pipeline-description
[deprecated]: ./02-common-principles.md#definitions