[ENH] Add Glossary of terms/abbreviations used in the specification #152

Merged (10 commits) on May 23, 2020
92 changes: 54 additions & 38 deletions src/02-common-principles.md
@@ -6,56 +6,72 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [[RFC2119](https://www.ietf.org/rfc/rfc2119.txt)].

Throughout this specification we use a list of terms. To avoid
Throughout this specification we use a list of terms and abbreviations. To avoid
misunderstanding we clarify them here.

1. Dataset - a set of neuroimaging and behavioral data acquired for a purpose
of a particular study. A dataset consists of data acquired from one or more
subjects, possibly from multiple sessions.
1. **Dataset** - a set of neuroimaging and behavioral data acquired for a
purpose of a particular study. A dataset consists of data acquired from one
or more subjects, possibly from multiple sessions.

1. Subject - a person or animal participating in the study.
1. **Subject** - a person or animal participating in the study. Used
interchangeably with term **Participant**.

1. Session - a logical grouping of neuroimaging and behavioral data consistent
across subjects. Session can (but doesn't have to) be synonymous to a visit
in a longitudinal study. In general, subjects will stay in the scanner
during one session. However, for example, if a subject has to leave the
scanner room and then be re-positioned on the scanner bed, the set of MRI
acquisitions will still be considered as a session and match sessions
1. **Session** - a logical grouping of neuroimaging and behavioral data
consistent across subjects. Session can (but doesn't have to) be synonymous
to a visit in a longitudinal study. In general, subjects will stay in the
scanner during one session. However, for example, if a subject has to leave
the scanner room and then be re-positioned on the scanner bed, the set of
MRI acquisitions will still be considered as a session and match sessions
acquired in other subjects. Similarly, in situations where different data
types are obtained over several visits (for example fMRI on one day followed
by DWI the day after) those can be grouped in one session. Defining multiple
sessions is appropriate when several identical or similar data acquisitions
are planned and performed on all -or most- subjects, often in the case of
some intervention between sessions (e.g., training).

1. Data acquisition - a continuous uninterrupted block of time during which a
brain scanning instrument was acquiring data according to particular
1. **Data acquisition** - a continuous uninterrupted block of time during which
a brain scanning instrument was acquiring data according to particular
scanning sequence/protocol.

1. Data type - a functional group of different types of data. In BIDS we define
eight data types: func (task based and resting state functional MRI), dwi
(diffusion weighted imaging), fmap (field inhomogeneity mapping data such as
field maps), anat (structural imaging such as T1, T2, etc.), meg
(magnetoencephalography), eeg (electroencephalography), ieeg (intracranial
electroencephalography), beh (behavioral).
1. **Data type** - a functional group of different types of data. In BIDS we
define eight data types: `func` (task based and resting state functional MRI),
`dwi` (diffusion weighted imaging), `fmap` (field inhomogeneity mapping data
such as field maps), `anat` (structural imaging such as T1, T2, etc.), `meg`
(magnetoencephalography), `eeg` (electroencephalography), `ieeg` (intracranial
electroencephalography), `beh` (behavioral).

1. Task - a set of structured activities performed by the participant. Tasks
are usually accompanied by stimuli and responses, and can greatly vary in
complexity. For the purpose of this specification we consider the so-called
1. **Task** - a set of structured activities performed by the participant.
Tasks are usually accompanied by stimuli and responses, and can greatly vary
in complexity. For the purpose of this specification we consider the so-called
"resting state" a task. In the context of brain scanning, a task is always
tied to one data acquisition. Therefore, even if during one acquisition the
subject performed multiple conceptually different behaviors (with different
sets of instructions) they will be considered one (combined) task.

1. Event - a stimulus or subject response recorded during a task. Each event
has an onset time and duration. Note that not all tasks will have recorded
events (e.g., resting state).
1. **Event** - a stimulus or subject response recorded during a task. Each
event has an onset time and duration. Note that not all tasks will have
recorded events (e.g., resting state).

1. Run - an uninterrupted repetition of data acquisition that has the same
1. **Run** - an uninterrupted repetition of data acquisition that has the same
acquisition parameters and task (however events can change from run to run
due to different subject response or randomized nature of the stimuli). Run
is a synonym of a data acquisition.

1. **`<index>`** - a numeric value, possibly prefixed with arbitrary number of
0s for consistent indentation, e.g., it is `01` in `run-01` following
`run-<index>` specification.

1. **`<label>`** - an alphanumeric value, possibly prefixed with arbitrary
number of 0s for consistent indentation, e.g., it is `rest` in `task-rest`
following `task-<label>` specification.

1. **`suffix`** - a portion of the file name with `key-value_` pairs (thus after
the final `_`), right before the **File extension**.

1. **File extension** - a portion of the file name after the left-most
period (`.`) preceded by any other alphanumeric (so `.gitignore` does not have a
suffix)
Member

do you mean the right-most period? Also, the .gitignore example is not clear to me: Why is the suffix of relevance when describing what a file extension is?

Collaborator

@effigies, May 10, 2019

Left-most is correct. Following the right-most rule for X.nii.gz would get you .gz, not .nii.gz.

By this definition, .func.gii, .surf.gii and .dtseries.nii are extensions, which I think is appropriate, but we should make sure we're on the same page here, as these are part of derivatives.

Collaborator Author

re .gitignore -- just an example where there is a left-most period but the gitignore is not an extension because it is not preceded by anything. Maybe I should replace it with .bidsignore, which is also not a part of the spec but more relevant here?

Collaborator Author

@effigies you bring up the question of whether the period itself is part of the file extension, i.e., is it .nii.gz or nii.gz?!

  • in my wording above it is without ("... after ...") so might need to be fixed.
  • Python seems to also retain period in splitext:
$> python -c 'from os.path import splitext; print(splitext("bla.nii.gz"))'
('bla.nii', '.gz')
  • text in bids-specification ATM seems to list them with period (IMHO makes it easier to parse when period is there)
  • pybids seems to ask for them without period (might be my influence; attn: @tyarkoni):
(git)hopa:~/proj/bids/pybids[sqlalchemy]git
$> git grep extension.*nii
bids/layout/README.md:>>> files = layout.get(subject='0[12]', run=1, extension='.nii.gz')
bids/layout/README.md:In the above snippet, we retrieve all files with subject id 1 or 2 and run id 1 (notice that any entity defined in the config file can be used a filtering argument), and with a file extension of .nii.gz. The returned result is a list of named tuples, one per file, allowing direct access to the defined entities as attributes.
bids/layout/config/bids.json:        "sub-{subject}[/ses-{session}]/dwi/sub-{subject}[_ses-{session}][_acq-{acquisition}]_{suffix<dwi>}{extension<bval|bvec|json|nii\\.gz|nii>|nii\\.gz}",
bids/layout/index.py:                    extension=['nii', 'nii.gz'], suffix='bold',
bids/layout/index.py:                    extension=['nii', 'nii.gz'], suffix='dwi',
bids/layout/layout.py:                         extension=['nii.gz', 'nii'])
bids/layout/layout.py:        images = self.get(extension=['nii', 'nii.gz'], scope=scope,
bids/layout/tests/test_layout.py:              'extension': 'nii.gz'}
bids/layout/tests/test_layout.py:              'desc': 'bleargh', 'extension': 'nii.gz'}
bids/layout/tests/test_layout.py:              'extension': 'nii.gz'}
bids/layout/tests/test_layout.py:              'desc': 'bleargh', 'extension': 'nii.gz'}
bids/layout/tests/test_layout.py:                            acquisition='fullbrain', extension='nii.gz')[0]
bids/reports/parsing.py:            iff_file = [f for f in layout.get(extension='nii.gz') if fn in f.path][0]
bids/reports/parsing.py:                    echos = layout.get_echoes(subject=subj, extension='nii.gz',
bids/reports/parsing.py:                                                     extension='nii.gz',
bids/reports/report.py:            niftis = self.layout.get(subject=subject, extension='nii.gz',
bids/reports/tests/test_parsing.py:    niftis = testlayout.get(subject=subj, extension='nii.gz')
bids/variables/io.py:    images = layout.get(return_type='object', extension='nii.gz',
doc/layout/index.rst:    >>> f = layout.get(task='nback', run=1, extension='nii.gz')[0].filename
doc/layout/index.rst:    >>> f = layout.get(task='nback', run=1, extension='nii.gz')[0].filename
examples/pybids tutorial.ipynb:    "layout.get(subject='01', extension='nii.gz', suffix='bold', return_type='filename')"
examples/pybids tutorial.ipynb:       "{'subject': '01', 'run': 1, 'suffix': 'T2w', 'extension': 'nii.gz'}"
1 11544.....................................:Fri 10 May 2019 08:53:36 AM EDT:.
(git)hopa:~/proj/bids/pybids[sqlalchemy]git
$> git describe
0.7.1-256-g52c6a7f

so what should it be? ;)

Collaborator

I think that the example should be "so sub-01_task-resting_bold.nii.gz has the suffix .nii.gz"

Collaborator Author

suffix should be bold. Extension nii.gz in current wording.

NB it has been a while for this issue/PR, I might be losing grip on what should be what now ;)

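To illustrate the left-most-period rule discussed in this thread, here is a minimal Python sketch. The helper names `split_extension` and `split_suffix` are hypothetical (they are not part of pybids or the specification), and keeping the leading period in the returned extension is an assumption, since the thread leaves that question open.

```python
import re
from os.path import splitext


def split_extension(filename):
    """Split at the left-most period that follows an alphanumeric character.

    Hypothetical helper; the leading period is kept in the extension,
    which is an assumption (the thread does not settle this).
    """
    match = re.search(r"(?<=[A-Za-z0-9])\.", filename)
    if match is None:
        # No qualifying period, e.g. a dotfile such as ".gitignore".
        return filename, ""
    return filename[:match.start()], filename[match.start():]


def split_suffix(stem):
    """Return (key-value part, suffix), splitting at the final underscore."""
    head, _, suffix = stem.rpartition("_")
    return head, suffix


stem, extension = split_extension("sub-01_task-rest_bold.nii.gz")
entities, suffix = split_suffix(stem)
print(entities, suffix, extension)  # sub-01_task-rest bold .nii.gz

# For comparison, os.path.splitext follows the right-most period:
print(splitext("bla.nii.gz"))  # ('bla.nii', '.gz')
```

Under this reading, `.gitignore` yields an empty extension because its only period is not preceded by an alphanumeric character.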

## Compulsory, optional, and additional data and metadata

The following standard describes a way of arranging data and writing down
@@ -378,18 +394,18 @@ for more information.

## Participant names and other labels

BIDS uses custom user-defined labels in several situations (naming of
participants, sessions, acquisition schemes, etc.) Labels are strings and MUST
only consist of letters (lower or upper case) and/or numbers. If numbers are
used we RECOMMEND zero padding (e.g., `01` instead of `1` if you have more than
nine subjects) to make alphabetical sorting more intuitive.

Please note that a given label is distinct from the "prefix" it refers to. For
example `sub-01` refers to the `sub` entity (a subject) with the label `01`.
The `sub-` prefix is not part of the subject label, but must be included in file
names (similarly to other key names). In contrast to other labels, `run` and
`echo` labels MUST be integers. Those labels MAY include zero padding, but this
is NOT RECOMMENDED to maintain their uniqueness.
BIDS allows for custom user-defined `<label>`s and `<index>`es e.g.,
for naming of participants, sessions, acquisition schemes, etc. Note
that they MUST consist only of allowed characters as described in
[Definitions](02-common-principles.md#definitions) above. In `<index>`es
we RECOMMEND using zero padding (e.g., `01` instead of `1` if you have more than
nine subjects) to make alphabetical sorting more intuitive. Note that
zero padding is NOT RECOMMENDED to maintain their uniqueness.

Please note that a given label or index is distinct from the "prefix"
it refers to. For example `sub-01` refers to the `sub` entity (a
subject) with the label `01`. The `sub-` prefix is not part of the subject
label, but must be included in file names (similarly to other key names).
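
As a rough illustration of the rules in this section, the following Python sketch separates a key from its value and checks the value against the `<label>` and `<index>` definitions. All function names are hypothetical, and the regular expressions assume that "alphanumeric" means ASCII letters and digits and that "numeric" means ASCII digits only.

```python
import re

# Assumed character sets: ASCII letters/digits for <label>, ASCII digits for <index>.
LABEL_RE = re.compile(r"[A-Za-z0-9]+")
INDEX_RE = re.compile(r"[0-9]+")


def split_entity(entity):
    """Split e.g. 'sub-01' into the key 'sub' and the value '01'."""
    key, sep, value = entity.partition("-")
    if not sep or not key or not value:
        raise ValueError(f"not a key-value pair: {entity!r}")
    return key, value


def is_label(value):
    return LABEL_RE.fullmatch(value) is not None


def is_index(value):
    return INDEX_RE.fullmatch(value) is not None


def zero_pad(index, width):
    """Format an integer index with leading zeros, e.g. 2 -> '02'."""
    return f"{index:0{width}d}"


key, value = split_entity("sub-01")
print(key, value, is_label(value), is_index(value))  # sub 01 True True

# Without padding, alphabetical order differs from numeric order:
print(sorted(["sub-10", "sub-2", "sub-1"]))  # ['sub-1', 'sub-10', 'sub-2']
print(sorted(f"sub-{zero_pad(i, 2)}" for i in (10, 2, 1)))  # ['sub-01', 'sub-02', 'sub-10']
```

The `sub-` prefix is handled separately from the value here, mirroring the distinction the text draws between a label and the key it belongs to.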

## Units
