BEP032 (animal-ephys): support for patchclamp data sets #1375

Open
SylvainTakerkart opened this issue Jan 5, 2023 · 5 comments

@SylvainTakerkart

Hi everyone,

I'm starting this issue as a place for discussion (and hopefully decision-making ;) ) about a particular point that was raised in our last BEP032-dedicated meeting, where we (BEP032 leads @JuliaSprenger and myself) described, for the first time, a proposal to support intracellular electrophysiological data. I'll make a quick statement about patchclamp data sets, describe the proposal, and then describe the issue that was raised by some in our group. As a primer, please be aware that the BEP suggests using the NWB or NIX data formats to store animal ephys data in BIDS, both being HDF5-based formats.

  1. a patchclamp data set, in its raw form, is composed of several recordings, usually stored in a proprietary format with one file per recording; you therefore get several data files for each patched cell;

  2. the proposal: keep the files organized as they are in their raw form, with one data file (NWB or NIX) per recording; the metadata describing the structure of the experiment (the logical links between the different recordings) is therefore solely in the BIDS sidecar files; this is what's described in the current version of the BEP (https://bids.neuroimaging.io/bep032), e.g., in the patch toy example provided in section 4.1;

  3. the potential issue that was raised: this proposal does not exploit the ability of NWB and NIX to store multiple recordings (or multiple blocks of data) within a single file; that would be the usual way to use NWB or NIX files to store such data sets outside of BIDS, with the metadata describing the structure of the experiment embedded inside the NWB/NIX files; some therefore suggested that it would be nicer to exploit this ability, so as to keep the usage of NWB/NIX files identical inside and outside of BIDS.
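To make point 2 concrete: with one data file per recording, the cell/recording structure exists only in the sidecars, so any tool that relates recordings has to rebuild it from them. Below is a minimal Python sketch of that, assuming hypothetical file names and two illustrative sidecar keys (`CellID`, `Protocol`) that are not BEP032 fields.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical sidecars, one per recording file (one-file-per-recording layout).
# File names and the keys "CellID" / "Protocol" are illustrative, not BEP032 fields.
SIDECARS: Dict[str, str] = {
    "sub-01_ses-01_run-01_ephys.json": '{"CellID": "cell01", "Protocol": "step"}',
    "sub-01_ses-01_run-02_ephys.json": '{"CellID": "cell01", "Protocol": "ramp"}',
    "sub-01_ses-01_run-03_ephys.json": '{"CellID": "cell02", "Protocol": "step"}',
}


def recordings_per_cell(sidecars: Dict[str, str]) -> Dict[str, List[str]]:
    """Rebuild the cell -> recordings grouping from sidecar metadata alone."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(sidecars):
        meta = json.loads(sidecars[name])
        groups[meta["CellID"]].append(name)
    return dict(groups)


print(recordings_per_cell(SIDECARS))
```

Under point 3, the same grouping would instead be read from inside a single NWB/NIX file, which is exactly the difference being discussed.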

I'm therefore pinging several people, some who were at our meeting and some who weren't, to complement what I just explained and give their opinions... @JuliaSprenger @robertoostenveld @Remi-Gau @lzehl @apdavison @bendichter @oruebel @twachtler Others are obviously welcome to react!

The goal is to make progress here before our next meeting... Thanks in advance for everyone's contributions!

@lzehl

lzehl commented Jan 9, 2023

I think both ways are valuable ways of storing data.

The traditional BIDS way of storing data (one file per session or run) is clear, I think (1 file + 1 sidecar per session, run, or recording).

When files contain more than one recording, I suggest stating in the sidecar file how to navigate to the data of a particular recording within the file. There are then again two ways this could work: keep one sidecar file per recording (meaning multiple sidecar files per data file), or also consolidate here and merge them into one sidecar per data file.

There is also the question of what the convention should be for stating the location within a given file format. @twachtler and @bendichter (or someone else), could you provide an example of how to reference a single recording within an NWB and a NIX file?
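The two sidecar options above carry the same information and can be converted into each other mechanically. A minimal sketch, assuming hypothetical key names (`DataFile`, `RecordingID`, `Location`) that are not BEP032 fields; the `Location` strings are placeholders for whatever in-file referencing convention gets agreed on.

```python
from typing import Any, Dict, List

# Illustrative key names ("DataFile", "RecordingID", "Location"); not BEP032 fields.
Sidecar = Dict[str, Any]


def merge_sidecars(per_recording: List[Sidecar]) -> Sidecar:
    """Consolidate per-recording sidecars that point into the same data file."""
    data_files = {s["DataFile"] for s in per_recording}
    if len(data_files) != 1:
        raise ValueError("all sidecars must reference the same data file")
    return {
        "DataFile": data_files.pop(),
        # Map each recording ID to its location inside the data file.
        "Recordings": {s["RecordingID"]: s["Location"] for s in per_recording},
    }


def split_sidecar(merged: Sidecar) -> List[Sidecar]:
    """Inverse: expand a merged sidecar back into one sidecar per recording."""
    return [
        {"DataFile": merged["DataFile"], "RecordingID": rid, "Location": loc}
        for rid, loc in merged["Recordings"].items()
    ]


# Hypothetical example: two recordings stored in one data file.
per_recording = [
    {"DataFile": "sub-01_ses-01_ephys.nwb", "RecordingID": "rec-a", "Location": "path/to/rec-a"},
    {"DataFile": "sub-01_ses-01_ephys.nwb", "RecordingID": "rec-b", "Location": "path/to/rec-b"},
]
merged = merge_sidecars(per_recording)
print(merged)
```

Because the round trip is lossless, the choice between the two options is mostly about convenience, not expressiveness.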

@twachtler

I agree and would suggest that this might be an issue to consider not just for patch clamp but for other kinds of recordings as well.

A straightforward way to reference an individual recording within a file would be by the uuid of the entity representing this recording.

The uuid could also be used to distinguish the sidecar files if multiple sidecar files are used.
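A minimal sketch of the UUID idea, covering both uses (referencing a recording and naming its sidecar). It assumes a made-up `rec-` entity in the file name, and fresh `uuid.uuid4()` values stand in for the UUIDs that the NWB/NIX entities would already carry. The 32-character hex form is used so the label stays alphanumeric, as BIDS labels are.

```python
import uuid
from typing import Dict, List


def assign_ids(recording_names: List[str]) -> Dict[uuid.UUID, str]:
    """Hypothetical stand-in: a real file would supply each entity's existing UUID."""
    return {uuid.uuid4(): name for name in recording_names}


def sidecar_name(prefix: str, rec_id: uuid.UUID) -> str:
    # The "rec-" entity is made up for illustration; .hex has no hyphens.
    return f"{prefix}_rec-{rec_id.hex}_ephys.json"


def resolve(index: Dict[uuid.UUID, str], sidecar: str) -> str:
    """Recover the in-file recording that a sidecar file name points to."""
    label = sidecar.split("_rec-", 1)[1].split("_", 1)[0]
    return index[uuid.UUID(hex=label)]


index = assign_ids(["sweep-001", "sweep-002"])
for rec_id, name in index.items():
    print(sidecar_name("sub-01_ses-01", rec_id), "->", name)
```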

@oruebel

oruebel commented Jan 18, 2023

> A straightforward way to reference an individual recording within a file would be by the uuid of the entity representing this recording.

For referencing entire recordings, using the UUID is a reasonable approach for NWB files as well.

> I agree and would suggest that this might be an issue to consider not just for patch clamp but for other kinds of recordings as well.

I agree. A key reason why this is prominent in intracellular ephys is that the individual recordings are part of a larger experimental design and are therefore typically acquired and analyzed in a larger context. For example, even in the simple case where a user measures how the response of a cell changes to a simple stimulus (e.g., a square pulse with varying amplitude), the resulting data consists of a large collection of stimulus and response recordings. In practice, these experiments typically exhibit a hierarchical organization in time consisting of:

  1. pairs of stimulus and response time series recorded from a cell,
  2. simultaneous recordings from multiple electrodes or cells (a.k.a. a sweep),
  3. recordings performed sequentially in time, e.g., groups of recordings that use the same type of stimulus with varying parameters (a.k.a. a sweep sequence),
  4. repetitions of the same experiment (a.k.a. a run), and
  5. experimental conditions, e.g., when varying environmental parameters such as temperature during the experiment.

Describing this organization of recordings and the associated metadata at the different levels is critical, because analysis of this kind of data typically needs to relate many recordings to each other (rather than investigating recordings individually). This is one key reason why most intracellular ephys files in NWB contain a large number of time series recordings (often hundreds). To describe the experimental organization and metadata, NWB uses a collection of linked tables to group recordings in time (see https://pynwb.readthedocs.io/en/stable/tutorials/domain/plot_icephys.html#intracellular-electrophysiology ). Another aspect is that the individual recordings are typically small in this case, so it is often more convenient to store all recordings from a recording session in a single file rather than as hundreds of individual small files.
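The linked-table idea can be sketched in plain Python: each level stores row indices into the level below, so any group can be resolved down to its stimulus/response pairs. This only loosely mirrors the design of NWB's icephys tables (e.g., `SimultaneousRecordingsTable`, `SequentialRecordingsTable`); it is not the pynwb API, and the `Table` class and time series names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Table:
    """One level of the hierarchy; each row holds row indices into the level below."""
    name: str
    rows: List[List[int]] = field(default_factory=list)

    def add_row(self, children: List[int]) -> int:
        self.rows.append(list(children))
        return len(self.rows) - 1


def expand(levels: List[Table], leaves: List[Tuple[str, str]], row: int) -> List[Tuple[str, str]]:
    """Follow one row of the top level down to the stimulus/response pairs it groups."""
    rows = [row]
    for table in levels:  # ordered from the top level down to the lowest level
        rows = [child for r in rows for child in table.rows[r]]
    return [leaves[i] for i in rows]


# Level 1: stimulus/response pairs (hypothetical time series names).
pairs = [("stim0", "resp0"), ("stim1", "resp1"), ("stim2", "resp2"), ("stim3", "resp3")]
sweeps = Table("simultaneous")    # level 2: recorded at the same time
sequences = Table("sequential")   # level 3: same stimulus type, varying parameters
runs = Table("repetitions")       # level 4: repetitions of the experiment
conditions = Table("conditions")  # level 5: e.g., temperature

s0 = sweeps.add_row([0, 1])
s1 = sweeps.add_row([2, 3])
q0 = sequences.add_row([s0, s1])
r0 = runs.add_row([q0])
c0 = conditions.add_row([r0])

print(expand([conditions, runs, sequences, sweeps], pairs, c0))
print(expand([sweeps], pairs, s1))
```

Storing indices rather than copies keeps each recording in exactly one place while letting every level of the design refer to it.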

> The traditional BIDS way of storing data (one file per session or run) is clear, I think (1 file + 1 sidecar per session, run, or recording).

When storing recordings across many files, it will be important to represent how the recordings relate to each other in the experimental design. This could be done in a single sidecar file or via multiple sidecar files representing different parts of the experimental design (e.g., one sidecar file per stimulus/response pair, sweep, run, etc.).

> There are then again two ways this could work: keep one sidecar file per recording (meaning multiple sidecar files per data file) or also consolidate here and merge them into one sidecar per data file.

To represent the organization of the recordings (and enable users to relate the different recordings to each other and analyze them in the larger context of the experiment), I think some form of common sidecar file that relates all the recordings to each other is likely needed. Ultimately, if one can reference individual recordings in a file, then I think the design of the sidecar files may not need to differ between the two cases (one-file-per-recording or multiple-recordings-per-file). I.e., in either case, individual sidecar files for each recording, or one (or multiple) sidecar files that describe metadata about multiple recordings, are all feasible designs.
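The point that the sidecar design need not change can be sketched with a reference made of a data file plus an optional in-file identifier: the same common-sidecar schema then covers both layouts. `RecordingRef`, the field names, and the file names and IDs below are all hypothetical.

```python
import json
from typing import Dict, List, NamedTuple, Optional, Set


class RecordingRef(NamedTuple):
    """Hypothetical reference to one recording.

    entity_id is None when the data file holds a single recording, and an
    in-file identifier (e.g., a UUID) when it holds several.
    """
    data_file: str
    entity_id: Optional[str] = None


Groups = Dict[str, List[RecordingRef]]


def to_sidecar(groups: Groups) -> str:
    """Serialize a common sidecar relating recordings (e.g., per sweep) to each other."""
    return json.dumps({g: [ref._asdict() for ref in refs] for g, refs in groups.items()}, indent=2)


def files_to_open(groups: Groups) -> Set[str]:
    """Data files a reader must open to load every referenced recording."""
    return {ref.data_file for refs in groups.values() for ref in refs}


# Same sidecar schema for both layouts (file names and IDs are made up).
one_file_per_recording: Groups = {
    "sweep-1": [RecordingRef("rec-01.nwb"), RecordingRef("rec-02.nwb")],
}
many_recordings_per_file: Groups = {
    "sweep-1": [RecordingRef("session.nwb", "id-a"), RecordingRef("session.nwb", "id-b")],
}
print(to_sidecar(one_file_per_recording))
print(to_sidecar(many_recordings_per_file))
```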

@yarikoptic
Collaborator

Backref: the issue of multiple simultaneous recordings is not alien to neuroimaging either, and there have been (and still are) discussions about it, e.g., see #86, where the idea (if not consensus) so far is to rely on timing recorded in scans.tsv files.

@oruebel

oruebel commented Jan 18, 2023

> where the idea (if not consensus) so far is to rely on timing recorded in scans.tsv files.

I don't think relying on timing alone is sufficient in this case. E.g., we may have many responses and stimuli recorded at the same time. While it is technically possible to reconstruct which ones belong together based on metadata, it is not trivial and may require knowledge of the experimental design. This gets even more complicated for higher-level organization, e.g., sequential recordings and repetitions, where the logical design (e.g., based on the type of stimuli and the experiment parameters) is even more important for defining the grouping of recordings.
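A toy illustration of this point, using made-up time series that all share the same onset (the kind of timing information one could record in scans.tsv). Grouping by onset lumps everything together; recovering the stimulus/response pairs needs extra metadata, here a hypothetical `electrode` field. Higher levels (sequences, repetitions) would need the logical design itself, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical time series from one sweep: two electrodes, each with a stimulus
# and a response, all starting at the same onset (in seconds).
series: List[Dict] = [
    {"name": "stim_e0", "onset": 0.0, "kind": "stimulus", "electrode": "e0"},
    {"name": "resp_e0", "onset": 0.0, "kind": "response", "electrode": "e0"},
    {"name": "stim_e1", "onset": 0.0, "kind": "stimulus", "electrode": "e1"},
    {"name": "resp_e1", "onset": 0.0, "kind": "response", "electrode": "e1"},
]


def group_by_onset(items: List[Dict]) -> Dict[float, List[str]]:
    """Timing-only grouping: everything with the same onset ends up together."""
    groups: Dict[float, List[str]] = defaultdict(list)
    for s in items:
        groups[s["onset"]].append(s["name"])
    return dict(groups)


def pair_by_electrode(items: List[Dict]) -> Dict[str, Tuple[str, str]]:
    """Pairing stimuli with responses needs metadata that timing does not provide."""
    pairs: Dict[str, Dict[str, str]] = defaultdict(dict)
    for s in items:
        pairs[s["electrode"]][s["kind"]] = s["name"]
    return {e: (p["stimulus"], p["response"]) for e, p in pairs.items()}


print(group_by_onset(series))
print(pair_by_electrode(series))
```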


7 participants