BEP032 (animal-ephys): support for patchclamp data sets #1375

SylvainTakerkart · 2023-01-05T15:13:33Z

Hi everyone,

I'm starting this issue as a place for discussion (and hopefully decision-making ;) ) about a particular point raised at our last BEP032 meeting, where we (BEP032 leads @JuliaSprenger and myself) presented, for the first time, a proposal to support intracellular electrophysiology data. I'll briefly describe patch-clamp data sets, then the proposal, and then the issue that some in our group raised... As a primer, please note that the BEP suggests using the NWB or NIX data formats to store animal ephys data in BIDS; both are HDF5-based formats.

A patch-clamp data set, in its raw form, consists of several recordings, usually stored in a proprietary format with one file per recording; you therefore get several data files for each patched cell...
The proposal: keep the files organized as they are in their raw form, with one data file (NWB or NIX) per recording. The metadata describing the structure of the experiment (the logical links between the different recordings) therefore lives solely in the BIDS sidecar files. This is what the current version of the BEP describes (https://bids.neuroimaging.io/bep032), e.g., in the patch toy example provided in section 4.1.
The potential issue that was raised: this proposal does not exploit the ability of NWB and NIX to store multiple recordings (or multiple blocks of data) within a single file. Outside of BIDS, this is the usual way to store such data sets in NWB or NIX files, with the metadata describing the structure of the experiment embedded inside the files. Some therefore suggested that it would be nicer to exploit this ability, so as to keep the usage of NWB/NIX files identical inside and outside of BIDS.

I'm therefore pinging several people, some of whom were at our meeting and some not, to complement what I just explained and give their opinions... @JuliaSprenger @robertoostenveld @Remi-Gau @lzehl @apdavison @bendichter @oruebel @twachtler Others are obviously welcome to chime in!

The goal is to make progress here before our next meeting... Thanks in advance for everyone's contributions!

lzehl · 2023-01-09T14:10:50Z

I think both are valuable ways of storing data.

The traditional BIDS way of storing data (one file per session or run) is clear, I think: one file plus one sidecar per session, run, or recording.

When files contain more than one recording, I suggest stating in the sidecar file how to locate the data of a particular recording within the file. There are then again two ways this could work: keep one sidecar file per recording (meaning multiple sidecar files per data file), or consolidate here as well and merge them into one sidecar per data file.
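To make the two options concrete, here is a minimal Python sketch of what the sidecar contents could look like for a data file holding two recordings. It assumes the in-file location of each recording can be written as an internal path; the file name, the paths, and the keys `DataFile`, `RecordingPath`, and `Recordings` are hypothetical and not defined by BEP032 or BIDS.

```python
import json

# Illustrative inventory of recordings inside one multi-recording data file.
# "DataFile", "RecordingPath" and "Recordings" are hypothetical sidecar keys,
# not fields defined by BEP032 or BIDS; the file name and paths are made up.
DATA_FILE = "sub-01_ses-01_ephys.nwb"
RECORDINGS = [
    {"id": "sweep001", "path": "/acquisition/sweep001", "StimulusType": "square_pulse"},
    {"id": "sweep002", "path": "/acquisition/sweep002", "StimulusType": "square_pulse"},
]


def per_recording_sidecars(data_file, recordings):
    """Option 1: one sidecar per recording (several sidecars per data file)."""
    return {
        rec["id"]: {
            "DataFile": data_file,
            "RecordingPath": rec["path"],
            "StimulusType": rec["StimulusType"],
        }
        for rec in recordings
    }


def consolidated_sidecar(data_file, recordings):
    """Option 2: a single sidecar per data file that lists every recording."""
    return {
        "DataFile": data_file,
        "Recordings": [
            {"RecordingPath": rec["path"], "StimulusType": rec["StimulusType"]}
            for rec in recordings
        ],
    }


print(json.dumps(consolidated_sidecar(DATA_FILE, RECORDINGS), indent=2))
```

Either way, the sidecar carries the "navigation" into the file; the choice is mostly about whether per-recording metadata is spread over many small JSON files or kept in one list.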

There is also the question of what the convention should be for stating the location within a given file format. @twachtler and @bendichter (or someone else), could you provide an example of how to reference a single recording within an NWB and a NIX file?

twachtler · 2023-01-09T15:25:28Z

I agree and would suggest that this might be an issue to consider not just for patch clamp but for other kinds of recordings as well.

A straightforward way to reference an individual recording within a file would be by the UUID of the entity representing this recording.

The UUID could also be used to distinguish the sidecar files if multiple sidecar files are used.
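A minimal Python sketch of UUID-based referencing, assuming each recording entity in the file carries a UUID and that a sidecar file name may embed it. The naming scheme and the helper functions are hypothetical; real NWB/NIX files assign and store their own identifiers, and `uuid.uuid4()` only stands in for that here.

```python
import uuid


def assign_uuids(entity_paths):
    """Give each recording entity a random UUID, as a file writer might do."""
    return {str(uuid.uuid4()): path for path in entity_paths}


def sidecar_filename(data_file, recording_uuid):
    """Hypothetical naming scheme: data-file stem plus the recording UUID.

    The UUID's hex form (32 characters, no hyphens) keeps the identifier
    alphanumeric.
    """
    stem = data_file.rsplit(".", 1)[0]
    return f"{stem}_{uuid.UUID(recording_uuid).hex}.json"


def resolve(index, recording_uuid):
    """Find where a recording lives inside the data file from its UUID."""
    return index[recording_uuid]


index = assign_uuids(["/acquisition/sweep001", "/acquisition/sweep002"])
for rec_uuid in index:
    print(sidecar_filename("sub-01_ses-01_ephys.nwb", rec_uuid), "->", resolve(index, rec_uuid))
```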

oruebel · 2023-01-18T13:00:31Z

> A straightforward way to reference an individual recording within a file would be by the uuid of the entity representing this recording.

For referencing entire recordings, using the UUID is a reasonable approach for NWB files as well.

> I agree and would suggest that this might be an issue to consider not just for patch clamp but for other kinds of recordings as well.

I agree. A key reason why this is prominent in intracellular ephys is that the individual recordings are part of a larger experimental design and are therefore typically acquired and analyzed in a larger context. For example, even in the simple case where a user measures how the response of a cell changes with a simple stimulus (e.g., a square pulse with varying amplitude), the resulting data consist of a large collection of stimulus and response recordings. In practice, these experiments typically exhibit a hierarchical organization in time consisting of:

1) pairs of stimulus and response time series recorded from a cell,
2) simultaneous recordings from multiple electrodes or cells (a.k.a. a sweep),
3) recordings performed sequentially in time, e.g., groups of recordings that use the same type of stimulus with varying parameters (a.k.a. a sweep sequence),
4) repetitions of the same experiment (a.k.a. a run), and
5) experimental conditions, e.g., when varying environmental conditions such as temperature during the experiment.

Describing this organization of recordings and the associated metadata at the different levels is critical, because analysis of this kind of data typically needs to relate many recordings to each other (rather than investigating recordings individually). This is one key reason why most intracellular ephys files in NWB contain a large number of time series recordings (often 100s). To describe the experimental organization and metadata, NWB uses a collection of linked tables to group recordings in time (see https://pynwb.readthedocs.io/en/stable/tutorials/domain/plot_icephys.html#intracellular-electrophysiology ). Another aspect is that the individual recordings are typically small in this case, so it is often more convenient to store all recordings from a recording session in a single file rather than as 100s of individual small files.
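A minimal sketch of this five-level hierarchy as linked tables in plain Python, where each level stores row indices into the level below. It is only meant to mirror the idea behind NWB's linked icephys tables; it is not the PyNWB API, and the table names, columns, and values are made up for illustration.

```python
# Level 1: stimulus/response pairs, one row per electrode per sweep (toy values).
recordings = [
    {"electrode": "e0", "stimulus": "stim_000", "response": "resp_000", "amplitude_pA": 50},
    {"electrode": "e1", "stimulus": "stim_001", "response": "resp_001", "amplitude_pA": 50},
    {"electrode": "e0", "stimulus": "stim_002", "response": "resp_002", "amplitude_pA": 100},
    {"electrode": "e1", "stimulus": "stim_003", "response": "resp_003", "amplitude_pA": 100},
]
# Level 2: simultaneous recordings (sweeps), as row indices into `recordings`.
sweeps = [[0, 1], [2, 3]]
# Level 3: sequential recordings (sweep sequences), as indices into `sweeps`.
sweep_sequences = [{"sweeps": [0, 1], "stimulus_type": "square_pulse"}]
# Level 4: repetitions (runs), as indices into `sweep_sequences`.
runs = [[0]]
# Level 5: experimental conditions, as indices into `runs`.
conditions = [{"runs": [0], "temperature_C": 32}]


def recordings_for_condition(cond_idx):
    """Follow the links from one condition down to its stimulus/response pairs."""
    rows = []
    for run_idx in conditions[cond_idx]["runs"]:
        for seq_idx in runs[run_idx]:
            for sweep_idx in sweep_sequences[seq_idx]["sweeps"]:
                rows.extend(recordings[i] for i in sweeps[sweep_idx])
    return rows


# Relate every response in condition 0 to its stimulus amplitude.
for rec in recordings_for_condition(0):
    print(rec["electrode"], rec["amplitude_pA"], rec["response"])
```

Queries like this, which walk from a condition down to individual stimulus/response pairs, only work if the grouping metadata is stored somewhere, whether inside the data file or in sidecars.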

> The way of handling the BIDS traditional way of storing data (one file per session or run) is clear I think (1-file+1-sidecar per session or run or recording).

When storing recordings across many files, it will be important to represent how the recordings relate to each other in the experimental design. This could be done in a single sidecar file or via multiple sidecar files representing different parts of the experimental design (e.g., one sidecar file per stimulus/response pair, sweep, run, etc.).

> There are then again two ways how this could work: keep one sidecar file per recording (meaning multiple sidecar files per data file) or also consolidate here and merge them into one sidecar per data file.

To represent the organization of the recordings (and enable users to relate the different recordings to each other and analyze them in the larger context of the experiment), I think some form of common sidecar file that relates all the recordings to each other is likely needed. Ultimately, I think that if one can reference individual recordings within a file, then the design of the sidecar files may not need to differ between the two cases (one file per recording vs. multiple recordings per file). That is, in either case, individual sidecar files for each recording, or one (or multiple) sidecar files describing metadata about multiple recordings, are all feasible designs.
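A minimal Python sketch of this point, assuming a reference is a data file plus an optional in-file recording ID: the grouping sidecar keeps the same shape for both storage layouts, and only the references inside it change. `RecordingRef`, the `file#id` notation, and all file names and IDs are hypothetical.

```python
from typing import NamedTuple, Optional


class RecordingRef(NamedTuple):
    """Hypothetical reference: a data file plus an optional in-file recording ID."""
    data_file: str
    recording_id: Optional[str] = None  # None when the file holds a single recording


# The same grouping (one sweep sequence) expressed for the two storage layouts.
ONE_FILE_PER_RECORDING = {
    "sweep_sequence_0": [RecordingRef("rec-001.nwb"), RecordingRef("rec-002.nwb")],
}
MANY_RECORDINGS_PER_FILE = {
    "sweep_sequence_0": [
        RecordingRef("session.nwb", "uuid-a"),
        RecordingRef("session.nwb", "uuid-b"),
    ],
}


def describe(groups):
    """Render each group as 'file' or 'file#id' strings, whatever the layout."""
    return {
        name: [
            ref.data_file if ref.recording_id is None else f"{ref.data_file}#{ref.recording_id}"
            for ref in refs
        ]
        for name, refs in groups.items()
    }


print(describe(ONE_FILE_PER_RECORDING))
print(describe(MANY_RECORDINGS_PER_FILE))
```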

yarikoptic · 2023-01-18T14:14:11Z

Backref: the issue of multiple simultaneous recordings is not alien to neuroimaging either, and there have been (and still are) discussions about it; see, e.g., #86, where the idea (if not the consensus) so far is to rely on the timing recorded in scans.tsv files.
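For illustration, a minimal Python sketch of the timing-based approach: read a `scans.tsv` (the `filename` and `acq_time` columns are part of BIDS) and group files that share an acquisition time. The file names and times below are made up.

```python
import csv
import io
from collections import defaultdict

# Made-up scans.tsv content (tab-separated); only the column names are standard.
SCANS_TSV = (
    "filename\tacq_time\n"
    "ephys/recording_a.nwb\t2023-01-01T10:00:00\n"
    "ephys/recording_b.nwb\t2023-01-01T10:00:00\n"
    "ephys/recording_c.nwb\t2023-01-01T10:10:00\n"
)


def group_by_acq_time(tsv_text):
    """Group files with identical acq_time, i.e. candidate simultaneous recordings."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[row["acq_time"]].append(row["filename"])
    return dict(groups)


for when, files in group_by_acq_time(SCANS_TSV).items():
    print(when, files)
```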

oruebel · 2023-01-18T15:04:07Z

> where the idea (if not consensus) so far is to rely on timing recorded in scans.tsv files.

I don't think relying on timing alone is sufficient in this case. E.g., we may have many responses and stimuli recorded at the same time. While it is technically possible to reconstruct which ones belong together based on metadata, doing so is not trivial and may require knowledge of the experimental design. This gets even more complicated for higher-level organization, e.g., sequential recordings and repetitions, where the logical design (e.g., based on the type of stimulus and the experimental parameters) is even more important for defining the grouping of recordings.
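A minimal Python sketch of that ambiguity, using made-up series from a single sweep: grouping by start time alone lumps two stimuli and two responses into one group, while an explicit `electrode` field (a hypothetical metadata key) is what allows each stimulus to be paired with its response.

```python
from collections import defaultdict

# Made-up series from one sweep: two electrodes, each with a stimulus and a
# response, all starting at the same time.
series = [
    {"name": "stim_e0", "kind": "stimulus", "start_s": 0.0, "electrode": "e0"},
    {"name": "stim_e1", "kind": "stimulus", "start_s": 0.0, "electrode": "e1"},
    {"name": "resp_e0", "kind": "response", "start_s": 0.0, "electrode": "e0"},
    {"name": "resp_e1", "kind": "response", "start_s": 0.0, "electrode": "e1"},
]


def group_by_start(items):
    """Timing only: every series with the same start time lands in one group."""
    groups = defaultdict(list)
    for item in items:
        groups[item["start_s"]].append(item["name"])
    return dict(groups)


def pair_by_electrode(items):
    """Explicit metadata: pair each stimulus with the response on its electrode."""
    stimuli = {i["electrode"]: i["name"] for i in items if i["kind"] == "stimulus"}
    responses = {i["electrode"]: i["name"] for i in items if i["kind"] == "response"}
    return {stimuli[e]: responses[e] for e in stimuli if e in responses}


print(group_by_start(series))     # one group of four series; the pairing is ambiguous
print(pair_by_electrode(series))  # {'stim_e0': 'resp_e0', 'stim_e1': 'resp_e1'}
```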

sappelhoff added the BEP label Mar 16, 2023

yarikoptic mentioned this issue on Feb 14, 2024 in [ENH] microelectrode electrophysiology specification (BEP032) #1705

Remi-Gau added the raw label Oct 18, 2024
