EEG-Dash

To leverage recent and ongoing advancements in large-scale computational methods and to ensure the preservation of scientific data generated from publicly funded research, the EEG-DaSh data archive will create a data-sharing resource for MEEG (EEG, MEG) data contributed by collaborators for machine learning (ML) and deep learning (DL) applications.

Data source

The data in EEG-DaSh originates from a collaboration involving 25 laboratories and encompasses 27,053 participants. The collection consists of MEEG data, that is, EEG and MEG recordings, drawn from studies conducted by these labs. Participants include both healthy subjects and clinical populations with conditions such as ADHD, depression, schizophrenia, dementia, autism, and psychosis, and the recordings span mental states and activities such as sleep, meditation, and cognitive tasks. EEG-DaSh will also incorporate a subset of the data converted from NEMAR, which includes 330 MEEG BIDS-formatted datasets, further expanding the archive with well-curated, standardized neuroelectromagnetic data.

Available data

The following datasets are currently available on EEGDash.

| DatasetID | Participants | Files | Sessions | Population | Channels | Is 10-20? | Modality | Size |
|---|---|---|---|---|---|---|---|---|
| ds002181 | 20 | 949 | 1 | Healthy | 63 | 10-20 | Visual | 0.163 GB |
| ds002578 | 2 | 22 | 1 | Healthy | 256 | 10-20 | Visual | 0.001 TB |
| ds002680 | 14 | 4977 | 2 | Healthy | 0 | 10-20 | Visual | 0.01 TB |
| ds002691 | 20 | 146 | 1 | Healthy | 32 | other | Visual | 0.001 TB |
| ds002718 | 18 | 582 | 1 | Healthy | 70 | other | Visual | 0.005 TB |
| ds003061 | 13 | 282 | 1 | Not specified | 64 | 10-20 | Auditory | 0.002 TB |
| ds003690 | 75 | 2630 | 1 | Healthy | 61 | 10-20 | Auditory | 0.023 TB |
| ds003805 | 1 | 10 | 1 | Healthy | 19 | 10-20 | Multisensory | 0 TB |
| ds003838 | 65 | 947 | 1 | Healthy | 63 | 10-20 | Auditory | 100.2 GB |
| ds004010 | 24 | 102 | 1 | Healthy | 64 | other | Multisensory | 0.025 TB |
| ds004040 | 13 | 160 | 2 | Healthy | 64 | 10-20 | Auditory | 0.012 TB |
| ds004350 | 24 | 960 | 2 | Healthy | 64 | other | Visual | 0.023 TB |
| ds004362 | 109 | 9162 | 1 | Healthy | 64 | 10-20 | Visual | 0.008 TB |
| ds004504 | 88 | 269 | 1 | Dementia | 19 | 10-20 | Resting State | 2.6 GB |
| ds004554 | 16 | 101 | 1 | Healthy | 99 | 10-20 | Visual | 0.009 TB |
| ds004635 | 48 | 292 | 1 | Healthy | 129 | other | Multisensory | 26.1 GB |
| ds004657 | 24 | 838 | 6 | Not specified | 64 | 10-20 | Motor | 43.1 GB |
| ds004660 | 21 | 299 | 1 | Healthy | 32 | 10-20 | Multisensory | 7.2 GB |
| ds004661 | 17 | 90 | 1 | Not specified | 64 | 10-20 | Multisensory | 1.4 GB |
| ds004745 | 52 | 762 | 1 | Healthy | 64 | ? | Auditory | 0 TB |
| ds004785 | 17 | 74 | 1 | Healthy | 32 | ? | Motor | 0 TB |
| ds004841 | 20 | 1034 | 2 | Not specified | 64 | 10-20 | Multisensory | 7.3 GB |
| ds004842 | 14 | 719 | 2 | Not specified | 64 | ? | Multisensory | 5.2 GB |
| ds004843 | 14 | 649 | 1 | Not specified | 64 | ? | Visual | 7.7 GB |
| ds004844 | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 22.3 GB |
| ds004849 | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
| ds004850 | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
| ds004851 | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
| ds004852 | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
| ds004853 | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
| ds004854 | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
| ds004855 | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
| ds005034 | 25 | 406 | 2 | Healthy | 129 | ? | Visual | 61.4 GB |
| ds005079 | 1 | 210 | 12 | Healthy | 64 | ? | Multisensory | 1.7 GB |
| ds005342 | 32 | 134 | 1 | Healthy | 17 | ? | Visual | 2 GB |
| ds005410 | 81 | 492 | 1 | Healthy | 63 | ? | ? | 19.8 GB |
| ds005505 | 136 | 5393 | 1 | Healthy | 129 | other | Visual | 103 GB |
| ds005506 | 150 | 5645 | 1 | Healthy | 129 | other | Visual | 112 GB |
| ds005507 | 184 | 7273 | 1 | Healthy | 129 | other | Visual | 140 GB |
| ds005508 | 324 | 13393 | 1 | Healthy | 129 | other | Visual | 230 GB |
| ds005509 | 330 | 19980 | 1 | Healthy | 129 | other | Visual | 224 GB |
| ds005510 | 135 | 4933 | 1 | Healthy | 129 | other | Visual | 91 GB |
| ds005511 | 381 | 18604 | 1 | Healthy | 129 | other | Visual | 245 GB |
| ds005512 | 257 | 9305 | 1 | Healthy | 129 | other | Visual | 157 GB |
| ds005514 | 295 | 11565 | 1 | Healthy | 129 | other | Visual | 185 GB |
| ds005672 | 3 | 18 | 1 | Healthy | 64 | 10-20 | Visual | 4.2 GB |
| ds005697 | 52 | 210 | 1 | Healthy | 64 | 10-20 | Visual | 67 GB |
| ds005787 | 30 | ? | 4 | Healthy | 64 | 10-20 | Visual | 185 GB |

Data format

EEGDash queries return a PyTorch Dataset, a format suited to machine learning (ML) and deep learning (DL) applications. PyTorch Datasets provide an efficient, scalable, and flexible structure and integrate directly with PyTorch’s DataLoader, which handles batching, shuffling, and parallel data loading, all of which matter when training deep learning models on large EEG datasets.
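
To make batching and shuffling concrete, here is a minimal plain-Python sketch of what iterating over a map-style dataset in batches amounts to. `ToyWindows` and `iterate_batches` are hypothetical stand-ins written for illustration; they are not part of EEGDash or PyTorch, and the fake "EEG windows" contain no real data. In practice you would pass the EEGDash dataset to `torch.utils.data.DataLoader`, which adds collation into tensors and parallel workers on top of this basic loop.

```python
import random


class ToyWindows:
    """Hypothetical map-style dataset: sized and indexable, like a PyTorch Dataset."""

    def __init__(self, n_windows, n_channels=4, n_samples=8):
        self.n_windows = n_windows
        self.n_channels = n_channels
        self.n_samples = n_samples

    def __len__(self):
        return self.n_windows

    def __getitem__(self, idx):
        if not 0 <= idx < self.n_windows:
            raise IndexError(idx)
        # Fake "EEG window": channels x samples, filled with the window index.
        signal = [[float(idx)] * self.n_samples for _ in range(self.n_channels)]
        label = idx % 2  # fake binary target
        return signal, label


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield (signals, labels) lists, batch_size items at a time."""
    order = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        items = [dataset[i] for i in order[start:start + batch_size]]
        signals = [signal for signal, _ in items]
        labels = [label for _, label in items]
        yield signals, labels


batches = list(iterate_batches(ToyWindows(10), batch_size=4, shuffle=True, seed=0))
print([len(labels) for _, labels in batches])  # batch sizes: 4, 4, then the remaining 2
```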

Data preprocessing

EEGDash datasets are processed using the popular BrainDecode library. In fact, EEGDash datasets are BrainDecode datasets, which are themselves PyTorch datasets. This means that any preprocessing possible on BrainDecode datasets is also possible on EEGDash datasets. Refer to BrainDecode tutorials for guidance on preprocessing EEG data.

EEG-Dash usage

Install

Use your preferred Python environment manager with Python > 3.9 to install the package.

  • To install the eegdash package, use the following temporary command (a direct pip install eegdash option will be available soon): pip install -i https://test.pypi.org/simple/ eegdash
  • To verify the installation, start a Python session and type: from eegdash import EEGDash
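
The verification step can also be scripted. The sketch below uses only the standard library to report the interpreter version and to check whether a package can be found, without actually importing it. `is_installed` is a hypothetical helper name chosen for this example, not something provided by eegdash.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None


print("Python version:", ".".join(map(str, sys.version_info[:3])))
print("eegdash installed:", is_installed("eegdash"))
```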

Data access

To use the data from a single subject, enter:

from eegdash import EEGDashDataset
ds_NDARDB033FW5 = EEGDashDataset({'dataset': 'ds005514', 'task': 'RestingState', 'subject': 'NDARDB033FW5'}, description_fields=['sex'])

This will search and download the metadata for the task RestingState for subject NDARDB033FW5 in BIDS dataset ds005514. The actual data will not be downloaded at this stage; following standard practice, data is downloaded only when it is first processed. The ds_NDARDB033FW5 object is a fully functional BrainDecode dataset, which is itself a PyTorch dataset. This tutorial shows how to preprocess the EEG data, extract the portions containing eyes-open and eyes-closed segments, and then perform eyes-open vs. eyes-closed classification using a (shallow) deep-learning model.

To use the data from multiple subjects, enter:

from eegdash import EEGDashDataset
ds_ds005505rest = EEGDashDataset({'dataset': 'ds005505', 'task': 'RestingState'}, target_name='sex')

This will search and download the metadata for the task 'RestingState' for all subjects in BIDS dataset 'ds005505' (a total of 136). As above, the actual data will not be downloaded at this stage, so this command is quick to execute. The target class for each subject is assigned using the target_name parameter, so the object is ready to be fed directly to a deep learning model, although the tutorial script performs minimal processing on it before training. Because 14 gigabytes of data are downloaded, this tutorial takes about 10 minutes to execute.
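
The two calls above differ only in the query dictionary and the keyword arguments. As a small sketch, the helper below builds that dictionary and leaves out the subject key when all subjects are wanted. `build_query` and `load_metadata` are hypothetical names used for illustration. The `EEGDashDataset` call mirrors the two examples above and runs only when `load_metadata` is called, since it needs the eegdash package and network access.

```python
def build_query(dataset, task, subject=None):
    """Build a query dict with the keys used in the examples above."""
    query = {"dataset": dataset, "task": task}
    if subject is not None:
        query["subject"] = subject  # restrict the query to one participant
    return query


def load_metadata(query, **kwargs):
    """Create an EEGDashDataset for a query (requires eegdash and network access)."""
    # Imported lazily so that build_query can be used without eegdash installed.
    from eegdash import EEGDashDataset

    return EEGDashDataset(query, **kwargs)


single = build_query("ds005514", "RestingState", subject="NDARDB033FW5")
everyone = build_query("ds005505", "RestingState")
print(single)
print(everyone)
# For example: load_metadata(everyone, target_name="sex")
```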

Automatic caching

EEGDash automatically caches the downloaded data in the .eegdash_cache folder of the directory from which the script is called. This means that if you run the tutorial scripts, the data will only be downloaded the first time each script is executed.
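
To see how much disk space the cache is using, a short standard-library sketch like the following can help. It assumes only what is stated above, namely that cached files live under .eegdash_cache in the directory the script is run from. `cache_size_bytes` is a hypothetical helper, not an EEGDash function.

```python
from pathlib import Path


def cache_size_bytes(cache_dir=".eegdash_cache"):
    """Return the total size in bytes of all files under cache_dir (0 if it is missing)."""
    root = Path(cache_dir)
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


size = cache_size_bytes()
print(f"{size / 1e9:.2f} GB cached in .eegdash_cache")
```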

Education -- Coming soon...

We organize workshops and educational events to foster cross-cultural education and student training, offering both online and in-person opportunities in collaboration with US and Israeli partners. Events for 2025 will be announced via the EEGLABNEWS mailing list. Be sure to subscribe.

About EEG-DaSh

EEG-DaSh is a collaborative initiative between the United States and Israel, supported by the National Science Foundation (NSF). The partnership brings together experts from the Swartz Center for Computational Neuroscience (SCCN) at the University of California San Diego (UCSD) and Ben-Gurion University (BGU) in Israel.
