[Paper] [Dataset] [BibTeX]
A dataset of surface electromyography (sEMG) recordings of touch typing on a QWERTY keyboard, with ground truth, benchmarks, and baselines.
# Install [git-lfs](https://git-lfs.github.com/) (for pretrained checkpoints)
git lfs install
# Clone the repo, setup environment, and install local package
git clone git@github.com:facebookresearch/emg2qwerty.git ~/emg2qwerty
cd ~/emg2qwerty
conda env create -f environment.yml
conda activate emg2qwerty
pip install -e .
# Download the dataset, extract, and symlink to ~/emg2qwerty/data
cd ~ && wget https://fb-ctrl-oss.s3.amazonaws.com/emg2qwerty/emg2qwerty-data-2021-08.tar.gz
tar -xvzf emg2qwerty-data-2021-08.tar.gz
ln -s ~/emg2qwerty-data-2021-08 ~/emg2qwerty/data
The dataset consists of 1,136 files in total: 1,135 session files spanning 108 users and 346 hours of recording, plus a single metadata.csv file. Each session file is in a simple HDF5 format and includes the left and right sEMG signal data, the prompted text, keylogger ground truth, and their corresponding timestamps. emg2qwerty.data.EMGSessionData offers a programmatic read-only interface into the HDF5 session files.
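For example, a minimal sketch of reading a session (the session path is hypothetical, and the field and attribute names follow the emg2qwerty.data source; verify against the code if the API has changed):

from pathlib import Path

from emg2qwerty.data import EMGSessionData

# Hypothetical path: substitute any session .hdf5 file from the dataset.
session_path = Path.home() / "emg2qwerty" / "data" / "<session_id>.hdf5"

with EMGSessionData(session_path) as session:
    # Slicing yields a structured numpy array over the HDF5 timeseries,
    # with one sEMG band per wrist plus timestamps.
    window = session[:1000]
    print(window["emg_left"].shape)   # e.g. (1000, 16)
    print(window["emg_right"].shape)  # e.g. (1000, 16)
    print(window["time"].shape)       # (1000,)

    # Keylogger ground truth and prompted text are exposed as attributes.
    print(len(session.keystrokes), "keystrokes")
    print(len(session.prompts), "prompts")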
To load the metadata.csv file and print dataset statistics,
python scripts/print_dataset_stats.py
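The metadata can also be inspected directly. A sketch using pandas follows; the column names used below are assumptions, so check the CSV header to confirm them:

import pandas as pd

# Load the per-session metadata; the path assumes the symlink set up above.
df = pd.read_csv("~/emg2qwerty/data/metadata.csv")

print(f"{len(df)} sessions")
# "user" is an assumed column name; inspect df.columns to confirm.
if "user" in df.columns:
    print(f"{df['user'].nunique()} users")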
To re-generate data splits,
python scripts/generate_splits.py
The following figure visualizes the dataset splits for training, validation, and testing of generic and personalized user models. Refer to the paper for details of the benchmark setup and data splits.
To convert the data to EEG BIDS format,
python scripts/convert_to_bids.py
Generic user model:
python -m emg2qwerty.train \
user=generic \
trainer.accelerator=gpu trainer.devices=8 \
--multirun
Personalized user models:
python -m emg2qwerty.train \
user="glob(user*)" \
trainer.accelerator=gpu trainer.devices=1 \
checkpoint="${HOME}/emg2qwerty/models/generic.ckpt" \
--multirun
If you are using a Slurm cluster, include the cluster=slurm override in the argument list of the above commands to pick up config/cluster/slurm.yaml. This overrides the Hydra launcher to use the Submitit plugin. If you are not on a Slurm cluster, refer to the Hydra documentation for the list of available launcher plugins.
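For example, the generic user model training command above becomes:

python -m emg2qwerty.train \
user=generic \
trainer.accelerator=gpu trainer.devices=8 \
cluster=slurm \
--multirun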
Greedy decoding:
python -m emg2qwerty.train \
user="glob(user*)" \
checkpoint="${HOME}/emg2qwerty/models/personalized-finetuned/\${user}.ckpt" \
train=False trainer.accelerator=cpu \
decoder=ctc_greedy \
hydra.launcher.mem_gb=64 \
--multirun
Beam-search decoding with 6-gram character-level language model:
python -m emg2qwerty.train \
user="glob(user*)" \
checkpoint="${HOME}/emg2qwerty/models/personalized-finetuned/\${user}.ckpt" \
train=False trainer.accelerator=cpu \
decoder=ctc_beam \
hydra.launcher.mem_gb=64 \
--multirun
The 6-gram character-level language model used by the first-pass beam-search decoder above is generated from the WikiText-103 raw dataset and built using KenLM. The LM is available under models/lm/, both in binary format and in the human-readable ARPA format. It can be regenerated as follows:
- Build KenLM from source: https://github.com/kpu/kenlm#compiling
- Run ./scripts/lm/build_char_lm.sh <ngram_order>
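For example, to rebuild the 6-gram LM used by the beam-search decoder above:

./scripts/lm/build_char_lm.sh 6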
emg2qwerty is CC-BY-NC-4.0 licensed, as found in the LICENSE file.
@misc{sivakumar2024emg2qwertylargedatasetbaselines,
  title={emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography},
  author={Viswanath Sivakumar and Jeffrey Seely and Alan Du and Sean R Bittner and Adam Berenzweig and Anuoluwapo Bolarinwa and Alexandre Gramfort and Michael I Mandel},
  year={2024},
  eprint={2410.20081},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2410.20081},
}