This repository contains code to run a script that collects speech data from your microphone.
Watch the video below to see how it works:
- Collect an ASR corpus on your computer in places without an internet connection (this is important for low-resource languages)
- Split speech into chunks using a Voice Activity Detection (VAD) mechanism
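To illustrate the idea of splitting speech into chunks, here is a minimal sketch that uses a plain energy threshold. It is not Silero VAD and not the logic of `record_and_split.py`; it is a simplified stand-in, and the function name `split_on_energy` and all of its parameters are hypothetical.

```python
import math


def split_on_energy(samples, sample_rate, frame_ms=30, threshold=500.0,
                    min_silence_frames=5):
    """Return (start, end) sample-index pairs for regions of loud audio.

    Simplified stand-in for a real VAD: a frame counts as speech when its
    RMS amplitude reaches `threshold`. A chunk is closed after
    `min_silence_frames` consecutive quiet frames.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    chunks = []
    start = None          # first sample of the chunk being built
    voiced_end = 0        # end of the last loud frame seen
    silent_frames = 0     # consecutive quiet frames inside a chunk

    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i
            voiced_end = i + len(frame)
            silent_frames = 0
        elif start is not None:
            silent_frames += 1
            if silent_frames >= min_silence_frames:
                chunks.append((start, voiced_end))
                start = None

    # Close a chunk that is still open when the audio ends.
    if start is not None:
        chunks.append((start, voiced_end))
    return chunks
```

A model-based VAD such as Silero VAD is far more robust to background noise than a fixed threshold, which is why the project relies on it.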
Install Python requirements:
# the author has successfully tested the project with wave==0.0.2, torch==1.11.0, torchaudio==0.11.0, sox==1.4.1, and pyaudio==0.2.11
pip install wave torch torchaudio pyaudio sox
brew install portaudio sox
pip install wave
pip install --global-option='build_ext' --global-option='-I/usr/local/include' --global-option='-L/usr/local/lib' pyaudio
To install torch and torchaudio on macOS, you need to install conda or miniconda (I recommend it) and then install the torch libraries:
For Intel:
conda install pytorch torchaudio -c pytorch
For M1:
pip3 install torch torchaudio
If you have problems installing pyaudio, check out this link. For me, the command below works:
pip3 install --global-option='build_ext' --global-option='-I/opt/homebrew/Cellar/portaudio/19.7.0/include/' --global-option='-L/opt/homebrew/Cellar/portaudio/19.7.0/lib/' pyaudio
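After installation, a quick check like the one below may help confirm that the packages can be found by Python. This is only a sketch: the helper name `missing_packages` is hypothetical, and the list assumes that each package is imported under the same name it is installed with.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names, taken from the pip commands above.
    required = ["wave", "torch", "torchaudio", "pyaudio", "sox"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

`find_spec` only locates a package without importing it, so the check stays fast even for large libraries such as torch.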
# Create folders where audio files will appear
mkdir data
mkdir speech
# Run the loop (this script will record speech and save it into the speech/ folder)
# Use Ctrl-C to stop the script
python record_and_split.py
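Once the script has run, you may want to check what was recorded. The sketch below lists the duration of each file in a folder using Python's standard-library `wave` module. It assumes the chunks are saved as uncompressed `.wav` files, which this README does not state explicitly, and the helper name `wav_durations` is hypothetical.

```python
import wave
from pathlib import Path


def wav_durations(folder):
    """Map each .wav file name in `folder` to its duration in seconds."""
    durations = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            # duration = number of frames / frames per second
            durations[path.name] = wav_file.getnframes() / wav_file.getframerate()
    return durations
```

For example, `wav_durations("speech")` would return a dictionary of file names and durations for the speech/ folder created above.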
- If you run into any problems, open an issue in the repository
- Currently tested on Linux and macOS; on Windows you need to change the script slightly
- Silero VAD: https://github.com/snakers4/silero-vad
- PyAudio: https://people.csail.mit.edu/hubert/pyaudio/
- wave: https://pythonhosted.org/Wave/