A Python package for creating proscript files. Proscript helps represent speech with segment-level prosodic features like f0, intensity and word alignment.
Proscript is developed and tested on macOS with Python 3+.
Installing Proscript and its dependencies takes three steps:
1- Install Praat and make sure it is accessible from the command line as praat.
2- Install proscript
git clone https://github.com/alpoktem/proscript.git
cd proscript
pip install .
3- Install the Montreal Forced Aligner and set the following environment variables so that the scripts can locate its binaries and models:
export MFA_ALIGN_BINARY=montreal-forced-aligner-path/bin/mfa_align
export MFA_LEXICON=montreal-forced-aligner-path/pretrained_models/en.dict
export MFA_LM=montreal-forced-aligner-path/pretrained_models/english.zip
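Before running anything, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal sketch, not part of the package: the function name `missing_prerequisites` is hypothetical, and it only checks that `praat` is on the PATH and that the three MFA variables are set, not that the paths they hold are valid.

```python
import os
import shutil

# Variable names are taken from the installation steps above.
REQUIRED_ENV_VARS = ("MFA_ALIGN_BINARY", "MFA_LEXICON", "MFA_LM")


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper (not part of proscript). `env` and `which` are
    injectable so the check can be exercised without touching the real system.
    """
    env = os.environ if env is None else env
    problems = []
    # Praat must be callable as `praat` from the command line.
    if which("praat") is None:
        problems.append("praat not found on PATH")
    # Each MFA variable must be set to a non-empty value.
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Run as a script, it prints one line per missing prerequisite and nothing when all checks pass.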
Creates a proscript from a short audio file (30 s maximum) with a known transcript given in a text file. The transcript should contain only word tokens (no punctuation etc.).
proscripter --short -a audio.wav -t transcript.txt -o output_dir
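Since the transcript for short mode should contain only word tokens, a raw transcript may need cleaning first. The sketch below is one possible way to do that and is not part of the package. The function name `to_word_tokens` is hypothetical, and keeping apostrophes (as in "it's") is an assumption that may or may not suit the lexicon you use.

```python
import re


def to_word_tokens(text):
    """Strip punctuation and collapse whitespace (hypothetical helper)."""
    # Replace everything except word characters, whitespace and apostrophes
    # with a space. Keeping apostrophes is an assumption.
    cleaned = re.sub(r"[^\w\s']", " ", text)
    # \w also matches the underscore, so remove it explicitly.
    cleaned = cleaned.replace("_", " ")
    # Collapse runs of whitespace into single spaces.
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(to_word_tokens("Hello, world! It's fine."))
```

The result can be written to transcript.txt and passed with -t.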
Creates a proscript from an audio file with a segmented transcript given in a TextGrid file.
proscripter --long -a audio.wav -t audio.TextGrid -o output_dir
Runs automatic speech recognition on the audio and then creates a proscript from the result.
Set the environment variable for the Vosk model you want to use:
export VOSK_MODEL=vosk-model-path
proscripter --recognize -a audio.wav -o output_dir
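When processing many files, the three command forms shown above can be assembled from Python and passed to `subprocess.run`. This is a sketch under the assumption that the flags behave exactly as in the commands above. The helper `build_proscripter_command` is hypothetical and not part of the package.

```python
import subprocess

MODES = ("short", "long", "recognize")


def build_proscripter_command(mode, audio, output_dir, transcript=None):
    """Build the argument list for one proscripter call (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode != "recognize" and transcript is None:
        raise ValueError(f"--{mode} needs a transcript file")
    command = ["proscripter", f"--{mode}", "-a", audio]
    # The recognize form shown above takes no -t argument, so any transcript
    # passed in that mode is ignored here.
    if mode != "recognize":
        command += ["-t", transcript]
    command += ["-o", output_dir]
    return command


if __name__ == "__main__":
    cmd = build_proscripter_command(
        "short", "audio.wav", "output_dir", transcript="transcript.txt"
    )
    # check=True raises CalledProcessError if proscripter exits with an error.
    subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.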
Refer to proscript/scripts.py to see examples of creating proscript files.
Reading a proscript file in Python:
from proscript import Proscript
p = Proscript()
p.from_file(csv_filename='my_proscript.csv', proscript_id="my_proscript", audio_file="my_audio.wav", delimiter="|")
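Because the file is read with a "|" delimiter, it can also be inspected without the package using Python's standard csv module. The sketch below assumes that the first row is a header, which is not confirmed here. The function name and the column names in the demo data are placeholders, not the real proscript columns.

```python
import csv
import io


def read_pipe_delimited(stream, delimiter="|"):
    """Return (header, rows) from a delimited text stream (hypothetical helper).

    Assumes the first non-empty row is a header.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in reader if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]


if __name__ == "__main__":
    # Placeholder data; a real file would be opened with
    # open("my_proscript.csv", newline="") instead.
    demo = io.StringIO("column_a|column_b\nx|1\ny|2\n")
    header, rows = read_pipe_delimited(demo)
    print(header)
    print(len(rows))
```

All values come back as strings, so numeric fields need explicit conversion.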
- Movie2parallelDB makes sentence-aligned proscripts from subtitled and dubbed movies.
- Prosograph lets you visualize proscripts.
- PunkProse generates punctuation for speech transcripts using lexical, syntactic and prosodic features stored in proscript files.
- PANTED corpus - 250-hour speech corpus from TED talks
- Heroes corpus - Parallel English-Spanish speech corpus of dubbed movie segments
If you use this library, please cite the following paper:
Öktem, A., Farrús, M. & Bonafonte, A.
Corpora compilation for prosody-informed speech processing.
Lang Resources & Evaluation 55, 925–946 (2021).
https://doi.org/10.1007/s10579-021-09556-2