Official Public Code for "SyllableLM: Learning Coarse Semantic Units for Speech Language Models"
Paper: https://arxiv.org/abs/2410.04029
In submission to ICLR 2025
conda create -n syllablelm python=3.9
conda activate syllablelm
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu115
pip install omegaconf
pip install timm
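To confirm the environment is set up before running anything, a quick check like the one below may help. It is only a sketch: the helper name `check_deps` and the `REQUIRED` list are illustrative and not part of this repository. It looks the modules up without importing them, so it also runs on machines without a GPU.

```python
import importlib.util

# Top-level module names for the packages installed above.
REQUIRED = ["torch", "torchvision", "torchaudio", "omegaconf", "timm"]


def check_deps(names):
    """Return {module_name: bool} saying whether each module can be found.

    Hypothetical helper for illustration; it only locates the modules
    and does not import them.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_deps(REQUIRED).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any line reported as `MISSING` means the corresponding `pip install` step needs to be rerun inside the `syllablelm` environment.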
SylBoost | Model | KMeans | Agglomerative Clustering |
---|---|---|---|
8.33Hz | Model | KMeans | Agglom |
6.25Hz | Model | KMeans | Agglom |
5.0Hz | Model | KMeans | Agglom |
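For a rough sense of what the three rates mean, the arithmetic below converts each rate into a mean unit duration and an approximate unit count for a clip. This is a sketch that assumes the labels denote average unit rates; actual unit boundaries depend on the audio, and the function names are illustrative only.

```python
# Rate labels from the table above, in units per second.
RATES_HZ = {"8.33Hz": 8.33, "6.25Hz": 6.25, "5.0Hz": 5.0}


def mean_unit_ms(rate_hz):
    """Average duration of one unit in milliseconds at the given rate."""
    return 1000.0 / rate_hz


def approx_units(duration_s, rate_hz):
    """Approximate number of units for a clip of duration_s seconds."""
    return round(duration_s * rate_hz)


if __name__ == "__main__":
    for label, hz in RATES_HZ.items():
        print(f"{label}: ~{mean_unit_ms(hz):.0f} ms per unit, "
              f"~{approx_units(8.0, hz)} units in an 8 s clip")
```

Under that assumption, the rates correspond to roughly 120 ms, 160 ms, and 200 ms per unit.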
SylBoost inference and efficient extraction code is in extract_units.py.
People have had trouble setting up Data2Vec2, so I copied it and stripped it down. No Fairseq required!
sylboost_reader = SylBoostFeatureReader(
'/path/to/model.pt',
'/path/to/kmeans.npy',
'/path/to/agglom.npy',
'8.33Hz', # '6.25Hz', '5.0Hz'
)
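For intuition about what the KMeans and Agglomerative Clustering files are for, here is a generic sketch of two-stage quantization: assign each feature vector to its nearest centroid, then optionally map that cluster id through a merge table. The array layouts are assumptions (a `(K, D)` centroid matrix and a length-`K` lookup array), `assign_units` is a hypothetical helper, and none of this is the code in extract_units.py; `SylBoostFeatureReader` handles this step for you.

```python
import numpy as np


def assign_units(features, centroids, merge_map=None):
    """Nearest-centroid assignment with an optional cluster-merge lookup.

    features:  (T, D) array of feature vectors (assumed layout)
    centroids: (K, D) array of k-means centroids (assumed layout)
    merge_map: optional length-K integer array mapping each k-means
               cluster id to a merged cluster id (assumed layout)
    """
    # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2, shape (T, K).
    d2 = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    ids = d2.argmin(axis=1)
    if merge_map is not None:
        ids = merge_map[ids]
    return ids


if __name__ == "__main__":
    centroids = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
    features = np.array([[1.0, 1.0], [9.0, 11.0], [19.0, 21.0]])
    print(assign_units(features, centroids))
    print(assign_units(features, centroids, merge_map=np.array([0, 0, 1])))
```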
SyllableLM | Model |
---|---|
6.25Hz Base | Model |
6.25Hz Large | Model |
6.25Hz Interleaved Vocoder LM | Model |
Todo: migrate the code over and facilitate the TWIST dependency.
Todo
Todo
This will be provided as-is
This will be provided as-is
This is standard language model training and will be provided as is.