Official Public Code for "SyllableLM: Learning Coarse Semantic Units for Speech Language Models"
In submission to ICLR 2025
conda create -n syllablelm python=3.9
conda activate syllablelm
pip install torch torchvision torchaudio --index-url
pip install omegaconf
pip install timm
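After installing, a quick check like the one below can confirm that the packages are visible in the active environment. This is a minimal sketch rather than part of the repository: `REQUIRED` and `missing_packages` are hypothetical names, and the script only looks the packages up without importing them.

```python
import importlib.util

# Top-level import names for the packages installed above.
# (Hypothetical helper list; adjust it if your setup differs.)
REQUIRED = ["torch", "torchvision", "torchaudio", "omegaconf", "timm"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the `syllablelm` conda environment. Any package it reports as missing can be installed with the matching `pip install` line above.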
| SylBoost | Model | KMeans | Agglomerative Clustering |
|---|---|---|---|
| 8.33Hz | Model | KMeans | Agglom |
| 6.25Hz | Model | KMeans | Agglom |
| 5.0Hz | Model | KMeans | Agglom |
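To get a feel for the three checkpoints, it can help to convert the rates into durations. The sketch below assumes each label is an average unit rate in units per second, which this README does not state explicitly. The function names are hypothetical and not part of the repository. Under that assumption the rates correspond to roughly 120 ms, 160 ms, and 200 ms per unit.

```python
# Assumption: each checkpoint label is an average unit rate in units per second.
UNIT_RATES_HZ = {"8.33Hz": 8.33, "6.25Hz": 6.25, "5.0Hz": 5.0}


def mean_unit_duration_ms(rate_hz):
    """Average duration of one unit, in milliseconds, at the given rate."""
    return 1000.0 / rate_hz


def expected_units(rate_hz, seconds):
    """Approximate number of units produced for `seconds` of audio."""
    return rate_hz * seconds


if __name__ == "__main__":
    for label, rate in UNIT_RATES_HZ.items():
        print(
            f"{label}: ~{mean_unit_duration_ms(rate):.0f} ms per unit, "
            f"~{expected_units(rate, 10.0):.1f} units in 10 s of audio"
        )
```

These figures are averages only. The actual number of units for a given utterance depends on the boundaries the model finds.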
SylBoost inference and efficient extraction code in
People have had trouble setting up Data2Vec2, so I copied it and stripped it down. No Fairseq required!
sylboost_reader = SylBoostFeatureReader(
'8.33Hz', # '6.25Hz', '5.0Hz'
| SyllableLM | Model |
|---|---|
| 6.25Hz Base | Model |
| 6.25Hz Large | Model |
| 6.25Hz Interleaved Vocoder LM | Model |
TODO: migrate the code over and make the TWIST dependency easier to set up.
This will be provided as-is
This will be provided as-is
This is standard language model training and will be provided as-is.