Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models

AlanBaade/SyllableLM

SyllableLM

Official Public Code for "SyllableLM: Learning Coarse Semantic Units for Speech Language Models"

Paper: https://arxiv.org/abs/2410.04029

In submission to ICLR 2025

Setup:

conda create -n syllablelm python=3.9
conda activate syllablelm

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu115

pip install omegaconf
pip install timm
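After installing, a quick check that the packages resolve in the active environment can save a confusing import error later. The following is a minimal sketch, not part of the repository; the function name `missing_packages` is hypothetical, and it only confirms that the modules can be found, not that the CUDA build matches your driver.

```python
import importlib.util

# Top-level module names for the packages installed in the setup steps above.
REQUIRED = ["torch", "torchvision", "torchaudio", "omegaconf", "timm"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated `syllablelm` environment; an empty result means all five packages were found.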

SylBoost:

Checkpoints

| SylBoost | Model | KMeans | Agglomerative Clustering |
|----------|-------|--------|--------------------------|
| 8.33Hz   | Model | KMeans | Agglom                   |
| 6.25Hz   | Model | KMeans | Agglom                   |
| 5.0Hz    | Model | KMeans | Agglom                   |
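To get a feel for how coarse these units are, the rates can be converted to an average unit duration. This is a rough sketch that assumes the labels denote the average number of units per second; the helper names are hypothetical and not part of the repository.

```python
# Nominal unit rates named in the checkpoint list, in units per second.
RATES_HZ = {"8.33Hz": 8.33, "6.25Hz": 6.25, "5.0Hz": 5.0}


def mean_unit_duration_ms(rate_label):
    """Average milliseconds of audio covered by one unit at the given rate."""
    return 1000.0 / RATES_HZ[rate_label]


def expected_num_units(duration_s, rate_label):
    """Approximate number of units for a clip lasting `duration_s` seconds."""
    return round(duration_s * RATES_HZ[rate_label])


for label in RATES_HZ:
    print(f"{label}: about {mean_unit_duration_ms(label):.0f} ms per unit")
```

Under that assumption the three rates correspond to roughly 120 ms, 160 ms, and 200 ms of audio per unit on average.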

Usage

SylBoost inference and efficient unit-extraction code is in extract_units.py.

People have had trouble setting up Data2Vec2, so I copied it and stripped it down. No Fairseq required!

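# Assumption: SylBoostFeatureReader is defined in extract_units.py
from extract_units import SylBoostFeatureReader

sylboost_reader = SylBoostFeatureReader(
    '/path/to/model.pt',
    '/path/to/kmeans.npy',
    '/path/to/agglom.npy',
    '8.33Hz',  # or '6.25Hz', '5.0Hz'
)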
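Because the reader takes four positional arguments, a mistyped rate or a swapped path is easy to miss. The helper below is a hypothetical sketch, not part of the repository, that validates the arguments before they are passed on. It assumes the only valid rate labels are the three listed under Checkpoints and that the three paths point to regular files.

```python
from pathlib import Path

# Rate labels listed under Checkpoints for the released SylBoost models.
VALID_RATES = ("8.33Hz", "6.25Hz", "5.0Hz")


def check_reader_args(model_path, kmeans_path, agglom_path, rate, must_exist=True):
    """Validate the four positional reader arguments and return them as a tuple.

    Hypothetical helper for illustration; it is not part of extract_units.py.
    """
    if rate not in VALID_RATES:
        raise ValueError(f"rate must be one of {VALID_RATES}, got {rate!r}")
    paths = [Path(model_path), Path(kmeans_path), Path(agglom_path)]
    if must_exist:
        missing = [str(p) for p in paths if not p.is_file()]
        if missing:
            raise FileNotFoundError(f"checkpoint file(s) not found: {missing}")
    # Same order as the constructor call: model, kmeans, agglom, rate.
    return (*map(str, paths), rate)
```

The returned tuple can then be unpacked into the constructor, e.g. `SylBoostFeatureReader(*check_reader_args(model, kmeans, agglom, '6.25Hz'))`.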

SyllableLM:

Checkpoints

SyllableLM Model
6.25Hz Base Model
6.25Hz Large Model
6.25Hz Interleaved Vocoder LM Model

Usage

Todo: migrate the code over and make the TWIST dependency easier to set up.

Resynthesis:

Todo

Continuation Pipeline:

Todo

LossPred:

This will be provided as-is

SylBoost training:

This will be provided as-is

SyllableLM training:

This is standard language model training and will be provided as is.
