Hierarchical Token Semantic Audio Transformer

Introduction

This is the code repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection", published at ICASSP 2022.

In this paper, we devise a model, HTS-AT, by combining a Swin Transformer with a token-semantic module and adapting it to audio classification and sound event detection tasks. HTS-AT is an efficient, lightweight audio transformer with a hierarchical structure and only 30 million parameters. It achieves new state-of-the-art (SOTA) results on AudioSet and ESC-50, and equals the SOTA on Speech Commands V2. It also achieves better event localization performance than previous CNN-based models.

Installation

First, install the PyTorch build that matches your system and CUDA version, e.g.:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

Then install the other dependencies:

pip install -r requirements.txt
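
To confirm that the installation succeeded before moving on, a small check like the one below can help. It is only a sketch: the helper name `missing_packages` is hypothetical and not part of this repository, and the package list covers only the three packages named in the install command above, not the contents of requirements.txt.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Only the packages from the pip3 command above; extend the list as needed.
    missing = missing_packages(["torch", "torchvision", "torchaudio"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All PyTorch packages were found.")
```

The check only looks up whether each package can be located. It does not import the packages or verify that the installed CUDA build matches your driver.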

Configuration

Set the Configuration File: config.py

The script config.py contains all the settings you need to assign before running the code. Please read the introductory comments in the file and adjust the settings to your environment.

If you want to train/test your model on ESC-50, you need to set:

dataset_path = "your processed ESC-50 folder"
dataset_type = "esc-50"
loss_type = "clip_ce"
sample_rate = 32000
hop_size = 320
classes_num = 50
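
As a quick sanity check on the settings above, the sketch below collects them in a dictionary and verifies a few basic constraints. The names `ESC50_SETTINGS`, `frames_per_second`, and `validate_settings` are hypothetical and do not exist in this repository. The sketch also assumes that hop_size is measured in samples, in which case the values above correspond to 100 spectrogram frames per second of audio.

```python
# Hypothetical helpers, not part of the repository.
ESC50_SETTINGS = {
    "dataset_path": "your processed ESC-50 folder",
    "dataset_type": "esc-50",
    "loss_type": "clip_ce",
    "sample_rate": 32000,
    "hop_size": 320,
    "classes_num": 50,
}

REQUIRED_KEYS = tuple(ESC50_SETTINGS)


def frames_per_second(sample_rate, hop_size):
    """Frames per second of audio, assuming hop_size is given in samples."""
    if sample_rate <= 0 or hop_size <= 0:
        raise ValueError("sample_rate and hop_size must be positive")
    return sample_rate / hop_size


def validate_settings(settings):
    """Return a list of problems found in the settings; an empty list means none."""
    problems = [f"missing setting: {key}" for key in REQUIRED_KEYS if key not in settings]
    if problems:
        return problems
    for key in ("sample_rate", "hop_size", "classes_num"):
        value = settings[key]
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer, got {value!r}")
    # ESC-50 contains 50 sound classes.
    if settings["dataset_type"] == "esc-50" and settings["classes_num"] != 50:
        problems.append("ESC-50 has 50 classes, so classes_num should be 50")
    return problems


if __name__ == "__main__":
    print(validate_settings(ESC50_SETTINGS))
    print(frames_per_second(ESC50_SETTINGS["sample_rate"], ESC50_SETTINGS["hop_size"]))
```

A check like this catches mismatches, such as a class count left over from another dataset, before a training run starts.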

Model Checkpoints:

We provide model checkpoints for the three datasets (and additionally for the DESED dataset) at this link. Feel free to download and test them.

Citing

@inproceedings{htsat-ke2022,
  author = {Ke Chen and Xingjian Du and Bilei Zhu and Zejun Ma and Taylor Berg-Kirkpatrick and Shlomo Dubnov},
  title = {HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection},
  booktitle = {{ICASSP} 2022}
}

Our work is based on Swin Transformer, a widely used transformer model for image classification.
