convolutional-audio-transformer

Compact Convolutional Transformers for Environmental Sound Classification

This project was done for the course "Human Data Analytics" at the University of Padova (winter semester 2023/24). The original code was written in TensorFlow/Keras, as required by the course; this repository is a rewrite in PyTorch.

A Compact Convolutional Transformer (CCT) is trained on the ESC-50 dataset, using mel-spectrograms as the audio representation. Because ESC-50 is relatively small (1600 audio samples of 5 seconds each in the training folds), several augmentation techniques are applied to reduce overfitting. The transformer encoder uses pre-layernorm and learnable positional embeddings.
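To make the architecture description more concrete, here is a minimal sketch of a CCT-style model with a pre-layernorm encoder block and learnable positional embeddings. It is not the code from model.py: the class names `PreLNBlock` and `TinyCCT`, the layer sizes, the two-layer convolutional tokenizer and the attention-based sequence pooling are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class PreLNBlock(nn.Module):
    """Encoder block with pre-layernorm: normalize *before* each sub-layer."""

    def __init__(self, dim, heads, mlp_ratio=2):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        # Residual connection around self-attention on the normalized input.
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # Residual connection around the MLP on the normalized input.
        return x + self.mlp(self.norm2(x))


class TinyCCT(nn.Module):
    """Hypothetical small CCT: conv tokenizer -> encoder -> pooled classifier."""

    def __init__(self, input_shape=(1, 64, 128), n_classes=50,
                 dim=64, heads=4, depth=2):
        super().__init__()
        # Convolutional tokenizer (layer count and kernel sizes are assumptions).
        self.tokenizer = nn.Sequential(
            nn.Conv2d(input_shape[0], dim, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Run a dummy input once to find how many tokens the tokenizer emits.
        with torch.no_grad():
            dummy = torch.zeros(1, *input_shape)
            n_tokens = self.tokenizer(dummy).flatten(2).shape[-1]
        # Learnable positional embeddings, one vector per token position.
        self.pos_emb = nn.Parameter(torch.zeros(1, n_tokens, dim))
        nn.init.trunc_normal_(self.pos_emb, std=0.02)
        self.blocks = nn.ModuleList([PreLNBlock(dim, heads) for _ in range(depth)])
        self.norm = nn.LayerNorm(dim)
        self.pool = nn.Linear(dim, 1)   # scores for attention-based pooling
        self.head = nn.Linear(dim, n_classes)

    def forward(self, spec):
        # spec: (batch, channels, n_mels, n_frames)
        x = self.tokenizer(spec).flatten(2).transpose(1, 2)  # (batch, tokens, dim)
        x = x + self.pos_emb
        for block in self.blocks:
            x = block(x)
        x = self.norm(x)
        weights = torch.softmax(self.pool(x), dim=1)          # (batch, tokens, 1)
        x = (weights * x).sum(dim=1)                          # (batch, dim)
        return self.head(x)
```

Because the positional embeddings are sized from `input_shape`, this sketch expects every spectrogram to have the same dimensions, which fits fixed-length 5-second clips.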

Augmentation Techniques used:

Time shifting (Raw Audio)
Background Noise (Raw Audio)
Mixup (Raw Audio)
Frequency Masking (Mel-spectrograms)
Time Masking (Mel-spectrograms)
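The augmentations listed above can be sketched as follows. This is an illustrative version, not the contents of augmentations.py: the function names, the circular shift, the SNR-based noise scaling, the Beta-distributed mixup weight and masking with zeros are all assumptions.

```python
import torch


def time_shift(wave, max_shift):
    """Circularly shift a waveform by a random offset in [-max_shift, max_shift]."""
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    return torch.roll(wave, shifts=shift, dims=-1)


def add_background_noise(wave, noise, snr_db):
    """Add `noise` to `wave`, scaled so the result has the given SNR in dB."""
    signal_power = wave.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp_min(1e-12)
    scale = torch.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return wave + scale * noise


def mixup(wave_a, label_a, wave_b, label_b, alpha=0.2):
    """Blend two waveforms and their one-hot labels with weight lam ~ Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed_wave = lam * wave_a + (1 - lam) * wave_b
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed_wave, mixed_label


def mask_axis(spec, max_width, axis):
    """Zero a random band of at most `max_width` bins along `axis`.

    For a (n_mels, n_frames) mel-spectrogram, axis=0 gives frequency masking
    and axis=1 gives time masking. The input is left unmodified.
    """
    spec = spec.clone()
    size = spec.shape[axis]
    width = int(torch.randint(0, max_width + 1, (1,)))
    start = int(torch.randint(0, size - width + 1, (1,)))
    index = [slice(None)] * spec.dim()
    index[axis] = slice(start, start + width)
    spec[tuple(index)] = 0.0
    return spec
```

In this sketch the first three functions operate on raw audio tensors before the mel transform, while `mask_axis` is applied to the resulting spectrogram, once per axis.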

The model achieves an average accuracy of about 82% using the default parameters.

