Audio classification using Compact Convolutional Transformers on ESC-50 Dataset

micaebe/convolutional-audio-transformer

convolutional-audio-transformer

Compact Convolutional Transformers for Environmental Sound Classification

This project was done for the course "Human Data Analytics" at the University of Padova (winter semester 2023/24). The original code was written in TensorFlow/Keras, as required by the course; this repository is a rewrite in PyTorch.

A Compact Convolutional Transformer (CCT) is trained on the ESC-50 dataset, using mel-spectrograms as the audio representation. Because ESC-50 is relatively small (1600 audio samples in the training folds, 5 seconds each), various augmentation techniques are applied to reduce overfitting. The transformer encoder uses pre-layer normalization (pre-LN) and learnable positional embeddings.
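
To make the architecture description more concrete, here is a minimal sketch of a CCT-style model in PyTorch. It is not the code from this repository: the class names `ConvTokenizer` and `TinyCCT`, all layer sizes, and the input shape are assumptions chosen for illustration. Attention-based sequence pooling comes from the general CCT design and is also assumed here. Pre-layernorm is enabled with `norm_first=True`, and the positional embedding is a plain `nn.Parameter`.

```python
import torch
import torch.nn as nn


class ConvTokenizer(nn.Module):
    """Turns a (batch, 1, n_mels, frames) spectrogram into a token sequence."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)                     # (B, D, H, W)
        return x.flatten(2).transpose(1, 2)  # (B, H*W, D)


class TinyCCT(nn.Module):
    """Hypothetical, scaled-down CCT; hyperparameters are illustrative only."""

    def __init__(self, n_mels=32, frames=64, embed_dim=128,
                 depth=2, heads=4, num_classes=50):
        super().__init__()
        self.tokenizer = ConvTokenizer(embed_dim)
        # Infer the sequence length from a dummy forward pass.
        with torch.no_grad():
            seq_len = self.tokenizer(torch.zeros(1, 1, n_mels, frames)).shape[1]
        # Learnable positional embeddings, one vector per token position.
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, embed_dim))
        nn.init.trunc_normal_(self.pos_embed, std=0.02)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads, dim_feedforward=2 * embed_dim,
            dropout=0.1, batch_first=True,
            norm_first=True,  # pre-layernorm: LayerNorm before attention/MLP
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.attn_pool = nn.Linear(embed_dim, 1)  # sequence pooling weights
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.tokenizer(x) + self.pos_embed
        tokens = self.norm(self.encoder(tokens))
        weights = torch.softmax(self.attn_pool(tokens), dim=1)  # (B, N, 1)
        pooled = (weights * tokens).sum(dim=1)                  # (B, D)
        return self.head(pooled)


if __name__ == "__main__":
    model = TinyCCT()
    logits = model(torch.randn(2, 1, 32, 64))  # two fake mel-spectrograms
    print(logits.shape)
```

Because the positional embedding is sized from the tokenizer output, this sketch expects a fixed input shape, which suits ESC-50 since every clip has the same length.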

Augmentation Techniques used:

  • Time shifting (Raw Audio)
  • Background Noise (Raw Audio)
  • Mixup (Raw Audio)
  • Frequency Masking (Mel-spectrograms)
  • Time Masking (Mel-spectrograms)
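
The augmentations listed above could be sketched in PyTorch as follows. These are illustrative assumptions, not the repository's implementation: the function names, the circular form of time shifting, the SNR-based noise mixing, the Beta-distributed mixup coefficient, and zero-valued masks are all choices made for this example.

```python
import torch


def time_shift(wave: torch.Tensor, max_shift: int) -> torch.Tensor:
    """Circularly shift a waveform by a random offset in [-max_shift, max_shift]."""
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    return torch.roll(wave, shifts=shift, dims=-1)


def add_background_noise(wave: torch.Tensor, noise: torch.Tensor,
                         snr_db: float) -> torch.Tensor:
    """Mix `noise` into `wave` at the requested signal-to-noise ratio (dB)."""
    signal_power = wave.pow(2).mean().clamp_min(1e-10)
    noise_power = noise.pow(2).mean().clamp_min(1e-10)
    scale = torch.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return wave + scale * noise


def mixup(wave_a, label_a, wave_b, label_b, alpha: float = 0.2):
    """Blend two waveforms and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    wave = lam * wave_a + (1 - lam) * wave_b
    label = lam * label_a + (1 - lam) * label_b
    return wave, label


def mask_axis(spec: torch.Tensor, max_width: int, dim: int) -> torch.Tensor:
    """Zero a random band of at most `max_width` bins along `dim`.

    For a (n_mels, frames) spectrogram, dim=0 gives frequency masking
    and dim=1 gives time masking.
    """
    size = spec.shape[dim]
    width = int(torch.randint(0, max_width + 1, (1,)))
    start = int(torch.randint(0, size - width + 1, (1,)))
    masked = spec.clone()                        # leave the input untouched
    masked.narrow(dim, start, width).zero_()     # narrow() is a view into masked
    return masked


if __name__ == "__main__":
    wave = torch.randn(5 * 16000)                # fake 5 s clip, 16 kHz assumed
    noisy = add_background_noise(time_shift(wave, 8000),
                                 torch.randn_like(wave), snr_db=10.0)
    spec = torch.rand(64, 128)                   # fake mel-spectrogram
    spec = mask_axis(mask_axis(spec, 8, dim=0), 16, dim=1)
    print(noisy.shape, spec.shape)
```

The raw-audio functions run before the mel-spectrogram is computed, while `mask_axis` operates on the spectrogram itself, mirroring the split in the list.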

The model achieves an average accuracy of about 82% using the default parameters.
