Multi-class classification of images and audio files using Deep Learning

samyukthababu/music-genre-classification

Music Genre Classification

The notebooks contain the code used to classify images and audio files into 10 classes (music genres), using the GTZAN Dataset - Music Genre Classification from Kaggle.

Classification of images was implemented using a Feed-Forward Neural Network (FNN) and three different Convolutional Neural Network (CNN) architectures, each trained for 50 and 100 epochs.

The accuracies of the 4 networks after 50 epochs are shown in the table below:

| Network Architecture | Training Accuracy | Validation Accuracy | Test Accuracy |
| --- | --- | --- | --- |
| FNN + ReLU + Adam | 84.55% | 41.40% | 45.54% |
| CNN + ReLU + Adam | 100% | 45.71% | 46.53% |
| CNN + ReLU + Batch Normalisation + Adam | 100% | 65.03% | 56.41% |
| CNN + ReLU + Batch Normalisation + RMSprop | 33.08% | 30.55% | 31.25% |

The accuracies of the 4 networks after 100 epochs are shown in the table below:

| Network Architecture | Training Accuracy | Validation Accuracy | Test Accuracy |
| --- | --- | --- | --- |
| FNN + ReLU + Adam | 96.60% | 40.83% | 47.52% |
| CNN + ReLU + Adam | 100% | 48.02% | 49.50% |
| CNN + ReLU + Batch Normalisation + Adam | 100% | 61.76% | 56.41% |
| CNN + ReLU + Batch Normalisation + RMSprop | 79.11% | 39.06% | 45.94% |
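
One way to read the two tables together is to compute, for each network, the gap between training and test accuracy at 100 epochs and the change in test accuracy between 50 and 100 epochs. The snippet below is a minimal sketch of that arithmetic using only the figures reported above; the names `results` and `summarise` are illustrative and do not come from the notebooks.

```python
# Accuracies (%) copied from the two tables above, keyed by epoch count.
# Each tuple is (training, validation, test).
results = {
    "FNN + ReLU + Adam": {
        50: (84.55, 41.40, 45.54),
        100: (96.60, 40.83, 47.52),
    },
    "CNN + ReLU + Adam": {
        50: (100.0, 45.71, 46.53),
        100: (100.0, 48.02, 49.50),
    },
    "CNN + ReLU + Batch Normalisation + Adam": {
        50: (100.0, 65.03, 56.41),
        100: (100.0, 61.76, 56.41),
    },
    "CNN + ReLU + Batch Normalisation + RMSprop": {
        50: (33.08, 30.55, 31.25),
        100: (79.11, 39.06, 45.94),
    },
}


def summarise(results):
    """Return {network: (train-test gap at 100 epochs, test change 50 -> 100)}.

    Both values are in percentage points, rounded to two decimals.
    """
    summary = {}
    for name, runs in results.items():
        train_100, _, test_100 = runs[100]
        _, _, test_50 = runs[50]
        gap = round(train_100 - test_100, 2)
        change = round(test_100 - test_50, 2)
        summary[name] = (gap, change)
    return summary


summary = summarise(results)
for name, (gap, change) in summary.items():
    print(f"{name}: train-test gap {gap:.2f} pts, test change {change:+.2f} pts")
```

On these figures, every network has a train-test gap of more than 30 percentage points at 100 epochs, which may indicate overfitting, and the RMSprop variant gains the most test accuracy from the additional 50 epochs.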

Classification of audio was implemented using a Long Short-Term Memory (LSTM) network. Data augmentation was then performed using a Generative Adversarial Network (GAN), and classification was repeated on the augmented data. The two setups were compared, and their accuracies after 50 epochs are shown in the table below:

| Network Architecture | Training Accuracy | Validation Accuracy | Test Accuracy |
| --- | --- | --- | --- |
| LSTM | 51.65% | 48.05% | 52.85% |
| LSTM + Augmented data | 35.58% | 35.30% | 37.61% |