AudioNet is a simple convolutional neural net based on 1-D convolutions. It is trained and tested on Google's Speech Commands dataset.
Tested with the following setup:
- Python 3.5
- Numpy
- Scipy
- Keras 2.0.8
- Tensorflow 1.4.1
- Scikit-learn
- GTX 1050 Ti 4 GB
Here, 1-D convolutions (linear convolutions) are used on top of regular hidden layers to classify speech signals. The dataset used is Google's Speech Commands Dataset.
The network has five 1-D convolutional layers with a kernel size of 32 and a stride of 4, followed by four hidden layers with 512 neurons each. In total, the network has approximately 10 million parameters.
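A minimal Keras sketch of this architecture, assuming raw one-second inputs at 16 kHz and 30 output classes; the per-layer filter counts are assumptions chosen to land near the stated ~10 million parameters, not values taken from AudioNet32.py:

```python
from keras.models import Sequential
from keras.layers import Conv1D, Dense, Flatten

NUM_CLASSES = 30   # assumption: Speech Commands v1 has 30 word labels
INPUT_LEN = 16000  # assumption: 1 s of audio sampled at 16 kHz

model = Sequential()
# Five 1-D convolutional layers, each with kernel size 32 and stride 4
# (filter counts below are assumptions, not taken from the repo)
model.add(Conv1D(128, 32, strides=4, activation='relu',
                 input_shape=(INPUT_LEN, 1)))
model.add(Conv1D(128, 32, strides=4, activation='relu'))
model.add(Conv1D(256, 32, strides=4, activation='relu'))
model.add(Conv1D(256, 32, strides=4, activation='relu'))
model.add(Conv1D(512, 32, strides=4, activation='relu'))
model.add(Flatten())
# Four fully connected hidden layers with 512 neurons each
for _ in range(4):
    model.add(Dense(512, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()  # parameter count comes out in the same ~10M ballpark
```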
The following augmentations are applied to the training data (a minimal sketch follows the list):
- Random noise
- Random shift
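A minimal NumPy sketch of the two augmentations, assuming they operate on the raw waveform; the noise scale and shift range are illustrative values, not the ones used by the script:

```python
import numpy as np

def augment(signal, noise_factor=0.005, max_shift=1600):
    # Random noise: add Gaussian noise scaled to the signal's peak amplitude
    noisy = signal + noise_factor * np.max(np.abs(signal)) \
        * np.random.randn(len(signal))
    # Random shift: move the waveform left or right by up to max_shift samples
    shift = np.random.randint(-max_shift, max_shift + 1)
    return np.roll(noisy, shift)
```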
The dataset has to be organized into subfolders, with each subfolder name serving as the class label. The script AudioNet32.py needs the following inputs to train (an example invocation follows the list):
- data_path : root folder of the dataset
- train_ratio : fraction of files used for training; the remainder is used for validation
- batch_size : minibatch size for training
- num_epochs : total number of epochs
- dst : destination folder for saving weights and logs
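A hypothetical invocation; the train entry point named below is an assumption for illustration (the actual function in AudioNet32.py may differ), and the comments show the expected folder layout:

```python
# Expected layout: one subfolder per class, each holding that class's .wav files
#   speech_commands/
#     yes/  *.wav
#     no/   *.wav
#     stop/ *.wav
from AudioNet32 import train  # hypothetical entry point

train(data_path='./speech_commands',  # root folder of the dataset
      train_ratio=0.9,                # 90% training, 10% validation
      batch_size=64,
      num_epochs=20,
      dst='./checkpoints')            # weights and logs are written here
```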
The script generates a pickle file containing the synset along with the training and validation file paths and labels. This file can be used to resume training via the resume_training() function.
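A sketch of inspecting the generated pickle; the exact keys stored by the script are not documented here, so the comments are assumptions:

```python
import pickle

with open('train_data_dic.pkl', 'rb') as f:
    data_dic = pickle.load(f)

# Expected contents (assumed): the synset plus the training and
# validation file paths and their labels
print(data_dic.keys())

# Resuming uses resume_training() from AudioNet32.py; its arguments
# are not documented here, so this call is only indicative:
# from AudioNet32 import resume_training
# resume_training(dst='./checkpoints')
```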
The script saves weights once every 2 epochs.
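In Keras, this every-2-epochs schedule can be reproduced with a ModelCheckpoint callback; the filename pattern below is illustrative, not the one used by the script:

```python
from keras.callbacks import ModelCheckpoint

# period=2 writes weights after every second epoch (Keras 2.0.x API)
checkpoint = ModelCheckpoint('weights_epoch{epoch:02d}.h5',
                             save_weights_only=True, period=2)
# model.fit(x_train, y_train, batch_size=batch_size,
#           epochs=num_epochs, callbacks=[checkpoint])
```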
The synset used for training is available in the train_data_dic.pkl file. The pretrained weights are available at the following link