AudioNet is a simple convolutional neural net based on 1-D convolutions. It is trained and tested on Google's Speech Commands dataset.
Tested with the following setup:
- Python 3.5
- Numpy
- Scipy
- Keras 2.0.8
- Tensorflow 1.4.1
- Scikit-learn
- GTX 1050 Ti 4 GB
Here, 1-D convolutions (linear convolutions) are used on top of regular hidden layers to classify speech signals. The dataset used is Google's Speech Commands Dataset.
The network has five 1-D convolutional layers with a kernel size of 32 and a stride of 4, followed by four hidden layers with 512 neurons each. In total, the network has approximately 10 million parameters.
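A minimal Keras sketch of this architecture, assuming raw one-second inputs at 16 kHz and 30 output classes; the per-layer filter counts are assumptions chosen to land near the stated ~10 million parameters, not values taken from AudioNet32.py:

```python
from keras.models import Sequential
from keras.layers import Conv1D, Dense, Flatten

NUM_CLASSES = 30   # assumption: Speech Commands v1 has 30 word labels
INPUT_LEN = 16000  # assumption: 1 s of audio sampled at 16 kHz

model = Sequential()
# Five 1-D convolutional layers, each with kernel size 32 and stride 4
# (filter counts below are assumptions, not taken from the repo)
model.add(Conv1D(128, 32, strides=4, activation='relu',
                 input_shape=(INPUT_LEN, 1)))
model.add(Conv1D(128, 32, strides=4, activation='relu'))
model.add(Conv1D(256, 32, strides=4, activation='relu'))
model.add(Conv1D(256, 32, strides=4, activation='relu'))
model.add(Conv1D(512, 32, strides=4, activation='relu'))
model.add(Flatten())
# Four fully connected hidden layers with 512 neurons each
for _ in range(4):
    model.add(Dense(512, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()  # parameter count comes out in the same ~10M ballpark
```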
The following augmentations are applied to the training data (a minimal sketch follows the list):
- Random noise
- Random shift
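A minimal NumPy sketch of the two augmentations, assuming they operate on the raw waveform; the noise scale and shift range are illustrative values, not the ones used by the script:

```python
import numpy as np

def augment(signal, noise_factor=0.005, max_shift=1600):
    # Random noise: add Gaussian noise scaled to the signal's peak amplitude
    noisy = signal + noise_factor * np.max(np.abs(signal)) \
        * np.random.randn(len(signal))
    # Random shift: move the waveform left or right by up to max_shift samples
    shift = np.random.randint(-max_shift, max_shift + 1)
    return np.roll(noisy, shift)
```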
The dataset has to be organized into subfolders, with each subfolder name serving as the class label. The script AudioNet32.py needs the following inputs to train (an example invocation follows the list):
- data_path : root folder of the dataset
- train_ratio : fraction of files used for training; the remainder is used for validation
- batch_size : minibatch size for training
- num_epochs : total number of epochs
- dst : destination folder for saving weights and logs
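A hypothetical invocation; the train entry point named below is an assumption for illustration (the actual function in AudioNet32.py may differ), and the comments show the expected folder layout:

```python
# Expected layout: one subfolder per class, each holding that class's .wav files
#   speech_commands/
#     yes/  *.wav
#     no/   *.wav
#     stop/ *.wav
from AudioNet32 import train  # hypothetical entry point

train(data_path='./speech_commands',  # root folder of the dataset
      train_ratio=0.9,                # 90% training, 10% validation
      batch_size=64,
      num_epochs=20,
      dst='./checkpoints')            # weights and logs are written here
```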
The script generates a pickle file containing the synset along with the training and validation file paths and labels. This file can be used to resume training via the resume_training() function.
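A sketch of inspecting the generated pickle; the exact keys stored by the script are not documented here, so the comments are assumptions:

```python
import pickle

with open('train_data_dic.pkl', 'rb') as f:
    data_dic = pickle.load(f)

# Expected contents (assumed): the synset plus the training and
# validation file paths and their labels
print(data_dic.keys())

# Resuming uses resume_training() from AudioNet32.py; its arguments
# are not documented here, so this call is only indicative:
# from AudioNet32 import resume_training
# resume_training(dst='./checkpoints')
```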
The script saves weights once every 2 epochs.
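In Keras, this every-2-epochs schedule can be reproduced with a ModelCheckpoint callback; the filename pattern below is illustrative, not the one used by the script:

```python
from keras.callbacks import ModelCheckpoint

# period=2 writes weights after every second epoch (Keras 2.0.x API)
checkpoint = ModelCheckpoint('weights_epoch{epoch:02d}.h5',
                             save_weights_only=True, period=2)
# model.fit(x_train, y_train, batch_size=batch_size,
#           epochs=num_epochs, callbacks=[checkpoint])
```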
The synset used for training is available in the train_data_dic.pkl file. The pretrained weights are available at the following link