Journal

John Martinsson edited this page Nov 24, 2016 · 36 revisions

2016-11-24

Discussed how to load data for augmentation, and whether it makes sense to normalize the data in the spectral domain to zero mean and unit variance. The loading should probably be done on the fly from a compressed data set file on disk, where each epoch is sampled at random and same-class + noise augmentation is applied in real time.

The next step is to write the methods for loading and choosing random batches of data, and then augmenting them on the fly. The augmented segments are then used as input to the CNN.

  • open channel to the compressed data set
  • load as many segments as possible (memory constraints), at random without replacement, from the compressed data set
  • randomly augment each segment with:
    • three noise segments
    • a same class signal segment
    • and time/frequency shift it
  • connect the augmented segments to the network model
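The augmentation step above could be sketched roughly as follows; the function name, the uniform noise weights, and the use of a circular time shift are my assumptions here, not the actual implementation:

```python
import numpy as np

def augment_segment(signal, same_class, noise_segments, rng):
    """Mix a signal segment with one same-class segment and several
    noise segments, then apply a random circular time shift.
    Weighting scheme and shift strategy are illustrative only."""
    mixed = signal + same_class
    for noise in noise_segments:
        # Hypothetical: each noise segment gets a random weight in [0, 1).
        mixed = mixed + rng.uniform(0.0, 1.0) * noise
    # Random time shift; a frequency shift would be applied analogously
    # on the spectrogram axis.
    shift = rng.integers(0, len(mixed))
    return np.roll(mixed, shift)
```

In a real pipeline this would run inside the batch loader, so each epoch sees freshly augmented segments.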

2016-11-23

  • Added mask scaling
  • Added benchmark file
  • Added preprocessing step
  • Fixed bottleneck from preprocessing step

The script pp.py can now preprocess a data set. The script is hardcoded to the mlsp2013 training data, but should generalize to any data set of 16-bit mono wave files with a sample rate of 16 kHz. The script assumes that a file called file2labels.csv is present in the data set directory, and will

  1. read each wave file
  2. mask out the noise and signal part of the wave file
  3. split the noise and signal parts into equally sized segments
  4. save the segments in the specified output directory
  5. create a new file2labels.csv file in this directory with the labels for each signal segment
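Step 3 amounts to reshaping the masked samples into fixed-size chunks. A minimal sketch (the function name and the choice to drop the trailing remainder are assumptions, not necessarily what pp.py does):

```python
import numpy as np

def split_into_segments(samples, segment_size):
    """Split a 1-D signal into equally sized segments.
    Samples that do not fill a complete final segment are dropped."""
    n_segments = len(samples) // segment_size
    return samples[:n_segments * segment_size].reshape(n_segments, segment_size)
```

Each row of the result is one segment, ready to be written to the output directory alongside its labels.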

I have gotten first-hand experience with for-loop bottlenecks: simply by removing the for loops in the compute_binary_mask method and replacing them with NumPy array operations, I got a speedup of around 30x in the wave file preprocessing.
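To illustrate the kind of rewrite (this is a simplified thresholding example, not the actual compute_binary_mask code):

```python
import numpy as np

def mask_loop(spectrogram, threshold):
    # Slow: explicit Python loops visit every time-frequency bin.
    mask = np.zeros(spectrogram.shape)
    for i in range(spectrogram.shape[0]):
        for j in range(spectrogram.shape[1]):
            if spectrogram[i, j] > threshold:
                mask[i, j] = 1.0
    return mask

def mask_vectorized(spectrogram, threshold):
    # Fast: a single element-wise comparison evaluated in C.
    return (spectrogram > threshold).astype(float)
```

Both functions produce identical masks; the vectorized version avoids the Python interpreter overhead on every bin, which is where the ~30x speedup comes from.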

Daily takeaway: do not use iterative Python for loops where vectorized NumPy operations will do.

Do tomorrow:

  • Create a couple of preprocessing images and compare to Mario nips2013
  • Fix the scaling error in reshape_binary_mask
  • Start with data augmentation