Journal
The pipeline now reads compressed (gzip) files instead of raw .wav files. This reduces the on-disk size of the data set by a factor of three. An augmentation method called time_shift_signal has been added, which splits the signal in two at a random point and places the second part before the first. Everything seems to be running well on the GPU, and the memory usage is low and stable.
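A minimal sketch of what such a time-shift augmentation could look like (the actual time_shift_signal in the repository may differ in details such as the random number source):

```python
import numpy as np

def time_shift_signal(signal, rng=np.random):
    """Split a 1-D signal at a random index and swap the two parts."""
    split = rng.randint(1, len(signal))
    return np.concatenate([signal[split:], signal[:split]])
```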
After running a first experiment with 10 mini-batches, each trained for 10 epochs on randomly chosen augmented data, the training loss seems to be slowly decreasing (from ~0.25 to ~0.20). This slower convergence is what we want, in the hope of a model that will generalize well. However, the validation loss does not seem to be decreasing at all. This may be due to the very low number of mini-batches and epochs, and a longer training run has been started to see if it changes the results. The training set consists of 5000 randomly chosen, augmented signal segments, from which each mini-batch is drawn at random.
Next to implement:
- a proper metric, e.g., Area Under Curve or Mean Average Precision (see the sketch below)
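As a rough idea, assuming multi-label targets stored as a binary indicator matrix, both metrics could be computed with scikit-learn (the arrays below are dummy data, purely for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Dummy multi-label ground truth and predicted scores, for illustration only.
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_score = np.array([[0.8, 0.1], [0.3, 0.7], [0.6, 0.9], [0.2, 0.4]])

print("AUC:", roc_auc_score(y_true, y_score, average="macro"))
print("mAP:", average_precision_score(y_true, y_score, average="macro"))
```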
Still not in the pipeline:
- pitch shift augmentation
- median filtering in the mask calculations
Implemented a data generator scheme which should be easy to use and can be configured to return a set of augmented samples in mini-batches. The mini-batches can then be used to fit the model for a couple of epochs in small steps (each mini-batch fits into memory). It seems to be working, and is running ok on the GPU.
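A minimal sketch of such a generator, assuming hypothetical helpers load_segment and augment (not the repository's actual API):

```python
import numpy as np

def mini_batch_generator(samples, batch_size, load_segment, augment, rng=np.random):
    """Yield (X, y) mini-batches of augmented samples.

    `samples` is a list of (filename, label) pairs; `load_segment` reads a
    segment from disk and `augment` applies the augmentation chain. Both
    helpers are placeholders for illustration.
    """
    while True:
        idx = rng.choice(len(samples), size=batch_size, replace=False)
        X, y = [], []
        for i in idx:
            filename, label = samples[i]
            X.append(augment(load_segment(filename)))
            y.append(label)
        yield np.array(X), np.array(y)
```

The training loop can then fit the model on each yielded mini-batch for a few epochs before requesting the next one.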
Next steps:
- compress data set and read from compressed files if needed
- add time augmentation
- add pitch shift augmentation
Implemented a couple of data augmentation generator methods. The augmented samples are now described using only their filenames and stored as dicts. It should be possible to create a large data set of such unique samples, and then only load the files into memory for each mini-batch that is generated from the augmented sample set.
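For illustration, such a dict could look like the following (the keys and file paths are hypothetical, not the repository's actual format):

```python
augmented_sample = {
    "signal_filename": "segments/signal_0042.wav",      # base signal segment
    "same_class_filename": "segments/signal_0117.wav",  # same-class segment to mix in
    "noise_filenames": [                                 # noise segments to mix in
        "segments/noise_0003.wav",
        "segments/noise_0015.wav",
        "segments/noise_0031.wav",
    ],
    "labels": [4, 9],                                    # class labels of the base segment
}
```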
- make sure that the mini-batch is removed from memory after it has been used
- compress the data set using gzip, and decompress on the fly in Python (see the sketch after this list)
- connect the augmented data samples to the training scheme
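A sketch of the on-the-fly decompression, assuming the segments end up stored as gzip-compressed wave files (the storage format is still an assumption at this point):

```python
import gzip
import io

import numpy as np
from scipy.io import wavfile

def read_gzipped_wave(path):
    """Decompress a .wav.gz file in memory and return (sample_rate, samples)."""
    with gzip.open(path, "rb") as f:
        buffer = io.BytesIO(f.read())
    sample_rate, samples = wavfile.read(buffer)
    return sample_rate, samples.astype(np.float32)
```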
Discussed how to load data for augmentation, and whether it makes sense to normalize the data in the spectral domain to zero mean and unit variance. The loading should probably be done on the fly from a compressed data set on disk, with each epoch drawn at random and the same-class + noise augmentation applied in real time.
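If the normalization turns out to be useful, a simple per-spectrogram version could look like this (a sketch only; whether to normalize per spectrogram, per frequency bin, or over the whole data set is still open):

```python
import numpy as np

def normalize_spectrogram(spectrogram, eps=1e-8):
    """Scale a spectrogram to zero mean and unit variance."""
    return (spectrogram - spectrogram.mean()) / (spectrogram.std() + eps)
```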
The next step is to write the relevant methods for loading and choosing the random batches of data, and then augmenting them on the fly. The augmented files are then used as input to the CNN. Concretely (a rough sketch of the augmentation step follows the list below):
- open channel to the compressed data set
- load as many segments as possible (memory constraints), at random without replacement, from the compressed data set
- randomly augment each segment with:
  - three noise segments
  - a same-class signal segment
  - a time/frequency shift
- connect the augmented segments to the network model
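A hypothetical sketch of the augmentation step, assuming simple additive mixing (the actual mixing weights and the frequency-shift part are still to be decided):

```python
import numpy as np

def augment_segment(signal, same_class_signal, noise_segments, rng=np.random):
    """Mix a signal segment with a same-class segment and noise segments,
    then apply the time-shift augmentation described earlier."""
    augmented = signal + same_class_signal
    for noise in noise_segments:
        augmented = augmented + noise
    split = rng.randint(1, len(augmented))
    return np.concatenate([augmented[split:], augmented[:split]])
```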
- Added mask scaling
- Added benchmark file
- Added preprocessing step
- Fixed a bottleneck in the preprocessing step
The script pp.py can now preprocess a data set. The script is hardcoded to the mlsp2013 training data, but should generalize to any data set of 16-bit mono wave files with a sample rate of 16000 Hz. The script assumes that a file called file2labels.csv is present in the data set directory, and will (a rough outline follows the list below):
- read each wave file
- mask out the noise and signal parts of the wave file
- split the noise and signal parts into equally sized segments
- save the segments in the specified output directory
- create a new file2labels.csv file in this directory with the labels for each signal segment
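A rough, simplified outline of that flow, assuming file2labels.csv maps base filenames to label lists; the masking step is omitted here (see compute_binary_mask below), and the segment length is an arbitrary placeholder:

```python
import csv
import glob
import os

import numpy as np
from scipy.io import wavfile

def split_into_segments(samples, segment_length):
    """Split a 1-D signal into equally sized segments, dropping the remainder."""
    n_segments = len(samples) // segment_length
    if n_segments == 0:
        return []
    return np.split(samples[:n_segments * segment_length], n_segments)

def preprocess(dataset_dir, output_dir, segment_length=16000):
    """Read each wave file, split it into segments, save the segments and
    write a new file2labels.csv with the labels for each segment."""
    with open(os.path.join(dataset_dir, "file2labels.csv")) as f:
        file2labels = {row[0]: row[1:] for row in csv.reader(f)}
    rows = []
    for path in glob.glob(os.path.join(dataset_dir, "*.wav")):
        sample_rate, samples = wavfile.read(path)
        name = os.path.splitext(os.path.basename(path))[0]
        for i, segment in enumerate(split_into_segments(samples, segment_length)):
            out_name = "{}_seg{:03d}.wav".format(name, i)
            wavfile.write(os.path.join(output_dir, out_name), sample_rate, segment)
            rows.append([out_name] + file2labels.get(name, []))
    with open(os.path.join(output_dir, "file2labels.csv"), "w") as f:
        csv.writer(f).writerows(rows)
```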
I have gotten first-hand experience with for-loop bottlenecks: simply by removing the for loops in the compute_binary_mask method and replacing them with numpy array operations, I got a speedup of around 30x in the wave file preprocessing.
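To illustrate the kind of change involved (this is not the actual compute_binary_mask code, just a thresholding example in the same spirit):

```python
import numpy as np

def binary_mask_loop(spectrogram, threshold):
    """Element-wise thresholding with explicit Python loops (slow)."""
    mask = np.zeros(spectrogram.shape, dtype=np.int8)
    for i in range(spectrogram.shape[0]):
        for j in range(spectrogram.shape[1]):
            if spectrogram[i, j] > threshold:
                mask[i, j] = 1
    return mask

def binary_mask_vectorized(spectrogram, threshold):
    """The same thresholding as a single vectorized numpy operation (fast)."""
    return (spectrogram > threshold).astype(np.int8)
```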
Daily takeaway: do not use iterative Python for loops where a vectorized numpy operation will do.
Do tomorrow:
- Create a couple of preprocessing images and compare to Mario nips2013
- Fix the scaling error in reshape_binary_mask
- Start with data augmentation