Preprocessing
The preprocessing methods used in this thesis follow those of Mario Lasseck and Elias Sprengel.
The spectrogram is computed using a Hann window of size 512 with an overlap of 75%.
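A minimal sketch of this step, assuming NumPy and a mono signal already loaded as a 1-D array. The function names, the natural logarithm, and the small `eps` offset are illustrative assumptions, not taken from the thesis code.

```python
import numpy as np

def amplitude_spectrogram(wave, n_fft=512, overlap=0.75):
    """Magnitude STFT with a Hann window; rows are frequency bins, columns are frames."""
    hop = int(n_fft * (1 - overlap))          # 128 samples for 75% overlap
    window = np.hanning(n_fft)                # Hann window
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def log_amplitude(spec, eps=1e-10):
    """Log of the amplitude; eps (an assumed value) avoids log(0)."""
    return np.log(spec + eps)

# Example: one second of a 1 kHz tone sampled at 16000 Hz.
fs = 16000
t = np.arange(fs) / fs
spec = amplitude_spectrogram(np.sin(2 * np.pi * 1000 * t))
log_spec = log_amplitude(spec)
```

Trailing samples that do not fill a whole window are dropped in this sketch; a padding strategy would be a separate design choice.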
Amplitude Spectrogram | Log Amplitude Spectrogram |
---|---|
Original Paper |
---|
The log amplitude spectrogram and the spectrogram from the original paper are very similar. However, there is some discrepancy in the higher frequencies (note that the authors' spectrogram is reversed): part of the signal structure seems to be lost. This is most likely because I have resampled my signals to 16000 Hz, whereas the authors have probably resampled theirs to 22050 Hz. The paper does not state this explicitly, so this is my best guess, based on the sample frequency used in a paper they reference. I have also tested this empirically, and resampling to 22050 Hz yields even more similar spectrograms.
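The effect of the sample rate can be made concrete with a little arithmetic: a spectrogram can only show frequencies up to half the sample rate, so 16000 Hz covers up to 8000 Hz while 22050 Hz covers up to 11025 Hz. The helper names below are illustrative only.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2

def bin_width_hz(sample_rate_hz, n_fft=512):
    """Frequency spacing between adjacent spectrogram rows."""
    return sample_rate_hz / n_fft

for sr in (16000, 22050):
    print(sr, nyquist_hz(sr), bin_width_hz(sr))
```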
Noise and Signal Extraction |
---|
The noise and signal parts of a spectrogram are extracted by computing a binary image representation of the spectrogram. The binary image has value one at cell (i, j) if the value of the spectrogram at cell (i, j) is more than threshold times the i:th row median and more than threshold times the j:th column median of the spectrogram, and value zero otherwise. The binary image is then processed further with a couple of image filtering techniques.
The threshold is set to 3 for signal extraction and 2.5 for noise extraction. The only other difference for noise extraction is that the resulting mask is inverted at the end. This means that there can be parts (between 2.5 and 3 times the medians) which are marked as neither noise nor signal. This is intentional: it removes parts of the recording that are considered to carry no useful information for the network.
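The thresholding rule can be sketched as follows, assuming NumPy and a spectrogram stored as a 2-D array with frequency along the rows and time along the columns. The function name is hypothetical, and the toy array is only there to show the band between the two thresholds.

```python
import numpy as np

def median_threshold_mask(spec, threshold):
    """True where a cell exceeds threshold times both its row and column median."""
    row_median = np.median(spec, axis=1, keepdims=True)   # one value per row
    col_median = np.median(spec, axis=0, keepdims=True)   # one value per column
    return (spec > threshold * row_median) & (spec > threshold * col_median)

# Toy spectrogram: flat background, one strong cell and one borderline cell.
spec = np.ones((5, 5))
spec[2, 2] = 10.0   # clearly above both thresholds
spec[1, 1] = 2.8    # between 2.5 and 3 times the medians

signal_mask = median_threshold_mask(spec, 3.0)
noise_candidate = median_threshold_mask(spec, 2.5)  # inverted later, after filtering
```

In this toy case the borderline cell is excluded from the signal mask but still set in the 2.5 mask, so after inversion it would not count as noise either.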
The binary image is further processed by:
- erosion and dilation (4 by 4 kernels)
- reducing the mask to one value per column by checking whether each column contains at least one cell marked as one
- scaling the mask to the signal length
The mask is used to extract the signal/noise.
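The post-processing steps above could look roughly like this, assuming NumPy and `scipy.ndimage`. The function names are hypothetical, and the nearest-neighbour stretching used to scale the mask to the signal length is an assumption; the thesis code may scale it differently.

```python
import numpy as np
from scipy import ndimage

def column_indicator(mask, kernel_size=4):
    """Erode, dilate, then mark each column that still contains a one."""
    kernel = np.ones((kernel_size, kernel_size), dtype=bool)
    cleaned = ndimage.binary_erosion(mask, structure=kernel)
    cleaned = ndimage.binary_dilation(cleaned, structure=kernel)
    return cleaned.any(axis=0)

def scale_to_length(indicator, n_samples):
    """Stretch a per-frame indicator to one value per audio sample."""
    idx = np.arange(n_samples) * len(indicator) // n_samples
    return indicator[idx]

def extract(wave, mask, invert=False):
    """Keep the samples selected by the mask; invert=True is the noise case."""
    indicator = column_indicator(mask)
    if invert:
        indicator = ~indicator
    return wave[scale_to_length(indicator, len(wave))]

# Toy example: a 4 by 4 blob survives the filtering, an isolated cell does not.
mask = np.zeros((10, 12), dtype=bool)
mask[3:7, 5:9] = True
mask[0, 0] = True
wave = np.arange(24)

signal_part = extract(wave, mask)
noise_part = extract(wave, mask, invert=True)
```

For simplicity the same binary image is reused for both calls here; in the procedure described above, the signal mask would come from threshold 3 and the noise mask from threshold 2.5.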