Issue generating dataset from many short audio files #7

Open
McMaNGOS opened this issue Jun 5, 2017 · 1 comment

Comments

McMaNGOS commented Jun 5, 2017

First of all, excellent work on the port!

I ran into a problem today while generating a new dataset. The input was 530 short .ogg files, each roughly 1-3 seconds long. When I processed them with generate_dataset.lua, the resulting data folder was empty, yet no error appeared in the terminal to indicate that anything had gone wrong.

I then combined my data into one long audio file, and the dataset generated without any issues. My hunch is that the problem occurs when every file in the dataset is shorter than the -seg_len value.
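To make the reported behaviour concrete, here is a minimal sketch. It assumes, consistent with the owner's reply below, that each file is cut independently into non-overlapping seg_len windows and that any leftover shorter than seg_len is discarded. The functions `segment` and `build_dataset` are hypothetical Python illustrations, not code from generate_dataset.lua, and the tiny sample rate is chosen only to keep the example small.

```python
from typing import List, Sequence


def segment(samples: Sequence[float], seg_len: int) -> List[List[float]]:
    """Cut one clip into non-overlapping seg_len windows, dropping the remainder."""
    n_full = len(samples) // seg_len  # 0 whenever the clip is shorter than seg_len
    return [list(samples[i * seg_len:(i + 1) * seg_len]) for i in range(n_full)]


def build_dataset(clips: Sequence[Sequence[float]], seg_len: int) -> List[List[float]]:
    """Segment each clip independently and pool the resulting windows."""
    windows: List[List[float]] = []
    for clip in clips:
        windows.extend(segment(clip, seg_len))
    return windows


# Toy setting (hypothetical): 160 samples per "second", 530 two-second
# clips, and seg_len equal to three seconds.
rate = 160
seg_len = 3 * rate
clips = [[0.1] * (2 * rate) for _ in range(530)]

print(len(build_dataset(clips, seg_len)))       # 0: every clip is dropped
combined = [s for clip in clips for s in clip]  # the concatenation workaround
print(len(build_dataset([combined], seg_len)))  # 353
```

Under these assumptions the per-file pass silently yields nothing, while the concatenated 1060 "seconds" of audio yield 353 full three-second windows.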

Reproducing the issue should be simple: create a few audio files shorter than -seg_len and try to generate a dataset from them.

richardassar (Owner) commented Jun 6, 2017

Hi, thanks for the feedback. I appreciate it.

Yeah, I should add a note explaining that anything shorter than seg_len is truncated. I chose truncation over silence padding because it keeps every minibatch filled to capacity with real audio, and because the extra silence could have a negative impact on the resulting model.
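For comparison, a minimal sketch of the silence-padding alternative the owner decided against. It assumes zeros stand in for silence (a real pipeline working on quantized samples may use a different "silence" value), and `segment_pad` and `padding_fraction` are hypothetical names. With the same toy clips as in the issue, truncation keeps no windows at all, while padding keeps every clip but makes a third of the training data artificial silence.

```python
from typing import List, Sequence


def segment_pad(samples: Sequence[float], seg_len: int,
                pad_value: float = 0.0) -> List[List[float]]:
    """Keep every sample; the final window is filled out with pad_value."""
    segments: List[List[float]] = []
    for start in range(0, len(samples), seg_len):
        chunk = list(samples[start:start + seg_len])
        chunk.extend([pad_value] * (seg_len - len(chunk)))  # assumed silence
        segments.append(chunk)
    return segments


def padding_fraction(clips: Sequence[Sequence[float]], seg_len: int) -> float:
    """Share of all samples in the padded dataset that are padding."""
    real = sum(len(c) for c in clips)
    total = sum(len(s) for c in clips for s in segment_pad(c, seg_len))
    return (total - real) / total if total else 0.0


# Toy setting (hypothetical): 160 samples per "second", 530 two-second
# clips, seg_len of three seconds.
rate = 160
seg_len = 3 * rate
clips = [[0.1] * (2 * rate) for _ in range(530)]

print(sum(len(c) // seg_len for c in clips))            # 0 windows survive truncation
print(sum(len(segment_pad(c, seg_len)) for c in clips))  # 530 padded windows
print(round(padding_fraction(clips, seg_len), 3))        # 0.333
```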

An alternative to plain silence padding is to pad but zero-mask the gradients for minibatch elements shorter than seg_len. This wastes computation on the padded regions, but the model is not updated from them.
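A minimal sketch of the masking idea, assuming the model predicts each quantized sample with a softmax and is trained with cross-entropy (the loss used in this port is not stated in the thread). `masked_cross_entropy` is a hypothetical name. Padded steps still sit in the batch and cost a forward pass, which is the wasted compute mentioned above, but they add nothing to the loss and receive an all-zero gradient.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_entropy(
    logits: Sequence[Sequence[float]],  # T steps x Q quantization levels
    targets: Sequence[int],             # T target class ids
    mask: Sequence[int],                # T flags: 1 = real audio, 0 = padding
) -> Tuple[float, List[List[float]]]:
    """Mean cross-entropy over unmasked steps and its gradient w.r.t. logits."""
    n_valid = sum(mask)
    grads = [[0.0] * len(row) for row in logits]
    if n_valid == 0:
        return 0.0, grads
    loss = 0.0
    for t, (row, y, m) in enumerate(zip(logits, targets, mask)):
        if not m:
            continue  # padded step: no loss, gradient row stays zero
        probs = softmax(row)
        loss -= math.log(probs[y])
        # d(mean CE)/d(logit_q) = (p_q - 1[q == y]) / n_valid
        grads[t] = [(p - (1.0 if q == y else 0.0)) / n_valid
                    for q, p in enumerate(probs)]
    return loss / n_valid, grads


# Four steps, three quantization levels; the last two steps are padding.
logits = [[0.0, 0.0, 0.0] for _ in range(4)]
loss, grads = masked_cross_entropy(logits, [0, 2, 1, 1], [1, 1, 0, 0])
print(round(loss, 4))  # 1.0986, i.e. log(3) for uniform predictions
print(grads[2])        # [0.0, 0.0, 0.0]
```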

If your shortest file is 1 second long, you could set seg_len to that length. Your workaround seems sensible, though, and bundling a script that concatenates a folder of audio files may be a good solution.

I'll leave this open and come back to it later as it isn't critical.
