
Suggestions for improving dev-set performance. #4

Open
Feynman27 opened this issue Nov 2, 2016 · 2 comments

Comments

@Feynman27

(I apologize if this question is better suited for StackOverflow, but I figure posting it here will reach the right audience in a shorter amount of time.)

I'm training this CTC-cost model on the LibriSpeech "train-other-500" dataset, which contains 500 hours of speech audio plus transcripts. I'm using the "dev-other" dataset for development, which is apparently a more challenging audio set to model.

I trained the model over 20 epochs and have provided the distribution of the costs below.
[figure: distribution of CTC costs over the 20 training epochs]

The weights are updated according to Nesterov momentum.
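For reference, a minimal NumPy sketch of a Nesterov-momentum update (the function and hyperparameter values here are illustrative, not the ones used in the training run above):

```python
import numpy as np

def nesterov_update(w, v, grad_fn, lr=0.01, momentum=0.9):
    """One Nesterov-momentum step: evaluate the gradient at the
    look-ahead point w + momentum * v, then update the velocity
    and apply it to the weights."""
    lookahead = w + momentum * v
    g = grad_fn(lookahead)
    v = momentum * v - lr * g
    return w + v, v

# Toy example: minimize f(w) = w^2, whose gradient is 2w.
w, v = np.array([1.0]), np.array([0.0])
for _ in range(100):
    w, v = nesterov_update(w, v, lambda x: 2 * x)
```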

Since the validation performance plateaus at around iter=25000, I decided to checkpoint the model there and resume training with an exponential learning-rate decay schedule, decreasing the learning rate after each epoch (starting from iter=25000). The CTC costs under this decay schedule are shown below after a few epochs:

[figure: CTC costs over a few epochs with the exponential learning-rate decay schedule]
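The per-epoch exponential decay described above can be sketched as follows (the decay factor of 0.95 is an assumed example value, not the one used in the run):

```python
def decayed_lr(initial_lr, epoch, decay_rate=0.95):
    """Exponential decay: shrink the learning rate by a constant
    factor after every epoch."""
    return initial_lr * (decay_rate ** epoch)

# With initial_lr = 1e-4: epoch 0 -> 1e-4, epoch 1 -> 9.5e-5, ...
for epoch in range(5):
    print(epoch, decayed_lr(1e-4, epoch))
```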

Unfortunately, this strategy doesn't appear to improve the model performance. Does anyone have any suggestions on how to improve the model other than what I've described above?

@srvinay

srvinay commented Nov 2, 2016

From the looks of it, your model seems to have high variance. You could try reducing the initial learning rate, adding regularization (dropout, or augmenting the training audio with noise), or adjusting the model architecture if those ideas don't work.
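One simple form of the noise augmentation mentioned above is adding white Gaussian noise to each training utterance at a target signal-to-noise ratio. A minimal sketch (the function name and the 20 dB default are assumptions for illustration, not part of the repo):

```python
import numpy as np

def augment_with_noise(audio, snr_db=20.0, rng=None):
    """Add white Gaussian noise to a waveform at a target SNR (dB).
    Noise power is scaled relative to the measured signal power."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

# Example: corrupt one second of a 16 kHz sine tone.
audio = np.sin(np.linspace(0, 2 * np.pi * 440, 16000))
noisy = augment_with_noise(audio)
```

Applied on the fly during training, this gives the model a slightly different version of each utterance every epoch, which acts as a regularizer.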

@dylanbfox

dylanbfox commented Dec 7, 2016

You can also try increasing the amount of data you're training on. By default the max wav length is set to 10 seconds (https://github.com/baidu-research/ba-dls-deepspeech/blob/master/data_generator.py#L53-L54), which excludes a good portion of the data in the LibriSpeech corpus. Longer utterances will most likely require more memory, though.
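The effect of that cutoff amounts to a duration filter like the following (the names here are hypothetical illustrations, not the repo's actual code in data_generator.py):

```python
def filter_by_duration(utterances, max_duration=10.0):
    """Keep only utterances no longer than max_duration seconds;
    raising the threshold admits more (but longer) training data."""
    return [u for u in utterances if u["duration"] <= max_duration]

utts = [{"key": "a.wav", "duration": 4.2},
        {"key": "b.wav", "duration": 12.7}]
kept = filter_by_duration(utts)  # only "a.wav" survives the 10 s cutoff
```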
