Why speech data for the punctuation and capitalization task? #2868

Answered by ekmb
PharesAbanmy asked this question in Q&A

We used only the text from the listed speech datasets, not the audio. The datasets were chosen to add variety to the training data. Gutenberg books contain many long, complex sentences (which improves performance on commas), while the Fisher corpus contains conversational data with many question marks. To evaluate the model, please use this script.
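To illustrate why transcripts alone are enough, here is a minimal sketch of how punctuated text can be turned into training pairs: the input is the lowercased, unpunctuated word sequence, and each word gets a two-character label. The label scheme (first character is the punctuation mark that follows the word, or `O` for none; second character is `U` if the word is capitalized, else `O`) is assumed here for illustration, and the helper name `make_example` and the punctuation set are hypothetical, not taken from the project's scripts.

```python
PUNCT = ",.?"  # punctuation marks covered by this sketch (an assumption)


def make_example(sentence: str):
    """Turn a punctuated sentence into (lowercased words, per-word labels)."""
    words, labels = [], []
    for token in sentence.split():
        # Punctuation label: the trailing mark, or "O" if there is none.
        punct = token[-1] if token[-1] in PUNCT else "O"
        core = token.rstrip(PUNCT)
        if not core:
            continue  # skip tokens that consist only of punctuation
        # Capitalization label: "U" if the word starts uppercase, else "O".
        cap = "U" if core[0].isupper() else "O"
        words.append(core.lower())
        labels.append(punct + cap)
    return " ".join(words), " ".join(labels)


text, labels = make_example("Well, how are you?")
print(text)    # well how are you
print(labels)  # ,U OO OO ?O
```

Under this scheme, a conversational transcript full of questions yields many `?` labels and a book yields many `,` labels, which is the kind of variety the answer describes.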

Answer selected by okuchaiev