Why speech data for the punctuation and capitalization task? #2868
-
As mentioned here, speech data was used for the punctuation and capitalization model. Did you use the speech data for training or for testing? If for training, how and why? Thanks
Replies: 2 comments
-
@ekmb please take a look at this question.
-
We used only the text (transcripts) from the speech datasets listed, no audio. The datasets were chosen to add variety to the training data. Gutenberg books contain many long, complex sentences (this improves performance on commas), while the Fisher corpus contains conversational data with many question marks. To evaluate the model, please use this script.
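To illustrate the "how" part of the question: since only text is used, training pairs can be derived directly from punctuated text by stripping the punctuation and casing and keeping them as per-word labels. The snippet below is a minimal sketch of that idea, not the actual preprocessing script. The function name `make_labels` and the two-character label scheme (punctuation mark or `O`, then `U` for a capitalized word or `O` otherwise) are assumptions chosen for illustration.

```python
# Illustrative sketch only: derive (input words, labels) from punctuated text.
# The label scheme is an assumption: first char = punctuation that follows the
# word ("O" for none), second char = "U" if the word was capitalized, else "O".

PUNCT_MARKS = {",", ".", "?"}


def make_labels(sentence: str):
    """Return lowercased, unpunctuated words and one label per word."""
    words, labels = [], []
    for token in sentence.split():
        punct = "O"
        # Peel off a single trailing punctuation mark, if present.
        if token[-1] in PUNCT_MARKS:
            punct = token[-1]
            token = token[:-1]
        if not token:
            # The token was a bare punctuation mark; nothing to label.
            continue
        cap = "U" if token[0].isupper() else "O"
        words.append(token.lower())
        labels.append(punct + cap)
    return words, labels


if __name__ == "__main__":
    words, labels = make_labels("Hello, how are you?")
    print(" ".join(words))   # hello how are you
    print(" ".join(labels))  # ,U OO OO ?O
```

Under this scheme, a transcript-heavy corpus such as Fisher would contribute many `?` labels, and long written sentences would contribute many `,` labels, which matches the reasoning given above for mixing the datasets.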