Why use speech data for the punctuation and capitalization task? #2868

PharesAbanmy · 2021-09-22T10:36:33Z

As mentioned here, speech data was used for the punctuation and capitalization model.

Do you use the speech data for training or for testing?

If for training, how and why?
If for testing, how do you evaluate the model on it?

Thanks

VahidooX (Collaborator) · 2021-09-28T23:18:50Z

@ekmb please take a look at this question.

ekmb (Collaborator) · 2021-09-29T22:25:49Z

We used only the text data from the speech datasets listed, not the audio. The datasets were chosen to add variety to the training data: Gutenberg books contain many long, complex sentences (which improves performance on commas), while the Fisher corpus contains conversational data with many question marks. To evaluate the model, please use this script.
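
To make the "text only" point concrete, here is a minimal sketch of how punctuated text alone can supply both the model input and the training labels. It is not the actual preprocessing code: the function name `make_example`, the set of marks in `PUNCT`, and the two-character label scheme (punctuation mark or `O`, followed by `U` for a capitalized word or `O` otherwise) are assumptions made for illustration.

```python
# Illustrative sketch only: names and label scheme are assumptions,
# not the project's confirmed preprocessing code.
PUNCT = ",.?"  # punctuation marks this sketch treats as labels


def make_example(sentence):
    """Turn one punctuated sentence into (plain_text, labels)."""
    words, labels = [], []
    for token in sentence.split():
        # Punctuation label: the trailing mark, or "O" if there is none.
        punct = token[-1] if token[-1] in PUNCT else "O"
        core = token.rstrip(PUNCT)
        if not core:
            continue  # the token consisted of punctuation only
        # Capitalization label: "U" if the word starts with an upper-case letter.
        cap = "U" if core[0].isupper() else "O"
        words.append(core.lower())
        labels.append(punct + cap)
    return " ".join(words), " ".join(labels)


text, labels = make_example("When is the next flight to New York?")
print(text)    # when is the next flight to new york
print(labels)  # OU OO OO OO OO OO OU ?U
```

The lower-cased, unpunctuated `text` resembles what a speech recognizer would output, which is why transcripts and books are useful sources even though no audio is involved.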
