Canary training with punctuated/capitalized ASR transcripts normalizes transcripts to lowercase with no special characters #9398

Answered by MedAymenF
DeveshS1209 asked this question in Q&A
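Which command did you use to train the tokenizer?
`${NEMO_ROOT}/scripts/tokenizers/process_asr_text_tokenizer.py` applies the SentencePiece `nmt_nfkc_cf` normalization rule by default. That rule performs NFKC normalization followed by case folding, so capitalization is lost before the tokenizer ever sees the text.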
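To see why the transcripts come out lowercased, here is a minimal sketch that approximates `nmt_nfkc_cf` with the Python standard library. Assumption: it models only NFKC plus case folding. The real SentencePiece rule also applies NMT-specific cleanups (for example, whitespace handling) that are not reproduced here. The helper names `approx_nmt_nfkc_cf` and `approx_identity` are hypothetical.

```python
import unicodedata


def approx_nmt_nfkc_cf(text: str) -> str:
    """Rough, hypothetical stand-in for SentencePiece's `nmt_nfkc_cf` rule.

    Assumption: only NFKC normalization followed by case folding is modeled;
    the NMT-specific parts of the real rule are not.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def approx_identity(text: str) -> str:
    """Hypothetical stand-in for an `identity` rule: text is left unchanged."""
    return text


if __name__ == "__main__":
    sample = "Hello, World! NeMo's Canary model."
    # Case folding lowercases everything; ASCII punctuation survives NFKC.
    print(approx_nmt_nfkc_cf(sample))  # hello, world! nemo's canary model.
    # NFKC also maps full-width characters to their ASCII forms.
    print(approx_nmt_nfkc_cf("Ｈｅｌｌｏ！"))  # hello!
    print(approx_identity(sample))  # Hello, World! NeMo's Canary model.
```

Note that in this approximation ASCII punctuation such as `,` and `!` is kept, so the sketch only illustrates the lowercasing part of the behavior.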

Answer selected by DeveshS1209