Canary training with punctuated/capitalized ASR transcripts normalizes transcripts to lowercase and strips special characters #9398
DeveshS1209 asked this question in Q&A.
Answered by MedAymenF on Jun 8, 2024. Replies: 1 comment.
Answer selected by DeveshS1209.
Which command did you use to train the tokenizer?
"${NEMO_ROOT}/scripts/tokenizers/process_asr_text_tokenizer.py" applies the nmt_nfkc_cf normalization rule by default, which case-folds (mostly lowercases) the text.
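To see why case is lost, here is a minimal sketch that approximates SentencePiece's nmt_nfkc_cf rule with the Python standard library: NFKC Unicode normalization followed by case folding. This is an assumption-laden approximation, not the SentencePiece implementation. The real rule also applies NMT-specific whitespace and control-character handling that is not modeled here. The helper names approx_nmt_nfkc_cf and identity are hypothetical, and identity mirrors SentencePiece's "identity" rule, which applies no normalization.

```python
import unicodedata


def approx_nmt_nfkc_cf(text: str) -> str:
    """Hypothetical approximation of SentencePiece's nmt_nfkc_cf rule.

    Assumption: NFKC normalization plus Unicode case folding. The real
    rule's extra NMT-specific whitespace/control-char handling is omitted.
    """
    # NFKC maps compatibility forms (e.g. full-width letters) to plain ones,
    # then casefold() lowercases aggressively.
    return unicodedata.normalize("NFKC", text).casefold()


def identity(text: str) -> str:
    """Hypothetical stand-in for the "identity" rule: text is left unchanged."""
    return text


if __name__ == "__main__":
    sample = "Hello, World!"
    # Case is folded, but ASCII punctuation is kept by this approximation.
    print(approx_nmt_nfkc_cf(sample))
    # Capitalization and punctuation both survive.
    print(identity(sample))
```

In this approximation only case and compatibility forms change; ASCII punctuation passes through untouched.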