Setting length_penalty to a negative value is helpful for CTC models,
since they are often biased toward shorter paths through the WFST
graph (shorter paths generally have smaller total costs).
However, a side effect of using length_penalty this way is that a
phrase like "no one cares" can be decoded as "no one caress", because
"caress" has a longer WFST path than "cares".
Applying the penalty only when olabel != 0 (i.e., only on non-epsilon
output labels) works around this issue while still preserving some of
the benefit of length_penalty.
Note that word_length_penalty is applied in both the emitting and
non-emitting ExpandArcs, while length_penalty is applied only in the
emitting ExpandArcs. I believe this is the correct behavior.
Here are some experiments from running the test test_sub_ins_del:
model is stt_en_conformer_ctc_small
dataset is test-clean
For vanilla "ctc" topology, the best length_penalty was -5.0. The WER was:
wer=0.04530584297017651, ins=369, sub=1650, del=363
For vanilla "ctc" topology, the best word_length_penalty was -10.0. The WER was:
wer=0.045058581862446746, ins=375, sub=1608, del=386
For compact "ctc" topology, the best length_penalty was -9.5. The WER was:
wer=0.045058581862446746, ins=375, sub=1608, del=386
For compact "ctc" topology, the best word_length_penalty was -10.0. The WER was:
wer=0.04309951308581862, ins=302, sub=1572, del=392
The best result comes from using the compact CTC topology with word_length_penalty=-10.0.
It makes sense that a more negative length penalty is required to
minimize WER for the compact CTC topology: it has fewer self-loops, so
its paths are shorter to begin with.
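For reference, the gap between the vanilla-topology length_penalty
baseline and the best configuration above works out to roughly a 4.9%
relative WER reduction:

```python
# Sanity arithmetic on the numbers reported above.
baseline = 0.04530584297017651  # vanilla topology, length_penalty=-5.0
best = 0.04309951308581862      # compact topology, word_length_penalty=-10.0
relative_reduction = 100.0 * (baseline - best) / baseline
print(f"relative WER reduction: {relative_reduction:.2f}%")  # ~4.87%
```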
Insertion, Substitution, and Deletion statistics were obtained by
applying this diff:
modified src/riva/asrlib/decoder/test_graph_construction.py
@@ -963,6 +963,8 @@ class TestGraphConstruction:
references = [s.lower() for s in references]
# Might want to try a different WER implementation, for sanity.
my_wer = wer(references, predictions)
+ wer_ratio, ins, sub, deletions = my_wer
+ print(f"GALVEZ:wer={wer_ratio}, ins={ins}, sub={sub}, del={deletions}")
other_wer = word_error_rate(references, predictions)
print("beam search WER:", my_wer)
print("other beam search WER:", other_wer)