update ctc_beam_search_decoder design doc #2423
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -140,7 +140,17 @@ TODO by Assignees | |
|
||
### Beam Search with CTC and LM | ||
|
||
TODO by Assignees | ||
<div align="center"> | ||
<img src="image/beam_search.png" width=400><br/> | ||
Figure 2. Algorithm for Beam Search Decoder. | ||
</div> | ||
|
||
- The **Beam Search Decoder** for the DS2 CTC-trained network follows an approach similar to \[[3](#references)\], with a modification for the ambiguous part, as shown in Figure 2. | ||
- An **external scorer** is passed into the decoder to evaluate a candidate prefix whenever a space character is appended during decoding. | ||
Review comment: remove "defined"
Reply: Done |
||
- The scorer is a unified class that may consist of a language model, a word count, or any custom evaluators. | ||
Reply: Done and modified. |
||
- The **language model** is built in Task 5, with parameters that should be carefully tuned to achieve minimum WER/CER (cf. Task 7). | ||
Review comment: a parameters --> parameters
Reply: Done |
||
- The decoder needs to run with **high efficiency** to make parameter tuning convenient and to support real-world speech recognition. | ||
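
The points above can be illustrated with a minimal sketch of CTC prefix beam search in the general style of \[[3](#references)\]. It is not the modified algorithm of Figure 2, whose details are not given here. The names `Scorer`, `ctc_beam_search`, `word_probs`, `alpha`, and `beta` are hypothetical, the unigram table is only a toy stand-in for the Task 5 language model, and the sketch works in plain probability space (a real decoder would likely use log probabilities to avoid underflow).

```python
import math
from collections import defaultdict


class Scorer:
    """Hypothetical unified scorer: toy unigram LM plus a per-word bonus."""

    def __init__(self, word_probs, alpha=1.0, beta=0.0, oov_prob=1e-3):
        self.word_probs = word_probs  # assumption: stand-in for a real LM
        self.alpha = alpha            # language model weight
        self.beta = beta              # word-count weight (log-domain bonus)
        self.oov_prob = oov_prob      # probability assigned to unknown words

    def __call__(self, prefix):
        """Return a multiplicative factor for the word just completed."""
        words = prefix.split()
        if not words:
            return 1.0
        lm = self.word_probs.get(words[-1], self.oov_prob)
        return (lm ** self.alpha) * math.exp(self.beta)


def ctc_beam_search(probs, vocab, beam_size=10, scorer=None, blank=0):
    """Return (text, score) pairs, best first.

    probs: per-frame probability rows, one entry per vocab symbol.
    vocab: list of symbols; vocab[blank] is the CTC blank.
    """
    # Each prefix maps to [P(path ends in blank), P(path ends in non-blank)].
    beams = {"": [1.0, 0.0]}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(frame):
                if idx == blank:
                    next_beams[prefix][0] += (p_b + p_nb) * p
                    continue
                char = vocab[idx]
                new_prefix = prefix + char
                # Consult the external scorer only when a space ends a word.
                ends_word = char == " " and prefix != "" and prefix[-1] != " "
                weight = scorer(new_prefix) if scorer and ends_word else 1.0
                if prefix and char == prefix[-1]:
                    # A repeated character extends the prefix only after a blank.
                    next_beams[new_prefix][1] += p_b * p * weight
                    # Without a blank in between, it collapses into the prefix.
                    next_beams[prefix][1] += p_nb * p
                else:
                    next_beams[new_prefix][1] += (p_b + p_nb) * p * weight
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])  # prune to the best prefixes
    return [(prefix, sum(ps)) for prefix, ps in beams.items()]


# Usage: the acoustic scores alone prefer "b ", but the toy LM prefers "a".
vocab = ["_", " ", "a", "b"]
probs = [[0.0, 0.0, 0.4, 0.6], [0.0, 1.0, 0.0, 0.0]]
toy_scorer = Scorer({"a": 0.9, "b": 0.1})
best_text, best_score = ctc_beam_search(probs, vocab, scorer=toy_scorer)[0]
```

Because the scorer is only called when a space completes a word, a final word with no trailing space is never rescored in this sketch; a fuller decoder would need to handle that case when ranking the final beams.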
|
||
|
||
## Future Work | ||
|
||
|
@@ -153,3 +163,4 @@ TODO by Assignees | |
|
||
1. Dario Amodei et al., [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](http://proceedings.mlr.press/v48/amodei16.pdf). ICML 2016. | ||
2. Dario Amodei et al., [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595). arXiv:1512.02595. | ||
3. Awni Y. Hannun et al., [First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs](https://arxiv.org/abs/1408.2873). arXiv:1408.2873. |
Review comment: What is the ambiguous part and what is the modification? Could you please add more details?
Reply: Done