Train an English language model (Kneser-Ney smoothed 5-gram, with pruning) using the KenLM toolkit, on cleaned text from the Common Crawl repository. For detailed requirements, please refer to the DS2 paper.
Add the training script into the DS2 trainer script.
Add inference interfaces for this n-gram language model, so that it can be plugged into CTC beam-search decoding with LM rescoring.
Keep in mind that the interfaces should be compatible with both English (word-based LM) and Mandarin (character-based LM).
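A minimal sketch of what such a unified scorer interface could look like. The class name `LmScorer`, the `is_character_based` flag, and the injected `score_fn` backend are hypothetical names, not from this repo; in practice `score_fn` would wrap a KenLM model, but here a dummy backend is used for illustration:

```python
import math


class LmScorer:
    """One scorer interface for both word-based (English) and
    character-based (Mandarin) language models."""

    def __init__(self, score_fn, is_character_based=False):
        # score_fn maps a list of tokens to a log-probability.
        self.score_fn = score_fn
        self.is_character_based = is_character_based

    def tokenize(self, sentence):
        # English LMs score space-separated words; Mandarin LMs
        # score individual characters (spaces dropped).
        if self.is_character_based:
            return list(sentence.replace(" ", ""))
        return sentence.split()

    def score(self, sentence):
        return self.score_fn(self.tokenize(sentence))


# Dummy backend: uniform log-probability per token (stands in for KenLM).
def uniform_log_prob(tokens, vocab_size=1000):
    return len(tokens) * -math.log(vocab_size)


word_scorer = LmScorer(uniform_log_prob, is_character_based=False)
char_scorer = LmScorer(uniform_log_prob, is_character_based=True)

print(word_scorer.tokenize("hello world"))  # ['hello', 'world']
print(char_scorer.tokenize("你好 世界"))      # ['你', '好', '世', '界']
```

The beam-search decoder would then call `scorer.score(prefix)` on each candidate prefix without caring which language it is decoding.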
@cxwangyi@kuke@xinghai-sun
Hi, as mentioned in the paper, a language model has to be trained to improve the decoding results, and the LM is a critical component for ensuring good performance. The language model is trained on text crawled from commoncrawl.org using the KenLM toolkit. However, we need more details to train such a language model. Is it possible to get the trained language model, or the text dataset it was trained on?
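For reference, training such a model with KenLM typically looks like the following; the corpus path and pruning thresholds are placeholders, not the exact settings used in the DS2 paper:

```shell
# Build a pruned Kneser-Ney 5-gram model from cleaned Common Crawl text.
# (cleaned_corpus.txt is a placeholder path; prune thresholds are examples
# that drop singleton n-grams of order 4 and above.)
lmplz -o 5 --prune 0 0 0 1 < cleaned_corpus.txt > lm.arpa

# Convert the ARPA file to KenLM's compact binary trie format
# for fast loading at decode time.
build_binary trie lm.arpa lm.binary
```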