How to use a pre-trained model for a cache-aware FastConformer-Hybrid model? #9990
Replies: 3 comments
-
You can use this model, which is a chunk-aware model: https://huggingface.co/nvidia/stt_en_fastconformer_hybrid_large_streaming_multi
-
Thank you, @titu1994. I will try it.
-
It's a practical limitation. You can either get an ordinary FastConformer in German or a chunk-aware Conformer in English. It depends on what your priority is: streaming or transcript accuracy. We have a tutorial showing language transfer.
-
Hi @titu1994,
Following our discussion in this thread, I'm training a cache-aware FastConformer hybrid CTC/RNN-T model for German using 1.2K hours of audio data. Despite training for 150 epochs, my validation WER is still around 0.28.
I suspect the dataset quality might be an issue. I reviewed the paper "Stateful Conformer with Cache-Based Inference for Streaming ASR" and noted the strong results it reports even when training from scratch on LibriSpeech.
Since you recommended using a pre-trained model, I tried using this model from Hugging Face, but it's not a streaming model. Is it still viable as a pre-trained model for my use case, or are there other German models available that you would recommend?
Thank you for your guidance!