Adding cache-aware streaming Conformer with look-ahead support #3888
Conversation
Signed-off-by: Vahid <vnoroozi@nvidia.com>
This pull request introduces 7 alerts and fixes 4 when merging c2cfe4e into 8e1436b - view on LGTM.com
Signed-off-by: Vahid <vnoroozi@nvidia.com>
This pull request introduces 7 alerts and fixes 4 when merging 7589f88 into aaeac3c - view on LGTM.com
…former_lookahead_newdesign
This pull request introduces 7 alerts and fixes 4 when merging 0bde720 into 5c8fe3a - view on LGTM.com
Signed-off-by: Vahid <vnoroozi@nvidia.com>
This pull request introduces 7 alerts and fixes 4 when merging 090f838 into 5c8fe3a - view on LGTM.com
Signed-off-by: Vahid <vnoroozi@nvidia.com>
This pull request introduces 9 alerts and fixes 4 when merging 463aed6 into 5c8fe3a - view on LGTM.com
Approving for now since we're out of time. But before merge, rename the function to cache_aware_stream_step - a basic stream_step is too generic, does not inform what is being used, and is not future-proof.
start_time = time.time()
for sample_idx, sample in enumerate(samples):
    processed_signal, processed_signal_length, stream_id = streaming_buffer.append_audio_file(
        sample['audio_filepath'], stream_id=-1
    )
We need to document this script a lot more in the branch cut for 1.11. For now it's fine.
    if (sample_idx + 1) % args.batch_size == 0 or sample_idx == len(samples) - 1:
        logging.info(f"Starting to stream samples {sample_idx - len(streaming_buffer) + 1} to {sample_idx}...")
        streaming_tran, offline_tran = perform_streaming(
            asr_model=asr_model,
^ Same comment as above.
if hasattr(self.input_module, 'forward_for_export'):
    encoder_output = self.input_module.forward_for_export(input, length)
if cache_last_channel is None and cache_last_time is None:
OK, leaving this comment unresolved for a later check then.
Signed-off-by: Vahid <vnoroozi@nvidia.com>
…former_lookahead_newdesign
This pull request introduces 9 alerts and fixes 4 when merging 194581f into 498ff20 - view on LGTM.com
Hi @VahidooX, thanks for the examples you made. I tried with a 2-minute wav file using
but I get this error during online mode:
Do you have any idea what might have happened? Thanks!
This approach requires you to train a model in streaming mode to get the best results, which means with limited right and left contexts and no normalization in feature extraction. While it may be possible to try offline models with this approach, the accuracy would not be great. I have not added support for offline models in this PR; I will look into it and add it soon.
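For reference, a minimal sketch of what "training in streaming mode" implies at the config level. The key names follow the streaming configs under examples/asr/conf/conformer/streaming/; the exact values and the "NA" sentinel are assumptions here, not settings confirmed by this PR.

# Illustrative config overrides for training in streaming mode.
# Key names are assumed from NeMo's streaming Conformer configs; the
# specific values are examples, not recommended settings.
from omegaconf import OmegaConf

streaming_overrides = OmegaConf.create(
    {
        "model": {
            # Per-utterance feature normalization is disabled, since
            # full-utterance statistics are not available when audio
            # arrives chunk by chunk. ("NA" is assumed to be the
            # no-normalization sentinel used in the streaming configs.)
            "preprocessor": {"normalize": "NA"},
            # Limit the effective left and right attention context
            # (in frames) so the model never depends on far-future audio.
            "encoder": {"att_context_size": [102, 27]},
        }
    }
)
print(OmegaConf.to_yaml(streaming_overrides))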
@VahidooX are there perhaps any pre-trained streaming models already available?
Not yet, I am still working on training them on NeMo ASRSet. Hopefully there will be some uploaded to NGC by the end of this month.
Hi @VahidooX, looking forward to the support of offline models, thank you very much!
Here is the draft PR to add support for using models trained with full context with cache-aware streaming in chunk-aware look-ahead style: Just note that the results would be significantly worse than when you train the model in streaming mode. I will share some numbers in the PR when they are ready. The main advantage of using this approach on an offline model compared to buffered streaming is just using less computation. The cache-aware approach is unlikely to give better accuracy for such models, as they don't use overlapping chunks in chunk-aware mode. I will try to add support for regular look-ahead, which uses overlapping chunks.
Thanks for the PR @VahidooX, let me study your code.
…A#3888) Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
…A#3888) Signed-off-by: Anas Abou Allaban <aabouallaban@pm.me>
@VahidooX any update on the pre-trained models? I am not able to get the models to converge without initializing the weights.
@VahidooX Did you manage to train these models on NeMo ASRSet? If yes, can you send the files?
…A#3888) Signed-off-by: Hainan Xu <hainanx@nvidia.com>
What does this PR do?
Adding cache-aware streaming Conformer training and inference with look-ahead support. It is achieved by training a model with a limited effective right context and then performing the streaming with activation caching. Limiting the right context reduces the accuracy compared to an offline model, but this approach gives better accuracy and significantly higher throughput than buffer-based streaming by dropping the duplicated computations that happen there. A larger right context decreases the WER while increasing the latency.
It supports the following three modes (see the config sketch after this list):
1. Fully causal model with zero look-ahead and zero latency
2. Regular look-ahead
3. Chunk-aware look-ahead with a small duplication in computations
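A hedged sketch of how these three modes might map onto the encoder settings; the att_context_size / att_context_style keys and the values below are assumptions based on the streaming configs, not the PR's confirmed interface.

# Hypothetical mapping from the three supported modes to encoder settings.
# Key names and values are assumptions and may differ from the final configs.
mode_configs = {
    # 1) Fully causal: no right context at all, so zero look-ahead latency.
    "fully_causal": {
        "att_context_size": [102, 0],
        "att_context_style": "regular",
    },
    # 2) Regular look-ahead: each layer attends a few frames ahead, so the
    #    effective look-ahead grows with the number of layers.
    "regular_lookahead": {
        "att_context_size": [102, 2],
        "att_context_style": "regular",
    },
    # 3) Chunk-aware look-ahead: attention is limited to non-overlapping
    #    chunks, trading a small duplication in computations for a fixed
    #    look-ahead that does not grow with depth.
    "chunk_aware_lookahead": {
        "att_context_size": [102, 27],
        "att_context_style": "chunked_limited",
    },
}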
It supports both Conformer-CTC and Conformer-Transducer. They can be trained with the regular training scripts, but with the config files in the following folder:
NeMo/examples/asr/conf/conformer/streaming/
A model trained in streaming mode can be evaluated with the following script:
NeMo/examples/asr/conf/conformer/streaming/speech_to_text_streaming_infer.py
This script simulates streaming inference for a single audio file or a manifest of audio files. For a manifest file, streaming can be done in multi-stream mode (batched inference) to speed it up. It can also compare the results with offline evaluation and report the differences in both the WER and the models' outputs.
The accuracy of the model is exactly the same in offline evaluation and in streaming. In offline mode the whole audio is passed through the model at once, while in streaming mode the audio is passed chunk by chunk.
Changelog
Usage
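A minimal usage sketch of simulated cache-aware streaming. It assumes the CacheAwareStreamingAudioBuffer helper and the stream_step entry point from this PR's streaming script; the return signature, checkpoint path, and file name are assumptions, not the PR's confirmed API.

# Hypothetical sketch of cache-aware streaming inference; helper and method
# names follow the streaming script in this PR and may not match the final
# API (the reviewer asked for stream_step to become cache_aware_stream_step).
import torch

import nemo.collections.asr as nemo_asr
from nemo.collections.asr.parts.utils.streaming_utils import CacheAwareStreamingAudioBuffer

# Assumed checkpoint path; any model trained with the streaming configs works.
asr_model = nemo_asr.models.ASRModel.restore_from("streaming_conformer.nemo")
asr_model.eval()

# The buffer splits the audio into chunks matching the model's look-ahead.
streaming_buffer = CacheAwareStreamingAudioBuffer(model=asr_model)
streaming_buffer.append_audio_file("sample.wav", stream_id=-1)

# Activation caches start empty and come back updated after every chunk,
# so no already-seen frame is recomputed.
cache_last_channel = cache_last_time = None
for chunk_audio, chunk_lengths in streaming_buffer:
    with torch.no_grad():
        transcriptions, cache_last_channel, cache_last_time = asr_model.stream_step(
            processed_signal=chunk_audio,
            processed_signal_length=chunk_lengths,
            cache_last_channel=cache_last_channel,
            cache_last_time=cache_last_time,
        )  # return signature is a guess

print(transcriptions)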
PR Type: