Adding Hybrid RNNT-CTC model #5364

Merged: 76 commits merged into NVIDIA:main on Dec 6, 2022

VahidooX (Collaborator) commented Nov 9, 2022

What does this PR do?

This PR is a refactored version of #4854, originally created by https://github.com/iankur.

It adds a Hybrid RNNT/CTC model, which has two decoders (CTC and RNNT/Transducer) on top of a shared encoder.
This makes it possible to train a single model, instead of two, that supports both CTC and RNNT decoding. It also speeds up convergence for CTC models.
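
To make the two-decoder idea concrete, here is a minimal, framework-free sketch. It assumes the two training losses, both computed over the shared encoder output, are interpolated with a single weight; `hybrid_loss`, `ctc_greedy_decode`, and `ctc_weight` are hypothetical names used only for illustration, not this PR's API, so check the model code and sample configs for the actual weighting. The CTC side uses standard greedy decoding: take the best token per frame, merge consecutive repeats, then drop blanks.

```python
from typing import List, Optional, Sequence


def hybrid_loss(rnnt_loss: float, ctc_loss: float, ctc_weight: float) -> float:
    """Interpolate the RNNT and CTC losses computed over the same encoder output.

    `ctc_weight` is a hypothetical knob: 0.0 trains only the RNNT decoder,
    1.0 trains only the CTC decoder.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return (1.0 - ctc_weight) * rnnt_loss + ctc_weight * ctc_loss


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank_id: int) -> List[int]:
    """Greedy CTC decoding over per-frame scores from the CTC decoder head.

    Picks the best token per frame, merges consecutive repeats, then drops blanks.
    A blank between two identical tokens keeps them as separate outputs.
    """
    tokens: List[int] = []
    prev: Optional[int] = None
    for frame in frame_scores:
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != prev and best != blank_id:
            tokens.append(best)
        prev = best
    return tokens


if __name__ == "__main__":
    # Toy scores for 5 frames over a 3-token vocabulary where token 2 is blank.
    frames = [
        [0.9, 0.05, 0.05],
        [0.8, 0.1, 0.1],
        [0.1, 0.1, 0.8],
        [0.2, 0.7, 0.1],
        [0.1, 0.1, 0.8],
    ]
    print(ctc_greedy_decode(frames, blank_id=2))  # [0, 1]
    print(hybrid_loss(rnnt_loss=2.0, ctc_loss=4.0, ctc_weight=0.25))  # 2.5
```

Setting the weight to 0.0 or 1.0 reduces the sketch to a pure RNNT or pure CTC objective; any value in between trains both heads from one encoder.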

Added the following examples:
./examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_char.py
./examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py

Along with these sample configs for the hybrid Conformer model:
./examples/asr/conf/conformer/hybrid_transducer_ctc/conformer_hybrid_transducer_ctc_char.yaml
./examples/asr/conf/conformer/hybrid_transducer_ctc/conformer_hybrid_transducer_ctc_bpe.yaml

Collection:
ASR

Changelog

  • Added support for Hybrid RNNT/CTC ASR model.

PR Type:

  • New Feature
  • Bugfix
  • Documentation

VahidooX and others added 8 commits November 8, 2022 18:50

lgtm-com bot commented Nov 9, 2022

This pull request introduces 16 alerts when merging fa92433 into 265056e - view on LGTM.com

new alerts:

  • 12 for Unused import
  • 4 for Unreachable code

titu1994 (Collaborator) left a comment

Really awesome. Needs plenty of cleanup, especially for the subword model, but it should be doable.

VahidooX and others added 4 commits November 15, 2022 13:09

lgtm-com bot commented Nov 15, 2022

This pull request introduces 6 alerts when merging 2b3902a into 5665f14 - view on LGTM.com

new alerts:

  • 4 for Unreachable code
  • 2 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

lgtm-com bot commented Nov 15, 2022

This pull request introduces 9 alerts when merging bab1fc2 into 5665f14 - view on LGTM.com

new alerts:

  • 5 for Unused import
  • 4 for Unreachable code

lgtm-com bot commented Nov 15, 2022

This pull request introduces 9 alerts when merging a0d584e into 5665f14 - view on LGTM.com

new alerts:

  • 5 for Unused import
  • 4 for Unreachable code

lgtm-com bot commented Nov 16, 2022

This pull request introduces 8 alerts when merging 31d9f07 into 25f9ab9 - view on LGTM.com

new alerts:

  • 4 for Unused import
  • 4 for Unreachable code

lgtm-com bot commented Nov 30, 2022

This pull request introduces 1 alert when merging b5a56b4 into 5c1d59e - view on LGTM.com

new alerts:

  • 1 for Signature mismatch in overriding method

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed 🚀. For more information, please check out our post on the GitHub blog.

titu1994 previously approved these changes Dec 2, 2022

titu1994 (Collaborator) left a comment

Looks awesome. Tagging @bmwshop for the final review.

Returns: None

"""
if isinstance(new_tokenizer_dir, DictConfig):

Collaborator

@bmwshop, can you look into whether this preconfig logic can be pulled up into a utility method of ASRBPEMixin in the future?

titu1994 (Collaborator) left a comment

PR looks great, thanks! We should add docstrings to some functions and mention this in README.md, because it's a very cool feature.

Minor comments


// stage('L2: Hybrid ASR RNNT-CTC dev run') {
// when {
// anyOf {

Collaborator

Don't we want to uncomment the test now?



class TestEncDecHybridRNNTCTCModel:
@pytest.mark.skipif(

Collaborator

Where is the subword-level test file?

titu1994 (Collaborator) left a comment

Looks awesome !

titu1994 merged commit 786a850 into NVIDIA:main on Dec 6, 2022
artbataev pushed a commit that referenced this pull request Dec 16, 2022
* added initial code.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added the confs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added the confs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changed name from joint to hybrid.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed format.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed format.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* addressed comments.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* addressed comments.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added docs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added docs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added docs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added docs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* addec CI test.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* addec CI test.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bugs in change_vocabs.

Signed-off-by: vahidoox <vnoroozi@nvidia.com>

* fixed bugs in change_vocabs.

Signed-off-by: vahidoox <vnoroozi@nvidia.com>

* fixed style.

Signed-off-by: vahidoox <vnoroozi@nvidia.com>

* fixed style.

Signed-off-by: vahidoox <vnoroozi@nvidia.com>

* fixed style.

Signed-off-by: vahidoox <vnoroozi@nvidia.com>

* raise error for aux_ctc.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* raise error for aux_ctc.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* raise error for aux_ctc.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* raise error for aux_ctc.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* updated the streaming names.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added unittests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added unittests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added unittests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed tests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed tests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed tests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed tests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed tests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed tests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added methods.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added decoding.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fxied the tests.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

Signed-off-by: Vahid <vnoroozi@nvidia.com>
Signed-off-by: vahidoox <vnoroozi@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
andrusenkoau pushed a commit to andrusenkoau/NeMo that referenced this pull request Jan 5, 2023
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
titu1994 pushed a commit to titu1994/NeMo that referenced this pull request Mar 24, 2023