Standalone diarization+ASR evaluation script #5439

Merged: 50 commits merged on Nov 21, 2022.

Commits on Nov 16, 2022

  1. first commit on eval_diar_with_asr.py

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 16, 2022
    19cbe6f
  2. Add a standalone diarization-ASR evaluation transcript

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 16, 2022
    d85efa2
  3. Fixed examples in docstrings

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 16, 2022
    c1e7cf4
  4. aea2e6c
  5. Fixed staticmethod error

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 16, 2022
    c48bf07
  6. merged main

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 16, 2022
    cdc3235
  7. b9feb6b

Commits on Nov 17, 2022

  1. af94a2e
  2. 902c088
  3. 11a7188
  4. Added description on eval modes

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 17, 2022
    979ec60
  5. Merge branch 'mulspk_asr_eval_script' of https://github.com/NVIDIA/NeMo into mulspk_asr_eval_script

    tango4j committed Nov 17, 2022
    2731413
  6. 9a48743
  7. adding diar_infer_general.yaml

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 17, 2022
    e449287
  8. Merge branch 'mulspk_asr_eval_script' of https://github.com/NVIDIA/NeMo into mulspk_asr_eval_script

    tango4j committed Nov 17, 2022
    a4cbd7a
  9. fix msdd_model in general yaml file

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 17, 2022
    e35629d
  10. fixed errors in yaml file

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 17, 2022
    db31725
  11. combine into 1 commit

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 17, 2022
    50f8417
  12. Added description on eval modes

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 17, 2022
    61b4809
  13. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    pre-commit-ci[bot] authored and tango4j committed Nov 17, 2022
    250468c
  14. Add MoE support for T5 model (w/o expert parallel) (#5409)

    * clean
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * kwarg ref
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * fix
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * fix
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * test
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * test
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * test
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * test
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * test
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * test
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * extra args
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * test
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * rm prints
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * style
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * review comments
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * review comments
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * review comments
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * fix
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    2 people authored and tango4j committed Nov 17, 2022
    9e2acca
  15. Fix args (#5410) (#5416)

    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
    2 people authored and tango4j committed Nov 17, 2022
    0abcbaf
  16. Fix for concat map dataset (#5133)

    * change for concat map dataset
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Exhaust longest dataset
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Co-authored-by: 1-800-BAD-CODE <>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
    Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
    4 people authored and tango4j committed Nov 17, 2022
    54a87d2
  17. Add temporary fix for CUDA issue in Dockerfile (#5421) (#5422)

    Signed-off-by: Yu Yao <yuya@nvidia.com>
    
    Signed-off-by: Yu Yao <yuya@nvidia.com>
    
    Signed-off-by: Yu Yao <yuya@nvidia.com>
    Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
    2 people authored and tango4j committed Nov 17, 2022
    dd3f3aa
  18. Fix GPT generation when using sentencepiece tokenizer (#5413) (#5428)

    * Fix
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Fix
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    Co-authored-by: Yi Dong <yidong@nvidia.com>
    Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
    Co-authored-by: Yi Dong <yidong@nvidia.com>
    Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
    4 people authored and tango4j committed Nov 17, 2022
    a2145ba
  19. Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (#5339)
    
    * Initial refactor
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Resolve config before passing to load_from_checkpoint
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Fixes for model parallel and nemo restore
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Fixes for eval
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Revert config changes
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Refactor
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Fix typo
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Remove comments
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Minor
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Fix validation reconfiguration
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Remove old comment
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Fixes for test_ds
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    2 people authored and tango4j committed Nov 17, 2022
    4d7b8cf
  20. Revert "Add temporary fix for CUDA issue in Dockerfile (#5421)" (#5431) (#5432)
    
    This reverts commit 0718b17.
    
    Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
    2 people authored and tango4j committed Nov 17, 2022
    cb6f339
  21. [ITN] fix year date graph, cardinals extension for hundreds (#5435)

    * wip
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * add lociko's hundreds extension for cardinals
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * add optional end
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * restart ci
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    ekmb authored and tango4j committed Nov 17, 2022
    0f56bdb
  22. update doc in terms of get_label for lang id model (#5366)

    * reflect PR 5278 in doc
    
    Signed-off-by: fayejf <fayejf07@gmail.com>
    
    * reflect comment
    
    Signed-off-by: fayejf <fayejf07@gmail.com>
    
    Signed-off-by: fayejf <fayejf07@gmail.com>
    fayejf authored and tango4j committed Nov 17, 2022
    6a97bc9
  23. Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (#5420) (#5433)
    
    * Revert workers workaround
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Fix in config
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    * Fix
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
    
    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
    Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
    3 people authored and tango4j committed Nov 17, 2022
    cec4715
  24. Fixed bug in notebook (#5382) (#5394)

    Signed-off-by: Virginia Adams <vadams@nvidia.com>
    
    Signed-off-by: Virginia Adams <vadams@nvidia.com>
    
    Signed-off-by: Virginia Adams <vadams@nvidia.com>
    Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
    2 people authored and tango4j committed Nov 17, 2022
    785426c
  25. Fixing bug in Megatron BERT when loss mask is all zeros (#5424)

    * Fixing bug when loss mask is fully zero
    
    Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Update megatron_bert_model.py
    
    Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
    
    * Update dataset_utils.py
    
    Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Update dataset_utils.py
    
    Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
    
    * Update dataset_utils.py
    
    Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
    
    Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
    3 people authored and tango4j committed Nov 17, 2022
    6b018da
  26. Use updated API for overlapping grad sync with pipeline parallelism (#5236)
    
    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    
    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 authored and tango4j committed Nov 17, 2022
    51250e9
  27. support to disable sequence length + 1 input tokens for each sample in MegatronGPT (#5363)
    
    * support to disable sequence length + 1 input tokens for MegatronGPT
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
    4 people authored and tango4j committed Nov 17, 2022
    7041f09
  28. [TTS] Create script for processing TTS training audio (#5262)

    * Create script for processing TTS training audio
    * Update VAD trimming logic
    * Remove unused import
    
    Signed-off-by: Ryan <rlangman@nvidia.com>
    rlangman authored and tango4j committed Nov 17, 2022
    9de694a
  29. [TTS] remove useless logic for set_tokenizer. (#5430)

    Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
    XuesongYang authored and tango4j committed Nov 17, 2022
    b75fc1a
  30. Fix setting up of ReduceLROnPlateau learning rate scheduler (#5444)

    * Fix tests
    
    Signed-off-by: PeganovAnton <peganoff2@mail.ru>
    
    * Add accidentally lost changes
    
    Signed-off-by: PeganovAnton <peganoff2@mail.ru>
    
    Signed-off-by: PeganovAnton <peganoff2@mail.ru>
    PeganovAnton authored and tango4j committed Nov 17, 2022
    40e2fdf
  31. Create codeql.yml (#5445)

    Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
    
    Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
    titu1994 authored and tango4j committed Nov 17, 2022
    841ed52
  32. Fix for getting tokenizer in character-based ASR models when using tarred dataset (#5442)
    
    Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
    
    Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
    jonghwanhyeon authored and tango4j committed Nov 17, 2022
    db6c136
  33. Combine 5 commits

    adding diar_infer_general.yaml
    
    Signed-off-by: Taejin Park <tango4j@gmail.com>
    
    Update codeql.yml
    
    Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
    
    Update codeql.yml
    
    Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
    
    fix msdd_model in general yaml file
    
    Signed-off-by: Taejin Park <tango4j@gmail.com>
    
    fixed errors in yaml file
    
    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 17, 2022
    a615aff
  34. resolved conflict

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 17, 2022
    8f60d87
  35. 8f37ecb

Commits on Nov 18, 2022

  1. moved eval_der function and fixed tqdm options

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 18, 2022
    1402c97
  2. Merge branch 'mulspk_asr_eval_script' of https://github.com/NVIDIA/NeMo into mulspk_asr_eval_script

    tango4j committed Nov 18, 2022
    5390636
  3. f00250a
  4. 92f4159
  5. Changed minor error in docstrings

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 18, 2022
    1fa70e8
  6. Merge branch 'mulspk_asr_eval_script' of https://github.com/NVIDIA/NeMo into mulspk_asr_eval_script

    tango4j committed Nov 18, 2022
    e2d519d

Commits on Nov 19, 2022

  1. removed score_labels and changed leave=True

    Signed-off-by: Taejin Park <tango4j@gmail.com>
    tango4j committed Nov 19, 2022
    037b61c
  2. 2b838af