Commit
[Fix] Hanging for Fully Randomized Bucketing (#4348)
* Update container to 22.05 (#4329)
* update container to 22.05
  Signed-off-by: ericharper <complex451@gmail.com>
* try adding safe directory
  Signed-off-by: ericharper <complex451@gmail.com>
* try env var
  Signed-off-by: ericharper <complex451@gmail.com>
* printenv
  Signed-off-by: ericharper <complex451@gmail.com>
* try GIT_BRANCH
  Signed-off-by: ericharper <complex451@gmail.com>
* typo
  Signed-off-by: ericharper <complex451@gmail.com>
* remove dbug statements
  Signed-off-by: ericharper <complex451@gmail.com>
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* Merge r1.9.0 main (#4331)
* update branch
  Signed-off-by: ericharper <complex451@gmail.com>
* update package info
  Signed-off-by: ericharper <complex451@gmail.com>
* cleaned up TN/ ITN doc (#4119)
* cleaned up TN/ ITN doc
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
* fix typo
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
* fix image
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
* fix image
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
* Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136)
* Fix restoring from checkpoint with label vocab dir
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Add tests for various ways to pass label ids to model
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Fix typo
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Fix typo
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Do not create tmp directory
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Fix parameter name
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* finish cherry-pick op
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Fix labels errors
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Remove duplicate stage
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Change target branch
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* fix doc (#4146)
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
* Tacotron2 retrain (#4103)
* fix yaml
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Fix for new TTSDataset class
  Signed-off-by: treacker <emshabalin@yandex.ru>
* added wandb logging
  Signed-off-by: treacker <emshabalin@yandex.ru>
* added wandb logging
  Signed-off-by: treacker <emshabalin@yandex.ru>
* fix numpy version
  Signed-off-by: treacker <emshabalin@yandex.ru>
* fix numpy version
  Signed-off-by: treacker <emshabalin@yandex.ru>
* inference fix
  Signed-off-by: treacker <emshabalin@yandex.ru>
* removed old code
  Signed-off-by: treacker <emshabalin@yandex.ru>
* updated parser logic
  Signed-off-by: treacker <emshabalin@yandex.ru>
* reverted version update
  Signed-off-by: treacker <emshabalin@yandex.ru>
* refactored parser logic
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Updated Jenkinsfile
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Refactored tutorial for Tacotron2
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Made backward compatibility
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Made backward compatibility
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Update Jenkinsfile
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Update tacotron.yaml
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Refactoring
  Signed-off-by: treacker <emshabalin@yandex.ru>
* cleaned up TN/ ITN doc (#4119)
* cleaned up TN/ ITN doc
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
* fix typo
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
* fix image
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
* fix image
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Check implicit grad acc in GLUE dataset building (#4123)
* Check implicit grad acc in GLUE dataset building
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fix jenkins test for GLUE/XNLI
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Refactoring
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Refactoring
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Fixed jenkins
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Refactoring
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Refactoring
  Signed-off-by: treacker <emshabalin@yandex.ru>
* Refactoring
  Signed-off-by: treacker <emshabalin@yandex.ru>
  Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
  Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
* Multiprocess improvements (#4127)
* initial commit
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* start fix
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* improve multiprocessing speed while creating speaker dataset
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* updated scp to filelist
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* notebooks' link, typo and import fix (#4158)
* redo missing pr 4007
  Signed-off-by: fayejf <fayejf07@gmail.com>
* remove extremely unreliable links
  Signed-off-by: fayejf <fayejf07@gmail.com>
* update speaker docs (#4164)
* update speaker docs
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* chunks -> segments
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* Khz -> kHz
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* small fix (#4180)
  Signed-off-by: fayejf <fayejf07@gmail.com>
* fix the server key value problem (#4196)
  Signed-off-by: Yi Dong <yidong@nvidia.com>
* Fix/punctuation/trainer required for setting test data (#4199)
* Draft of fix
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Add warnings and replace globa_step with current_epoch
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Small improvements to warnings
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Error and warning messages improvements
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Replace self.trainer with self._trainer
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
* Update ContextNet version (#4207)
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
* fix bugs for dialogue tutorial (#4211)
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* Dialogue tutorial fix (#4214)
* fix bugs for dialogue tutorial
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* update path for convert_datasets.py due to conflict PR
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* Add docs for Thutmose Tagger (#4173)
* Add docs for Thutmose Tagger
  Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
* add level in docs
  Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
* delete folder to avoid error with running when folder exists from previous run
  Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
  Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
  Co-authored-by: ekmb <ebakhturina@nvidia.com>
* Dialogue tutorial fix (#4218)
* fix bugs for dialogue tutorial
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* update path for convert_datasets.py due to conflict PR
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* restore previously deleted files
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* style fix
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* Dialogue tutorial fix (#4221)
* fix bugs for dialogue tutorial
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* update path for convert_datasets.py due to conflict PR
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* restore previously deleted files
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* style fix
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* update tutorial
  Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
* fix syntax error in ipynb-file (#4228)
  Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
  Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
* fix json serialize (#4235)
  Signed-off-by: Yi Dong <yidong@nvidia.com>
* Prompt Learning Typo Fixes (#4238)
* Prompt tuning notebook typo fixes
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* Update tutorials.rst
* Update prompt_learning.rst
* Update prompt_learning.rst
* fixing bug 3642622 (#4250)
* fixing bug 3642622
  Signed-off-by: Ghasem Pasandi <gpasandi@nvidia.com>
* fixing bug 3642622
  Signed-off-by: Ghasem Pasandi <gpasandi@nvidia.com>
  Co-authored-by: Ghasem Pasandi <gpasandi@nvidia.com>
* fix broken link in the tutorial (#4257)
  Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
  Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
* Typo fix, branch change, better download messagae (#4262)
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* Raise error if bicleaner is not installed in NMT Data preprocesing notebook (#4264)
* Raise error if bicleaner is not installed
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Clear cells
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fix missing validation dataset, whitelist certain keywords for datasets (#4269)
* Fix missing validation dataset, whitelist certain keywords for datasets
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
* Fix missing validation dataset, whitelist certain keywords for datasets
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
* Update asr configs with num_workers and pin_memory (#4270)
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
* Fix epoch end (#4265)
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: Eric Harper <complex451@gmail.com>
* Set Save on train end to false (#4274)
* Set Save on train end to false
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* Update prompt_learning.rst
* Update prompt_learning.rst
* Update YAML (#4261)
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Updated config to fix CI test OOM error (#4279)
* Updated config to fix CI test issue
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* Increased num workers
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* verbose k2 install, skip if failed (#4289)
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
  Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
* Changed total virtual prompt tokens (#4295)
* Changed total virtual prompt tokens
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* put number of workers back
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* upper bound lightning
  Signed-off-by: ericharper <complex451@gmail.com>
* update branch
  Signed-off-by: ericharper <complex451@gmail.com>
* update config
  Signed-off-by: ericharper <complex451@gmail.com>
* remove duplicate test
  Signed-off-by: ericharper <complex451@gmail.com>
* fix tn test cases
  Signed-off-by: ericharper <complex451@gmail.com>
* add another safe.directory
  Signed-off-by: ericharper <complex451@gmail.com>
* typo
  Signed-off-by: ericharper <complex451@gmail.com>
  Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
  Co-authored-by: PeganovAnton <peganoff2@mail.ru>
  Co-authored-by: treacker <36159472+treacker@users.noreply.github.com>
  Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
  Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
  Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
  Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
  Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
  Co-authored-by: bene-ges <61418381+bene-ges@users.noreply.github.com>
  Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
  Co-authored-by: ekmb <ebakhturina@nvidia.com>
  Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
  Co-authored-by: Ghasem <35242805+pasandi20@users.noreply.github.com>
  Co-authored-by: Ghasem Pasandi <gpasandi@nvidia.com>
  Co-authored-by: Aleksandr Laptev <laptevsasha12@gmail.com>
  Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* fix full_randn bucket hang
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* remove unused variables
  Signed-off-by: stevehuang52 <heh@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: treacker <36159472+treacker@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: bene-ges <61418381+bene-ges@users.noreply.github.com>
Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
Co-authored-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Ghasem <35242805+pasandi20@users.noreply.github.com>
Co-authored-by: Ghasem Pasandi <gpasandi@nvidia.com>
Co-authored-by: Aleksandr Laptev <laptevsasha12@gmail.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>