NVIDIA Neural Modules 1.22.0
Highlights
Models
NeMo Parakeet
Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet/
- https://huggingface.co/nvidia/parakeet-rnnt-1.1b
- https://huggingface.co/nvidia/parakeet-ctc-1.1b
- https://huggingface.co/nvidia/parakeet-rnnt-0.6b
- https://huggingface.co/nvidia/parakeet-ctc-0.6b
NeMo Parakeet-TDT
Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet-tdt/
ASR
- stt_en_fastconformer_transducer_large_ls #7641
- stt_en_fastconformer_ctc_large_ls #7641
- stt_en_fastconformer_hybrid_large_streaming_multi
- stt_nl_fastconformer_hybrid_large_pc
- stt_fa_fastconformer_hybrid_large
NeMo ASR
- Multi-lookahead cache-aware streaming Conformer #6711
- Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim #7330
- Speech enhancement tutorial #6492
- Support punctuation error rate #7538
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.10
Detailed Changelogs
ASR
Changelog
- Fix missing pip package 'einops' by @RobinDong :: PR: #7397
- Fix failure of installing pyaudio in Online_Offline_Speech_Commands_Demo.ipynb by @RobinDong :: PR: #7396
- [ASR] Confidence measure -> method renames by @GNroy :: PR: #7434
- RNN-T confidence and alignment bugfix by @GNroy :: PR: #7381
- Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim :: PR: #7330
- [TTS] Read audio as int32 to avoid flac read errors by @rlangman :: PR: #7477
- Fix typos in confidence tutorial notebooks by @Kipok :: PR: #7581
- Safeguard nemo_text_processing installation on ARM by @blisc :: PR: #7485
- add fc large ls models by @nithinraok :: PR: #7641
- [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence by @GNroy :: PR: #7635
- Create per.py by @ssh-meister :: PR: #7538
- Update docs: readme, getting started, ASR intro by @erastorgueva-nv :: PR: #7679
- [ASR] Multichannel mask estimator with flex number of channels by @anteju :: PR: #7317
- Fix code block typo in docs by @erastorgueva-nv :: PR: #7717
- Replace gpus with devices by @athitten :: PR: #7743
- docs: fix typos by @shuoer86 :: PR: #7758
- Snake act by @nithinraok :: PR: #7736
- fix(clustering_diarizer.py): fix typo by @jqueguiner :: PR: #7772
- Add some docs and update scripts for ASR by @titu1994 :: PR: #7790
- remove TN from ctc_segm tut by @ekmb :: PR: #7807
- Add support for finetuning with huggingface datasets by @stevehuang52 :: PR: #7834
- Adding long-form audio speaker diarization (clustering) class and functions by @tango4j :: PR: #7737
- Fix k2 installation: update for latest PyTorch, move script to dir by @artbataev :: PR: #7887
- [ASR] GSS-based mask estimator by @anteju :: PR: #7849
- add Dutch P&C FC model info by @zhehuaichen :: PR: #7892
- Add checks for unit tests that are looking for data from CI machine by @ericharper :: PR: #7943
- update branch name by @nithinraok :: PR: #7990
- fix librosa display issue by @nithinraok :: PR: #7991
- Fixes Notebooks for ASR by @titu1994 :: PR: #7994
- cherry pick bug 4405781 by @karpnv :: PR: #8044
- fix noise augmentation by @stevehuang52 :: PR: #8056
- Fix various issues with broken links and bugs by @titu1994 :: PR: #8064
- run with non-dev option by @nithinraok :: PR: #8077
- update broken links by @nithinraok :: PR: #8079
- langid bug fix by @karpnv :: PR: #8134
TTS
Changelog
- Add steps for document of getting dataset 'SF Bilingual Speech' by @RobinDong :: PR: #7378
- Fix checking of cuda/cpu device for inputs of Decoder by @RobinDong :: PR: #7444
- Fix failure of ljspeech's get_data.py by @RobinDong :: PR: #7430
- [TTS] Fix audio codec type checks by @rlangman :: PR: #7373
- [TTS] Add dataset to path of logged artifacts by @rlangman :: PR: #7462
- Fix adding positional embeddings in-place in FFTransformerDecoder by @The0nix :: PR: #7440
- Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS by @RobinDong :: PR: #7409
- [TTS] Fix FastPitch data prep tutorial by @rlangman :: PR: #7524
- add italian tokenization by @GiacomoLeoneMaria :: PR: #7486
- Remap speakers to continuous range of speaker_id for dataset AISHELL3 by @RobinDong :: PR: #7536
- add ItalianPhonemesTokenizer by @GiacomoLeoneMaria :: PR: #7587
- [TTS] Add STFT and SI-SDR loss to audio codec recipe by @rlangman :: PR: #7468
- Fix typo in audio codec config, encoder target by @anteju :: PR: #7697
- Group-residual vector quantizer by @anteju :: PR: #7643
- French g2p with pronunciation dictionary by @mgrafu :: PR: #7601
- add pleasefixme marker for potential failed nightly tests. by @XuesongYang :: PR: #7678
- Add new text segmentation library for better TTS quality by @RobinDong :: PR: #7645
- ConditionalInput: cat along the feature dim, not the batch dim by @anferico :: PR: #7785
- Add selection criteria for reference audios in the submodule by @anferico :: PR: #7788
- [Codec] Update codec checkpoint config by @anteju :: PR: #7835
- [Codec] Finite scalar quantizer by @anteju :: PR: #7886
- Tar codec by @nithinraok :: PR: #7867
LLM
Changelog
- Allow disabling sanity checking when num_sanity_val_steps=0 by @athitten :: PR: #7413
- Add comprehensive error messages by @PeganovAnton :: PR: #7261
- layer selection for ia3 by @arendu :: PR: #7417
- Add rope dynamic linear scaling by @hsiehjackson :: PR: #7437
- Fix sft dataset truncation by @hsiehjackson :: PR: #7464
- fix bug when loading dist ckpt in peft by @lhb8125 :: PR: #7452
- Fix sft chat dataset truncation by @hsiehjackson :: PR: #7478
- SFT model parallel fix for dist ckpt by @aklife97 :: PR: #7511
- remove auto generated examples by @arendu :: PR: #7510
- Add the argument to by @odelalleau :: PR: #7264
- PEFT GPT & T5 Refactor by @meatybobby :: PR: #7308
- fix a typo by @BestJuly :: PR: #7496
- StarCoder SFT test + bump PyT NGC image to 23.09 by @janekl :: PR: #7540
- fix llama2 70b lora tuning bug by @cuichenx :: PR: #7622
- generalized chat sft prompt by @yidong72 :: PR: #7655
- Set base frequency from config by @shan18 :: PR: #7734
- Megatron LLM documentation updates by @ssh-meister :: PR: #7400
- Remove incorrect extra argument of load_from_checkpoint_dir() by @RobinDong :: PR: #7500
- Add nemo to mcore GPT conversion script by @cuichenx :: PR: #7730
- set context for text memmap to fork by @arendu :: PR: #7784
- Support flash decoding by @hsiehjackson :: PR: #7744
- update text server to support compute logprobs by @Zhilin123 :: PR: #7733
- Revert PEFT eval fix by @ericharper :: PR: #7693
- Fix tn duplex by @ekmb :: PR: #7808
- Multimodal merge by @yaoyu-33 :: PR: #7728
- Fix flash decoding precision by @hsiehjackson :: PR: #7852
- Removing duplicate Megatron-LM installation by @Davood-M :: PR: #7864
- adding special_tokens from tokenizer config for transformer-lm model by @clumsy :: PR: #7613
- Add Adapter and IA3 support for MCore models by @cuichenx :: PR: #7750
- Add back import guard by @cuichenx :: PR: #7882
- Change FP8 Defaults by @cuichenx :: PR: #7894
- Added knob for ub_tp_comm_overlap for the MCORE pass by @sanandaraj5597 :: PR: #7902
- Upgrade NeMo to latest mcore and TE by @dimapihtar :: PR: #7862
- Pad sequences to multiples of 16 for GPTSFTDataset by @vysarge :: PR: #7904
- upgrade to latest mcore and TE by @dimapihtar :: PR: #7908
- added missing torch import by @Davood-M :: PR: #7913
- Fix CPU initialization of GPT models by @cuichenx :: PR: #7889
- Fix pinned triton version by @hsiehjackson :: PR: #7925
- fix tp_overlap config var name by @xrennvidia :: PR: #7928
- only enable query key scaling during fp16 by @gshennvm :: PR: #7946
- Fix for gpt3 eval hang with PP (a dtype issue) by @yaoyu-33 :: PR: #7927
- Pass in rotary_base to mcore and from HF by @Kipok :: PR: #7933
- Use NLPDDPStrategyNotebook in Multitask_Prompt_and_PTuning.ipynb by @athitten :: PR: #8061
General Improvements
Changelog
- Add fix for max time to quit trainer gracefully, without running validation by @SeanNaren :: PR: #7731
- SDE Tutorial minor fix by @Jorjeous :: PR: #7598
- Temporarily pin Lightning-Utilities version due to broken NamedTuple by @artbataev :: PR: #8022
- Karpnv/issue 7320 by @karpnv :: PR: #7418
- Speech Simulator, update README.md: output_path --> output_manifest_filepath by @popcornell :: PR: #7442
- Fix None dataloader issue in PTL2.0 by @KunalDhawan :: PR: #7455
- HF StarCoder to NeMo conversion script by @janekl :: PR: #7421
- [doc] fix broken link by @stas00 :: PR: #7481
- dllogger - log on rank 0 only by @stas00 :: PR: #7513
- Add two YouTube introductory videos to README and Docs. by @XuesongYang :: PR: #7570
- defaults changed by @arendu :: PR: #7600
- Bound transformers version in requirements by @athitten :: PR: #7620
- Fix import error: no module named model_utils by @menon92 :: PR: #7629
- Fix in the confidence ensemble test by @Kipok :: PR: #7682
- move core install to /workspace by @aklife97 :: PR: #7706
- distributed checkpoint average script by @yidong72 :: PR: #7721
- fix hybrid eval by @karpnv :: PR: #7757
- fix(diarization-README): typo by @jqueguiner :: PR: #7771
- Configure MCore logger by @mikolajblaz :: PR: #7781
- Nemo to HF converter for LLaMA model by @uppalutkarsh :: PR: #7770
- [Fix] Save best NeMo model only when necessary by @anteju :: PR: #7836
- add guard if it's a distributed checkpoint by @gshennvm :: PR: #7845
- Update transformers cache on Jenkins by @ericharper :: PR: #7854
- Update README.rst for container update by @fayejf :: PR: #7844
- Fix mcore conversion bug by @cuichenx :: PR: #7846
- add comment on script and fix target check by @gshennvm :: PR: #7881
- fix issues with convert_nemo_llama_to_hf.py by @Zhilin123 :: PR: #7922
- Instructions for running ci on pr template by @ericharper :: PR: #7944
- Distributed checkpoint averaging supports bf16 type by @yidong72 :: PR: #7888
- Fix tokenizer argparse in scripts by @titu1994 :: PR: #8012
- Check dependencies in installation script by @artbataev :: PR: #8019
- [SE Tutorial] Use GPU for inference, when available by @anteju :: PR: #8048
- update reqs by @ericharper :: PR: #8072
- Remove typo by @ericharper :: PR: #8146