Highlights

Models

NeMo ASR

Multi-lookahead cache-aware streaming Conformer #6711
Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim #7330
Speech enhancement tutorial #6492
Support punctuation error rate #7538

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.10

Detailed Changelogs

ASR

Changelog

Fix missing pip package 'einops' by @RobinDong :: PR: #7397
Fix failure of installing pyaudio in Online_Offline_Speech_Commands_Demo.ipynb by @RobinDong :: PR: #7396
[ASR] Confidence measure -> method renames by @GNroy :: PR: #7434
RNN-T confidence and alignment bugfix by @GNroy :: PR: #7381
Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim :: PR: #7330
[TTS] Read audio as int32 to avoid flac read errors by @rlangman :: PR: #7477
Fix typos in confidence tutorial notebooks by @Kipok :: PR: #7581
Safeguard nemo_text_processing installation on ARM by @blisc :: PR: #7485
add fc large ls models by @nithinraok :: PR: #7641
[ASR] RNN-T greedy decoding max_frames fix for alignment and confidence by @GNroy :: PR: #7635
Create per.py by @ssh-meister :: PR: #7538
Update docs: readme, getting started, ASR intro by @erastorgueva-nv :: PR: #7679
[ASR] Multichannel mask estimator with flex number of channels by @anteju :: PR: #7317
Fix code block typo in docs by @erastorgueva-nv :: PR: #7717
Replace gpus with devices by @athitten :: PR: #7743
docs: fix typos by @shuoer86 :: PR: #7758
Snake act by @nithinraok :: PR: #7736
fix(clustering_diarizer.py): fix typo by @jqueguiner :: PR: #7772
Add some docs and update scripts for ASR by @titu1994 :: PR: #7790
remove TN from ctc_segm tut by @ekmb :: PR: #7807
Add support for finetuning with huggingface datasets by @stevehuang52 :: PR: #7834
Adding long-form audio speaker diarization (clustering) class and functions by @tango4j :: PR: #7737
Fix k2 installation: update for latest PyTorch, move script to dir by @artbataev :: PR: #7887
[ASR] GSS-based mask estimator by @anteju :: PR: #7849
add Dutch P&C FC model info by @zhehuaichen :: PR: #7892
Add checks for unit tests that are looking for data from CI machine by @ericharper :: PR: #7943
update branch name by @nithinraok :: PR: #7990
fix librosa display issue by @nithinraok :: PR: #7991
Fixes Notebooks for ASR by @titu1994 :: PR: #7994
cherry pick bug 4405781 by @karpnv :: PR: #8044
fix noise augmentation by @stevehuang52 :: PR: #8056
Fix various issues with broken links and bugs by @titu1994 :: PR: #8064
run with non-dev option by @nithinraok :: PR: #8077
update broken links by @nithinraok :: PR: #8079
langid bug fix by @karpnv :: PR: #8134

TTS

Changelog

Add steps for document of getting dataset 'SF Bilingual Speech' by @RobinDong :: PR: #7378
Fix checking of cuda/cpu device for inputs of Decoder by @RobinDong :: PR: #7444
Fix failure of ljspeech's get_data.py by @RobinDong :: PR: #7430
[TTS] Fix audio codec type checks by @rlangman :: PR: #7373
[TTS] Add dataset to path of logged artifacts by @rlangman :: PR: #7462
Fix adding positional embeddings in-place in FFTransformerDecoder by @The0nix :: PR: #7440
Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS by @RobinDong :: PR: #7409
[TTS] Fix FastPitch data prep tutorial by @rlangman :: PR: #7524
add italian tokenization by @GiacomoLeoneMaria :: PR: #7486
Remap speakers to continuous range of speaker_id for dataset AISHELL3 by @RobinDong :: PR: #7536
add ItalianPhonemesTokenizer by @GiacomoLeoneMaria :: PR: #7587
[TTS] Add STFT and SI-SDR loss to audio codec recipe by @rlangman :: PR: #7468
Fix typo in audio codec config, encoder target by @anteju :: PR: #7697
Group-residual vector quantizer by @anteju :: PR: #7643
French g2p with pronunciation dictionary by @mgrafu :: PR: #7601
add pleasefixme marker for potential failed nightly tests. by @XuesongYang :: PR: #7678
Add new text segmentation library for better TTS quality by @RobinDong :: PR: #7645
ConditionalInput: cat along the feature dim, not the batch dim by @anferico :: PR: #7785
Add selection criteria for reference audios in the submodule by @anferico :: PR: #7788
[Codec] Update codec checkpoint config by @anteju :: PR: #7835
[Codec] Finite scalar quantizer by @anteju :: PR: #7886
Tar codec by @nithinraok :: PR: #7867

LLM

Changelog

Allow disabling sanity checking when num_sanity_val_steps=0 by @athitten :: PR: #7413
Add comprehensive error messages by @PeganovAnton :: PR: #7261
layer selection for ia3 by @arendu :: PR: #7417
Add rope dynamic linear scaling by @hsiehjackson :: PR: #7437
Fix sft dataset truncation by @hsiehjackson :: PR: #7464
fix bug when loading dist ckpt in peft by @lhb8125 :: PR: #7452
Fix sft chat dataset truncation by @hsiehjackson :: PR: #7478
SFT model parallel fix for dist ckpt by @aklife97 :: PR: #7511
remove auto generated examples by @arendu :: PR: #7510
Add the argument to by @odelalleau :: PR: #7264
PEFT GPT & T5 Refactor by @meatybobby :: PR: #7308
fix a typo by @BestJuly :: PR: #7496
StarCoder SFT test + bump PyT NGC image to 23.09 by @janekl :: PR: #7540
fix llama2 70b lora tuning bug by @cuichenx :: PR: #7622
generalized chat sft prompt by @yidong72 :: PR: #7655
Set base frequency from config by @shan18 :: PR: #7734
Megatron LLM documentation updates by @ssh-meister :: PR: #7400
Remove incorrect extra argument of load_from_checkpoint_dir() by @RobinDong :: PR: #7500
Add nemo to mcore GPT conversion script by @cuichenx :: PR: #7730
set context for text memmap to fork by @arendu :: PR: #7784
Support flash decoding by @hsiehjackson :: PR: #7744
update text server to support compute logprobs by @Zhilin123 :: PR: #7733
Revert PEFT eval fix by @ericharper :: PR: #7693
Fix tn duplex by @ekmb :: PR: #7808
Multimodal merge by @yaoyu-33 :: PR: #7728
Fix flash decoding precision by @hsiehjackson :: PR: #7852
Removing duplicate Megatron-LM installation by @Davood-M :: PR: #7864
adding special_tokens from tokenizer config for transformer-lm model by @clumsy :: PR: #7613
Add Adapter and IA3 support for MCore models by @cuichenx :: PR: #7750
Add back import guard by @cuichenx :: PR: #7882
Change FP8 Defaults by @cuichenx :: PR: #7894
Added knob for ub_tp_comm_overlap for the MCORE pass by @sanandaraj5597 :: PR: #7902
Upgrade NeMo to latest mcore and TE by @dimapihtar :: PR: #7862
Pad sequences to multiples of 16 for GPTSFTDataset by @vysarge :: PR: #7904
upgrade to latest mcore and TE by @dimapihtar :: PR: #7908
added missing torch import by @Davood-M :: PR: #7913
Fix CPU initialization of GPT models by @cuichenx :: PR: #7889
Fix pinned triton version by @hsiehjackson :: PR: #7925
fix tp_overlap config var name by @xrennvidia :: PR: #7928
only enable query key scaling during fp16 by @gshennvm :: PR: #7946
Fix for gpt3 eval hang with PP (a dtype issue) by @yaoyu-33 :: PR: #7927
Pass in rotary_base to mcore and from HF by @Kipok :: PR: #7933
Use NLPDDPStrategyNotebook in Multitask_Prompt_and_PTuning.ipynb by @athitten :: PR: #8061
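
The entry "Pad sequences to multiples of 16 for GPTSFTDataset" (#7904) names a simple rounding step. The sketch below illustrates the general idea only and is not taken from NeMo: the function name `pad_to_multiple`, the right-padding direction, and the default `pad_id=0` are all assumptions made for this example.

```python
def pad_to_multiple(token_ids, multiple=16, pad_id=0):
    """Right-pad a token list so its length is a multiple of `multiple`.

    Hypothetical helper for illustration; pad_id=0 is an assumed value,
    not necessarily what any NeMo dataset uses.
    """
    remainder = len(token_ids) % multiple
    if remainder == 0:
        # Already aligned: return a copy unchanged.
        return list(token_ids)
    # Append just enough pad tokens to reach the next multiple.
    return list(token_ids) + [pad_id] * (multiple - remainder)


padded = pad_to_multiple(list(range(21)))
print(len(padded))  # 32
```

Aligning sequence lengths this way is commonly done so that tensor shapes suit hardware-friendly kernel sizes; see PR #7904 for the actual change.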

General Improvements

Changelog

Add fix for max time to quit trainer gracefully, without running validation by @SeanNaren :: PR: #7731
SDE Tutorial minor fix by @Jorjeous :: PR: #7598
Temporary pin Lightning-Utilities version due to broken NamedTuple by @artbataev :: PR: #8022
Karpnv/issue 7320 by @karpnv :: PR: #7418
Speech Simulator, update README.md: output_path --> output_manifest_filepath by @popcornell :: PR: #7442
Fix None dataloader issue in PTL2.0 by @KunalDhawan :: PR: #7455
HF StarCoder to NeMo conversion script by @janekl :: PR: #7421
[doc] fix broken link by @stas00 :: PR: #7481
dllogger - log on rank 0 only by @stas00 :: PR: #7513
Add two youtube introductory videos to README and Docs. by @XuesongYang :: PR: #7570
defaults changed by @arendu :: PR: #7600
Bound transformers version in requirements by @athitten :: PR: #7620
Fix import error no module name model_utils by @menon92 :: PR: #7629
Fix in the confidence ensemble test by @Kipok :: PR: #7682
move core install to /workspace by @aklife97 :: PR: #7706
distributed checkpoint average script by @yidong72 :: PR: #7721
fix hybrid eval by @karpnv :: PR: #7757
fix(diarization-README): typo by @jqueguiner :: PR: #7771
Configure MCore logger by @mikolajblaz :: PR: #7781
Nemo to HF converter for LLaMA model by @uppalutkarsh :: PR: #7770
[Fix] Save best NeMo model only when necessary by @anteju :: PR: #7836
add guard if its a distributed checkpoint by @gshennvm :: PR: #7845
Update transformers cache on Jenkins by @ericharper :: PR: #7854
Update README.rst for container update by @fayejf :: PR: #7844
Fix mcore conversion bug by @cuichenx :: PR: #7846
add comment on script and fix target check by @gshennvm :: PR: #7881
fix issues with convert_nemo_llama_to_hf.py by @Zhilin123 :: PR: #7922
Instructions for running ci on pr template by @ericharper :: PR: #7944
Distributed checkpoint averaging supports bf16 type by @yidong72 :: PR: #7888
Fix tokenizer argparse in scripts by @titu1994 :: PR: #8012
Check dependencies in installation script by @artbataev :: PR: #8019
[SE Tutorial] Use GPU for inference, when available by @anteju :: PR: #8048
update reqs by @ericharper :: PR: #8072
Remove typo by @ericharper :: PR: #8146

NVIDIA Neural Modules 1.22.0
