diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 000000000000..3edbfa6d9d6e
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,622 @@
+# Changelog
+
+## NVIDIA Neural Modules 2.0.0rc1
+
+### Highlights
+
+#### Large language models
+
+- PEFT: QLoRA support, LoRA/QLoRA for Mixture-of-Experts (MoE) dense layers (a hedged LoRA sketch follows this list)
+- State Space Models & Hybrid Architecture support (Mamba2 and NV-Mamba2-hybrid)
+- Support for Nemotron, Minitron, Gemma2, Qwen, and RAG
+- Custom tokenizer training in NeMo
+- Updated the Auto-Configurator for EP, CP, and FSDP
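+
+A minimal, hedged sketch of attaching a LoRA adapter with the NeMo PEFT API (assumes an already-restored `MegatronGPTSFTModel`; `LoraPEFTConfig` and `add_adapter` follow the NeMo 1.x-style PEFT interface, and exact config fields may differ across versions):
+
+```python
+from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig
+
+model = ...  # e.g. a MegatronGPTSFTModel restored from a .nemo checkpoint
+
+# LoraPEFTConfig reads adapter hyperparameters (rank, target modules, etc.)
+# from the model config; after this call only adapter weights are trainable.
+model.add_adapter(LoraPEFTConfig(model.cfg))
+```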
+
+#### Multimodal
+
+- NeVA: Added SOTA LLM backbone support (Mixtral/LLaMA3) and a suite of model parallelism features (PP/EP)
+- Support Language Instructed Temporal-Localization Assistant (LITA) on top of video NeVA
+
+#### ASR
+
+- SpeechLM and SALM
+- Adapters for Canary Customization
+- PyTorch CUDA allocator in PyTorch 2.2 improves training speed by up to 30% for all ASR models (see the sketch after this list)
+- CUDA Graphs for Transducer Inference
+- Replaced webdataset with Lhotse - gives up to a 2x speedup
+- Transcription Improvements - Speedup and QoL Changes
+- ASR Prompt Formatter for multimodal Canary
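+
+The allocator speedup comes from PyTorch's expandable-segments CUDA caching allocator, which reduces re-allocation churn when batch shapes vary. A minimal sketch of enabling it (this is a standard PyTorch environment knob, not a NeMo-specific API):
+
+```python
+import os
+
+# Must be set before torch initializes CUDA, i.e. before the first CUDA call.
+os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
+
+import torch  # noqa: E402  (deliberately imported after setting the env var)
+
+print(torch.cuda.is_available())  # allocator config takes effect on first CUDA use
+```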
+
+#### Export & Deploy
+
+- In-framework PyTriton deployment with backends (a hedged export/deploy sketch follows this list):
+  - PyTorch
+  - vLLM
+  - TRT-LLM (updated to 0.10)
+- TRT-LLM C++ runtime
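+
+The export and deploy modules compose as an exporter plus a PyTriton server. A hedged sketch of the TRT-LLM path (paths, the model name, and arguments such as `model_type` and `n_gpus` are illustrative, and exact signatures vary across NeMo versions):
+
+```python
+from nemo.export import TensorRTLLM
+from nemo.deploy import DeployPyTriton
+
+# Build TRT-LLM engines from a .nemo checkpoint into a working directory.
+exporter = TensorRTLLM(model_dir="/tmp/trt_llm_model_dir/")
+exporter.export(nemo_checkpoint_path="/models/llama.nemo", model_type="llama", n_gpus=1)
+
+# Serve the exported model over PyTriton.
+server = DeployPyTriton(model=exporter, triton_model_name="llama")
+server.deploy()
+server.serve()
+```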
+
+### Detailed Changelogs
+
+#### ASR
+
+- Support dataloader as input to `audio` for transcription by @titu1994 :: PR: #9201
+- Clean up dev docs collection section by @yaoyu-33 :: PR: #9205
+- Fix Online_Offline_Microphone_VAD_Demo.ipynb by @stevehuang52 :: PR: #9251
+- Remove .nemo instead of renaming by @mikolajblaz :: PR: #9281
+- Fix GreedyBatchedCTCInfer regression from GreedyCTCInfer. by @galv :: PR: #9347
+- Revert "Fix GreedyBatchedCTCInfer regression from GreedyCTCInfer." by @titu1994 :: PR: #9351
+- Prompt formatter API and canary transcribe tensor input support by @pzelasko :: PR: #9206
+- Fix prompt formatter's defaults=None case in multi-task model by @pzelasko :: PR: #9366
+- move AED chunked infer script by @stevehuang52 :: PR: #9367
+- Use model-cast-to-bfloat16 rather than AMP-to-bfloat16 for inference. by @galv :: PR: #9198
+- ci: Fix `L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_C… by @ko3n1g :: PR: #9399
+- Fix logging message for ASR by @titu1994 :: PR: #9469
+- Add support to change Multi task model prompt by @titu1994 :: PR: #9542
+- Enable encoder adapters for Canary and MultiTaskAED models by @titu1994 :: PR: #9409
+- Audio model collection by @anteju :: PR: #9263
+- TitaNet Batch Verify Speaker by @monica-sekoyan :: PR: #9337
+- Fix the arguments of forward_for_export function in msdd_models by @tango4j :: PR: #9624
+- chore: Pin branch in notebooks by @ko3n1g :: PR: #9697
+- refactor: notebook branch release by @ko3n1g :: PR: #9711
+- Canary Adapters tutorial (#9670) by @nithinraok :: PR: #9777
+- typos and branch name update to r2.0.0rc1 by @nithinraok :: PR: #9846
+- Fix RNNT alignments test by @artbataev :: PR: #9770
+- By default trust remote code from HF Datasets by @nithinraok :: PR: #9886
+- Temporarily disable cuda graph based RNN-T greedy inference for r2.0.0rc1 by @galv :: PR: #9904
+- Enable CUDA graphs by default, but require CUDA 12.6 for full graphs by @artbataev :: PR: #9919
+- update branch name for script by @nithinraok :: PR: #9936
+- update branch by @nithinraok :: PR: #9942
+
+
+#### TTS
+
+- Clean up dev docs collection section by @yaoyu-33 :: PR: #9205
+- Add mel codec checkpoints by @anteju :: PR: #9228
+- GPU unit tests: Mark flaky tests to be fixed by @pablo-garay :: PR: #9559
+- chore: Pin branch in notebooks by @ko3n1g :: PR: #9697
+- Create `__init__.py` by @stevehuang52 :: PR: #9892
+- [NeMo-UX] Fixes to make PreemptionCallback work by @hemildesai :: PR: #9830
+- Fix Docker build. Make Dockerfile consistent with CI by @artbataev :: PR: #9784
+- Multimodal data prep notebook fix by @cuichenx :: PR: #9910
+- [NeMo-UX] Add distributed checkpointing unit tests by @ashors1 :: PR: #9794
+- r2.0.0rc1 fix for dist checkpoint loading by @yaoyu-33 :: PR: #9854
+- [NeMo-UX] Rename sdk references to NeMo Run by @hemildesai :: PR: #9872
+- [NeMo-UX] Fix some serialization bugs by @ashors1 :: PR: #9868
+- add mixtral neva tutorial (moe + token fusion + siglip) by @paul-gibbons :: PR: #9926
+- [NeMo-UX] Add more NeMo Logger tests by @ashors1 :: PR: #9795
+- Akoumparouli/mixtral fixes for r2.0.0rc1 by @akoumpa :: PR: #9911
+- R2.0.0rc1 clip fix by @Slyne :: PR: #9871
+- [NeMo-UX] Add missing docstrings and update some defaults by @ashors1 :: PR: #9895
+- Add REST service requirements.txt by @oyilmaz-nvidia :: PR: #9923
+- add bert latest fix by @JRD971000 :: PR: #9921
+- remove empty reconfigure_limit_batches by @akoumpa :: PR: #9934
+- fix mem by @terrykong :: PR: #9964
+- Run a sample query for a quantized model conditionally by @janekl :: PR: #9965
+- Add pydantic-settings by @oyilmaz-nvidia :: PR: #9961
+- Resiliency features update by @jbieniusiewi :: PR: #9714
+- [NeMo-UX] Wrap task config save in a try/except by @ashors1 :: PR: #9956
+- [NeMo-UX] Update default PTL logging `save_dir` by @ashors1 :: PR: #9954
+- Fix lita tutorial by @Slyne :: PR: #9980
+- Add deploy and REST API support to NeMo 2.0 by @athitten :: PR: #9834
+
+
+## NVIDIA Neural Modules 2.0.0rc0
+
+### Highlights
+
+#### LLM and MM
+
+##### Models
+
+- Megatron Core RETRO
+ - Pre-training
+ - Zero-shot Evaluation
+
+- Pretraining, conversion, evaluation, SFT, and PEFT for:
+ - Mixtral 8X22B
+ - Llama 3
+ - SpaceGemma
+
+- Embedding Models Fine Tuning
+ - Mistral
+ - BERT
+
+- BERT models
+ - Context Parallel
+ - Distributed checkpoint
+
+- Video capabilities with NeVA
+
+##### Performance
+
+- Distributed Checkpointing
+ - Torch native backend
+ - Parallel read/write
+ - Async write
+
+- Multimodal LLM (LLAVA/NeVA)
+ - Pipeline Parallelism support
+ - Sequence packing support
+
+##### Export
+
+- Integration of Export & Deploy Modules into NeMo Framework container
+ - Upgrade to TRT-LLM 0.9
+
+#### Speech (ASR & TTS)
+
+##### Models
+
+- AED Multi-Task Models (Canary) - multi-task, multi-lingual speech recognition / speech translation model
+- Multimodal Domain - Speech LLM supporting SALM Model
+- Parakeet-tdt_ctc-1.1b Model - RTFx of > 1500 (can transcribe 1500 seconds of audio in 1 second)
+- Audio Codec 16kHz Small - NeMo Neural Audio Codec for discretizing speech for use in LLMs (a hedged usage sketch follows this list)
+  - mel_codec_22khz_medium
+  - mel_codec_44khz_medium
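+
+A hedged sketch of using one of the codec checkpoints above to discretize speech (assumes the checkpoint name resolves via `from_pretrained`; `encode` follows the `AudioCodecModel` API, whose exact signature may differ by version):
+
+```python
+import torch
+from nemo.collections.tts.models import AudioCodecModel
+
+codec = AudioCodecModel.from_pretrained("mel_codec_22khz_medium")
+
+audio = torch.randn(1, 22050)  # batch of one, 1 second of audio at 22.05 kHz
+audio_len = torch.tensor([audio.shape[1]])
+
+# Discretize the waveform into codec tokens suitable for LLM consumption.
+tokens, tokens_len = codec.encode(audio=audio, audio_len=audio_len)
+```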
+
+##### Perf Improvements
+
+- `transcribe()` upgrade - enables one-line transcription with files, tensors, and data loaders (see the sketch after this list)
+- Frame-looping algorithm for faster RNNT decoding - improves Real Time Factor (RTF) by 2-3x
+- CUDA Graphs + Label-Looping algorithm for RNN-T and TDT decoding - transducer greedy decoding at over 1500x RTFx, on par with CTC non-autoregressive models
+- Semi-sorted batching support - external user contribution that speeds up training by 15-30%
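+
+A minimal, hedged sketch of the one-line transcription flow (the checkpoint name is illustrative and pulling it requires network access; the `audio` keyword follows the rename in this release):
+
+```python
+import nemo.collections.asr as nemo_asr
+
+model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_transducer_large")
+
+# `audio` accepts file paths and, per this upgrade, tensors or data loaders too.
+hypotheses = model.transcribe(audio=["sample1.wav", "sample2.wav"], batch_size=2)
+print(hypotheses)
+```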
+
+##### Customization
+
+- Context biasing for CTC word stamping - improves accuracy for custom vocabulary and pronunciation
+- Longform inference
+  - Longform inference support for AED models
+- Transcription of multi-channel audio for AED models
+
+##### Misc
+
+- Upgraded webdataset - Speech and LLM / Multimodal unified container
+
+### Detailed Changelogs
+
+#### ASR
+
+- Enable using hybrid asr models in CTC Segmentation tool by @erastorgueva-nv :: PR: #8828
+- TDT confidence fix by @GNroy :: PR: #8982
+- Fix union type annotations for autodoc+mock-import rendering by @pzelasko :: PR: #8956
+- NeMo dev doc restructure by @yaoyu-33 :: PR: #8896
+- Improved random seed configuration for Lhotse dataloaders with docs by @pzelasko :: PR: #9001
+- Fix #8948, allow preprocessor to be stream captured to a cuda graph when doing per_feature normalization by @galv :: PR: #8964
+- [ASR] Support for transcription of multi-channel audio for AED models by @anteju :: PR: #9007
+- Add ASR latest news by @titu1994 :: PR: #9073
+- Fix docs errors and most warnings by @erastorgueva-nv :: PR: #9006
+- PyTorch CUDA allocator optimization for dynamic batch shape dataloading in ASR by @pzelasko :: PR: #9061
+- RNN-T and TDT inference: use CUDA graphs by default by @artbataev :: PR: #8972
+- Fix #8891 by supported GPU-side batched CTC Greedy Decoding by @galv :: PR: #9100
+- Update branch for notebooks and ci in release by @ericharper :: PR: #9189
+- Enable CUDA graphs by default only for transcription by @artbataev :: PR: #9196
+- rename paths2audiofiles to audio by @nithinraok :: PR: #9209
+- Fix ASR_Context_Biasing.ipynb contains FileNotFoundError by @andrusenkoau :: PR: #9233
+- Cherrypick: Support dataloader as input to `audio` for transcription (#9201) by @titu1994 :: PR: #9235
+- Update Online_Offline_Microphone_VAD_Demo.ipynb by @stevehuang52 :: PR: #9252
+- Dgalvez/fix greedy batch strategy name r2.0.0rc0 by @galv :: PR: #9243
+- Accept None as an argument to decoder_lengths in GreedyBatchedCTCInfer::forward by @galv :: PR: #9246
+- Fix loading github raw images on notebook by @nithinraok :: PR: #9282
+- typos by @nithinraok :: PR: #9314
+- Re-enable cuda graphs in training modes. by @galv :: PR: #9338
+- add large model stable training fix and contrastive loss update for variable seq by @nithinraok :: PR: #9259
+- Fix conv1d package in r2.0.0rc0 by @pablo-garay :: PR: #9369
+- Fix GreedyBatchedCTCInfer regression from GreedyCTCInfer. (#9347) by @titu1994 :: PR: #9350
+- Make a backward compatibility for old MSDD configs in label models by @tango4j :: PR: #9377
+- Force diarizer to use CUDA if cuda is available and if device=None. by @tango4j :: PR: #9380
+
+
+
+#### TTS
+
+- [TTS] Add tutorial for training audio codecs by @rlangman :: PR: #8723
+- Update radtts.py by @blisc :: PR: #9097
+- [Nemo CICD] RADTTS test optional by @pablo-garay :: PR: #9112
+- Remove Radtts CI test by @blisc :: PR: #9144
+- Fix T5 G2P Input and Output Types by @blisc :: PR: #9224
+
+
+
+#### LLM and MM
+
+- Rachitg/dpa by @rachitgarg91 :: PR: #8911
+- Remove precision args in trainer due to PTL update by @yaoyu-33 :: PR: #8908
+- Huvu/mcore retro by @huvunvidia :: PR: #8861
+- fsdp tp > 1 bug fix by @dimapihtar :: PR: #8947
+- Fix memory leak at loss func by @minitu :: PR: #8868
+- change the condition for get qkv tensor from linear_qkv output in mcoremixin by @HuiyingLi :: PR: #8965
+- Add safety checks for 'data' key in MegatronGPTModel cfg by @HuiyingLi :: PR: #8991
+- [NeMo-UX] Adding MegatronParallel by @cuichenx :: PR: #8987
+- Skip top_p computations when set to 1.0 by @odelalleau :: PR: #8905
+- Gemma bug by @cuichenx :: PR: #8962
+- [NeMo-UX] Adding megatron strategy by @marcromeyn :: PR: #8995
+- Quantized checkpoint support in export and deploy modules by @janekl :: PR: #8859
+- add geglu to mlp swap by @JRD971000 :: PR: #8999
+- add timeout for new_group by @acphile :: PR: #8998
+- Zero-shot evaluation pipeline for mcore RETRO by @huvunvidia :: PR: #8941
+- Added fusion for squared relu by @sanandaraj5597 :: PR: #8963
+- Developer Documents for mcore RETRO by @huvunvidia :: PR: #9026
+- [NeMo-UX] Adding GPTModel & MockDataModule by @marcromeyn :: PR: #9011
+- Adding unit test for mcore RETRO model by @huvunvidia :: PR: #9022
+- docs and simplification of cmd args by @arendu :: PR: #8979
+- [NeMo-UX] Add checkpoint-io to MegatronStrategy by @marcromeyn :: PR: #9057
+- Enable Sequence Packing and Pipeline Parallel in NeVA by @yaoyu-33 :: PR: #8957
+- Mingyuanm/add back fp8 support to sd by @Victor49152 :: PR: #9070
+- unfused lora by @arendu :: PR: #9004
+- Handle case where num_query_groups is set to null for LoRA config setup by @vysarge :: PR: #9075
+- Alit/griffin by @JRD971000 :: PR: #9021
+- Implement DistributedCheckpointIO by @mikolajblaz :: PR: #9016
+- Video Neva Pretraining + Inference Implementation by @paul-gibbons :: PR: #9095
+- HF to .nemo for Mixtral-8x22B-instruct by @akoumpa :: PR: #9060
+- mcore ds updates by @dimapihtar :: PR: #8951
+- Alit/griffin perf by @JRD971000 :: PR: #9107
+- Add assert for max_steps to be positive in MegatronGPTSFTModel by @athitten :: PR: #9110
+- Extend sequence length padding for GPT SFT to account for context parallel by @vysarge :: PR: #8869
+- Update gpt dataset config parameter for mock by @thomasdhc :: PR: #9118
+- Add Mcore DistributedDataParallel and distributed optimizer into Nemo by @gdengk :: PR: #9034
+- Revert "Add assert for max_steps to be positive in MegatronGPTSFTMode… by @pablo-garay :: PR: #9128
+- scripts to convert HF lora to nemo by @arendu :: PR: #9102
+- Prevent duplicated checkpoints by @mikolajblaz :: PR: #9015
+- add TN/ITN link in speech tools list by @erastorgueva-nv :: PR: #9142
+- Cleanup deprecated files and temporary changes by @cuichenx :: PR: #9088
+- Use DP+CP groups as the FSDP sharding domain by @erhoo82 :: PR: #9145
+- CUDA memory profile by @erhoo82 :: PR: #9096
+- Fix missing func for T5 model by @gdengk :: PR: #9141
+- Add knob for load_directly_on_device by @mikolajblaz :: PR: #9125
+- Revert rope fusion defaults by @cuichenx :: PR: #9238
+- Update nemo.export module for quantized models by @janekl :: PR: #9250
+- Fix circular import for MM dataprep notebook by @cuichenx :: PR: #9287
+- neva media_type + text generation default fix by @paul-gibbons :: PR: #9257
+- fix lora and ptuning and isort/black by @oyilmaz-nvidia :: PR: #9290
+- add check if num layers is divisible by pp size by @dimapihtar :: PR: #9208
+- Fix P-tuning for Llama based models by @apanteleev :: PR: #9297
+- add deprecation warnings by @pablo-garay :: PR: #9266
+- move pooler under post_process by @dimapihtar :: PR: #9328
+- add deprecation note for nmt by @dimapihtar :: PR: #9342
+- Fix incorrect checkpoint removal logic (#9192) by @mikolajblaz :: PR: #9204
+- fix fp16 precision issue by @dimapihtar :: PR: #9376
+- Fix module.training for Neva in FusedAttn backward which causes nan by @yaoyu-33 :: PR: #8877
+
+
+
+#### Export
+
+- Updates for TRT-LLM 0.9 by @oyilmaz-nvidia :: PR: #8873
+- Mingyuanm/sdxl export by @Victor49152 :: PR: #8926
+- Avoid unpacking NeMo checkpoints before exporting to TRT-LLM by @apanteleev :: PR: #8866
+- Update gemma for trt-llm 0.9 by @oyilmaz-nvidia :: PR: #8974
+- TRT-LLM export P-tuning related fixes by @apanteleev :: PR: #8863
+
+
+
+#### General Improvements
+
+- Update package info by @ericharper :: PR: #8793
+- [Nemo CICD] Update mcore 4.13.24 by @pablo-garay :: PR: #8917
+- Akoumparouli/low mem mixtral ckpt converter by @akoumpa :: PR: #8895
+- Adding RETRO tests to Action Tests (cicd-main.yml) by @huvunvidia :: PR: #8942
+- Akoumparouli/fix sd train 2 by @akoumpa :: PR: #8883
+- Update te install for jenkins by @ericharper :: PR: #8954
+- [Nemo CICD] Add last job depending on others for blocking check by @pablo-garay :: PR: #8959
+- Minor quantization pipeline updates by @janekl :: PR: #8924
+- Fix External CLIP Converter by @yaoyu-33 :: PR: #8960
+- PP support in LoRA merge script by @cuichenx :: PR: #8934
+- Update PR template by @ericharper :: PR: #8978
+- Update Latest News by @shashank3959 :: PR: #8837
+- Fix incorrect link to latest news in README by @shashank3959 :: PR: #8985
+- Update dependency install for LLM and MM by @ericharper :: PR: #8990
+- Temporarily remove mcore dep by @ericharper :: PR: #9010
+- [Nemo CICD] further specialize runners for more parallelism by @pablo-garay :: PR: #9036
+- Update mm dataprep notebook based on feedback by @cuichenx :: PR: #9029
+- Fix import in lora merge script by @cuichenx :: PR: #9032
+- [Nemo CICD] Run when labeled:Run CICD by @pablo-garay :: PR: #9044
+- [Nemo CICD] Add tag/label for 1-gpu runner by @pablo-garay :: PR: #9046
+- [Nemo CICD] checkout v4 by @pablo-garay :: PR: #9048
+- [Nemo CICD] Remove temp test change by @pablo-garay :: PR: #9049
+- remove in-place addition for dreambooth train with text encoder by @Victor49152 :: PR: #8825
+- Mingyuanm/sdxl quantization notebook by @Victor49152 :: PR: #9042
+- [Nemo CICD] Trigger on comment issued by @pablo-garay :: PR: #9062
+- zarr ckpt to torch_dist ckpt converter by @dimapihtar :: PR: #8842
+- Restore PTQ tests for Llama2 (reopened) by @janekl :: PR: #9064
+- add clip H config by @JRD971000 :: PR: #9082
+- [NeMo-UX] Add mixed-precision plugin by @marcromeyn :: PR: #9065
+- Comment baichuan test and update pr template by @ericharper :: PR: #9085
+- Add safe extraction of nemo tar files by @athitten :: PR: #8976
+- Improved `shard_id` parsing in `LazyNemoTarredIterator`, enables AIS dataloading by @pzelasko :: PR: #9077
+- [NeMo-UX] Add mistral-7b model by @marcromeyn :: PR: #9066
+- Llama3 Conversion Script Update by @suiyoubi :: PR: #9089
+- dehardcode test string by @JimmyZhang12 :: PR: #8865
+- [Nemo CICD] Try trigger cicd run on comment by @pablo-garay :: PR: #9111
+- Lhotse dataloading: RIR augmentation and nemo/tarred input support for RIR and noise aug by @pzelasko :: PR: #9109
+- mixtral evaluation PR by @Slyne :: PR: #8989
+- [Nemo CICD] Revert: run GHA cicd on comment by @pablo-garay :: PR: #9119
+- [Nemo CICD] Comment out flaky test: running too long by @pablo-garay :: PR: #9123
+- [Nemo CICD] Add timeout to unit tests by @pablo-garay :: PR: #9132
+- [Nemo CICD] Indicate optional test in name (prefix) by @pablo-garay :: PR: #9139
+- video neva null image+video folder path fix by @paul-gibbons :: PR: #9116
+- [NeMo-UX] Add data module by @cuichenx :: PR: #9133
+- NeMo Inference Requirements by @oyilmaz-nvidia :: PR: #9093
+- Remove debug print by @maanug-nv :: PR: #9074
+- Remove legacy CI by @pablo-garay :: PR: #9149
+- Update support for push_to_hf_hub() by @titu1994 :: PR: #9159
+- [Nemo CICD] comment out flaky PTQ tests by @pablo-garay :: PR: #9160
+- Update branch by @ericharper :: PR: #9211
+- dist adam transpose fix by @dimapihtar :: PR: #9239
+- [Nemo CICD] Increase time limit for Speech_Checkpoints_tests (#9186) by @pablo-garay :: PR: #9247
+- Pin transformers by @ericharper :: PR: #9261
+- Fix typo in HF tutorial by @titu1994 :: PR: #9302
+
+
+
+## NVIDIA Neural Modules 1.23.0
+
+### Highlights
+
+#### Models
+
+##### NVIDIA StarCoder2 - 15B
+
+- Announcement - https://developer.nvidia.com/blog/unlock-your-llm-coding-potential-with-starcoder2/
+- AI Foundation Model Inference - https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/starcoder2-15b
+- https://huggingface.co/bigcode/starcoder2-15b
+
+##### NeMo Canary
+
+- Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-02-canary/
+- https://huggingface.co/nvidia/canary-1b
+
+#### NeMo LLM
+
+- Falcon
+- Code Llama
+- StarCoder
+- GPT perf improvements
+- Context parallelism
+- Mistral
+- Mixtral (without expert parallelism)
+- Mcore GPT Dataset integration
+
+#### NeMo MM
+- CLIP
+- Stable Diffusion (supporting LoRA)
+- Imagen
+- ControlNet (for SD)
+- Instruct pix2pix (for SD)
+- LLAVA
+- NeVA
+- DreamFusion++
+- NSFW filtering
+
+#### NeMo ASR
+
+- Lhotse Dataloading support #7880
+- Canary: Multi task multi lingual ASR #8242
+- LongForm Audio for Diarization #7737
+- Faster algorithm for RNN-T Greedy #7926
+- Cache-Aware streaming notebook #8296
+
+#### NeMo TTS
+
+#### NeMo Vision
+
+#### Known Issues
+
+##### ASR
+
+###### RNNT WER calculation when fused batch size > 1 during validation / test step()
+
+Previously, the RNNT metric was stateful while the CTC one was not ([r1.22.0](https://github.com/NVIDIA/NeMo/blob/r1.22.0/nemo/collections/asr/metrics/rnnt_wer_bpe.py#L419-L420), [r1.23.0](https://github.com/NVIDIA/NeMo/blob/r1.23.0/nemo/collections/asr/metrics/wer.py#L333)).
+
+Therefore, the WER calculation performed inside the RNNT joint for the fused operation worked properly. However, with the unification of metrics in r1.23.0, a bug was introduced where only the last sub-batch contributes to the score, because partial results are no longer accumulated. This is patched via https://github.com/NVIDIA/NeMo/pull/8587 and will be fixed in the next release.
+
+**Workaround**: Explicitly disable fused batch size during inference with the following snippet:
+
+```python
+from omegaconf import open_dict
+
+model = ...  # any RNNT-based ASR model
+decoding_cfg = model.cfg.decoding
+with open_dict(decoding_cfg):
+    decoding_cfg.fused_batch_size = -1  # -1 disables the fused batch path
+model.change_decoding_strategy(decoding_cfg)
+```
+
+Note: This bug does not affect scores calculated via `model.transcribe()` (since it does not calculate metrics during inference, just text) or when using the `transcribe_speech.py` or `speech_to_text_eval.py` scripts in `examples/asr`.
+
+###### Two failing unit tests due to a change in expected results, caused by a Lhotse version update
+
+#### Container
+
+For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
+
+`docker pull nvcr.io/nvidia/nemo:24.01.speech`
+
+#### ASR
+
+- Update link to yaml file in ASR_with_Transducers.ipynb by @Faith-Nchifor :: PR: #8014
+- Use convert_hf_dataset_to_nemo by @karpnv :: PR: #8017
+- Update asr_language_modeling.rst: Add a missing word by @martin0258 :: PR: #8007
+- spelling mistake by @orena1 :: PR: #7903
+- update asr eval by @stevehuang52 :: PR: #8045
+- fix noise aug by @stevehuang52 :: PR: #8057
+- Various fixes for typos and urls by @titu1994 :: PR: #8066
+- [Fix] Increase length check tolerance to prevent test failing by @anteju :: PR: #8067
+- Add text metrics to asr eval by @stevehuang52 :: PR: #8087
+- fix device setting to allow using accelerator cpu by @orena1 :: PR: #8084
+- .ctm in data simulator annotator compliant with RT-09 specification by @popcornell :: PR: #8004
+- Fix AST eval by @stevehuang52 :: PR: #8112
+- fix: numba.*_num_threads resets torch num_threads #8141 by @itzsimpl :: PR: #8145
+- Update dependencies by @titu1994 :: PR: #8156
+- NeMo + Lhotse integration by @pzelasko :: PR: #7880
+- Speedup RNN-T greedy decoding by @artbataev :: PR: #7926
+- [docker] Install k2 before NeMo for faster image rebuilding by @pzelasko :: PR: #8204
+- [docs] Add --force_codec to tarred dataset creation examples by @pzelasko :: PR: #8227
+- Temporarily use the previous RNN-T decoding algorithm as default by @artbataev :: PR: #8226
+- Make TDT inference not require duration params by @hainan-xv :: PR: #8207
+- Cache Aware Streaming tutorial notebook by @erastorgueva-nv :: PR: #8296
+- fix path location and branch by @nithinraok :: PR: #8304
+- Attention encoder-decoder models for multiple speech-to-text tasks … by @titu1994 :: PR: #8324
+- Remove asr webapp by @titu1994 :: PR: #8347
+- remove `_target_` at model level in aed model config [ASR] by @krishnacpuvvada :: PR: #8351
+- Add change_vocabulary and save_tokenizers() support to Multitask ASR models by @titu1994 :: PR: #8357
+- Change default beam size by @titu1994 :: PR: #8371
+- adding jenkins test for speech_to_text_aed model by @krishnacpuvvada :: PR: #8368
+- Add Finetuning tutorial with HF Datasets by @nithinraok :: PR: #8356
+- wer fix by @tbartley94 :: PR: #8404
+- add ensemble decoding fix by @nithinraok :: PR: #8427
+- Update k2 by @artbataev :: PR: #8492
+
+
+
+#### TTS
+
+- [TTS] Scale sampler steps by number of devices by @rlangman :: PR: #7947
+- Add All Multimodal Source Code Part 2: Text to image, x to nerf by @yaoyu-33 :: PR: #7970
+- [TTS] Add period discriminator and feature matching loss to codec recipe by @rlangman :: PR: #7884
+- Added VectorQuantizer base class by @anteju :: PR: #8011
+
+
+
+#### LLMs
+
+- Add interface to set NCCL options of each process group by @erhoo82 :: PR: #7923
+- Support O2 training of PEFT and SFT by @cuichenx :: PR: #7971
+- [NLP] Access scaler only in FP16 case by @janekl :: PR: #7916
+- [NLP] Minor improvements in Llama conversion script by @janekl :: PR: #7978
+- [NLP] Use helpers from utils_funcs.py in Llama conversion by @janekl :: PR: #7979
+- [NLP] Remove replace_sampler_ddp (deprecated in Trainer) by @janekl :: PR: #7981
+- Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 by @trias702 :: PR: #7920
+- Remove deprecated arguments from TE's TransformerLayer by @jbaczek :: PR: #7917
+- Add All Multimodal Source Code by @yaoyu-33 :: PR: #7791
+- First draft of mcore bert model in NeMo by @shanmugamr1992 :: PR: #7814
+- Support Falcon Variants (7B/40B/180B) in Mcore NeMo by @xuanzic :: PR: #7666
+- FSDP + Tensor Parallelism by @erhoo82 :: PR: #7897
+- Packed Sequence by @cuichenx :: PR: #7945
+- Adding method back that was removed accidentally by @ericharper :: PR: #8038
+- [NLP] ArtifactItem with init=True to make it debuggable by @janekl :: PR: #7980
+- SFT patch: (1) enable sequence parallelism and (2) enable profile by @erhoo82 :: PR: #7963
+- migration to PTL 2.0 for spellmapper model by @bene-ges :: PR: #7924
+- Change the megatron config lr scheduler default and fix to change partitions script by @shan18 :: PR: #8094
+- (1) Add SHARP interface to M-CORE, (2) use send/recv to send train loss to the first rank instead of b-cast by @erhoo82 :: PR: #7793
+- Reconfigure limit_val_batches only for int by @athitten :: PR: #8099
+- Fixing wrapper and moving it to base class by @shanmugamr1992 :: PR: #8055
+- fix gated_linear_unit bug by @Agoniii :: PR: #8042
+- Fix Adapter for MCore models by @cuichenx :: PR: #8124
+- add war fix for sync issues by @gshennvm :: PR: #8130
+- Improve PEFT UX by @cuichenx :: PR: #8131
+- Enhance flexibility by passing callbacks as method argument by @michal2409 :: PR: #8015
+- context parallelism by @xrennvidia :: PR: #7739
+- Make pipelined TP comm overlap available with mcore by @erhoo82 :: PR: #8005
+- remove deprecated scripts by @arendu :: PR: #8138
+- adding OnlineSampleMapping by @arendu :: PR: #8137
+- Add distopt support for FP8 params and BF16 optimizer state by @timmoon10 :: PR: #7909
+- Revert adding OnlineSampleMapping by @pablo-garay :: PR: #8164
+- Token count and sequence length logging for MegatronGPTSFTModel by @vysarge :: PR: #8136
+- Use latest apex internal API by @jbaczek :: PR: #8129
+- tune specific params in the base model by @arendu :: PR: #7745
+- Virtual pipeline parallel support for MegatronGPTSFTModel by @vysarge :: PR: #7964
+- removed deprecated peft model by @arendu :: PR: #8183
+- remove more deprecated files by @arendu :: PR: #8169
+- Pre-generate cu_seqlens argmin and max_seqlen to remove host-to-device sync by @erhoo82 :: PR: #8108
+- Add the interface to use SHARP to FSDP strategy by @erhoo82 :: PR: #8202
+- Multimodal required NLP base model changes by @yaoyu-33 :: PR: #8188
+- [NLP] Improve and unify loading state_dict for community models by @janekl :: PR: #7977
+- Rename Finetuning Scripts by @cuichenx :: PR: #8201
+- Final multimodal PR with our recent developments on MM side by @yaoyu-33 :: PR: #8127
+- Add include_text parameter to SFT dataloaders by @Kipok :: PR: #8198
+- Add random_seed argument to generate by @Kipok :: PR: #8162
+- Added support for neptune logger by @harishankar-gopalan :: PR: #8210
+- Pre-compute max_seqlen and cu_seqlens_argmin in all model-parallel cases by @erhoo82 :: PR: #8222
+- Use PackedSeqParams in accordance with changes in Megatron-LM by @cuichenx :: PR: #8205
+- Fix to peft & virtual pipeline parallel unsupported check by @vysarge :: PR: #8216
+- Fixed the tp overlap switch by @sanandaraj5597 :: PR: #8195
+- add knobs for rope/swiglu fusion by @lhb8125 :: PR: #8184
+- Added sample cpu_offloading switch to YAML by @sanandaraj5597 :: PR: #8148
+- Syncing random seed between ranks in generate by @Kipok :: PR: #8230
+- add first_val_step to mcore scheduler by @JimmyZhang12 :: PR: #8150
+- Correct padding for SFT input data to account for sequence parallel + TE's fp8 op dimension requirements by @vysarge :: PR: #8240
+- Mistral 7b conversion script by @akoumpa :: PR: #8052
+- switch to mcore dataset [with FIM support] by @dimapihtar :: PR: #8149
+- Mixtral to NeMo conversion script. by @akoumpa :: PR: #8155
+- fixes to accommodate mcore changes by @HuiyingLi :: PR: #8261
+- Allow MegatronPretrainingRandomSampler to do multi-epoch training by @trias702 :: PR: #8239
+- Add dist ckpt support for regular optimizers by @mikolajblaz :: PR: #7749
+- add deallocate pipeline output optimization by @JimmyZhang12 :: PR: #8279
+- Fix memory leak caused by context parallelism hanging references by omegaconf by @JimmyZhang12 :: PR: #8299
+- distributed fused adam + rampup bs support by @dimapihtar :: PR: #8302
+- Update PEFT Doc by @cuichenx :: PR: #8262
+- Converter script fixes for mixtral/mistral by @akoumpa :: PR: #8272
+- Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 by @erhoo82 :: PR: #8334
+- Enable megatron core loggers for GPT pretraining by @ashbhandare :: PR: #8354
+- mcore ds fix by @dimapihtar :: PR: #8283
+- release updates by @dimapihtar :: PR: #8378
+- Mcore customization doc by @HuiyingLi :: PR: #8298
+- updated link to pubmed by @nithinraok :: PR: #8402
+- mcore customization doc minor fix by @HuiyingLi :: PR: #8421
+- Fixing mcore bert for TP, PP and SP by @shanmugamr1992 :: PR: #8336
+- Add settings to suppress bf16 compile errors in CI on V100 by @athitten :: PR: #8481
+- MoE parameter passing by @akoumpa :: PR: #8255
+- Add fp8 support for SD/Update notebook paths by @Victor49152 :: PR: #8489
+
+
+
+#### NeMo Tools
+
+- SDE bugfix log by @Jorjeous :: PR: #8430
+
+
+
+#### General Improvements
+
+- Add news section to README by @ericharper :: PR: #7984
+- Fixing conversion script to work for code llama by @shanmugamr1992 :: PR: #7997
+- Fix crash when converting to mcore a model using rotary embeddings by @odelalleau :: PR: #7998
+- Added a procedure for Windows users, README by @Jorjeous :: PR: #7942
+- Update manifest.py to speedup loading tarred datasets by @stevehuang52 :: PR: #7900
+- [Fix] Fixed name of a test by @anteju :: PR: #7986
+- Fix lora merge script by @cuichenx :: PR: #8113
+- Support transcoding audio formats when saving tarred datasets (FLAC, OPUS) by @pzelasko :: PR: #8102
+- README edit to change Apple Silicon install instructions (to fix a break introduced by pytorch 2) by @stephenmcconnachie :: PR: #8122
+- Fixes NVIDIA/apex installation to not erroneously install the pkg by @terrykong :: PR: #8126
+- Graphviz fix by @GNroy :: PR: #7843
+- Update README.rst by @fayejf :: PR: #8154
+- Fix TP>1 issue for conversion script by @cuichenx :: PR: #8144
+- Support torch jit script by @artbataev :: PR: #8027
+- NeMo Multimodal Docs and Tests Initial PR by @yaoyu-33 :: PR: #8028
+- Remove left-over prints in NeMo+Lhotse code by @pzelasko :: PR: #8180
+- Upgrade to DLFW PyTorch 23.12 by @ericharper :: PR: #8163
+- Add Lhotse support for key in NeMo manifests by @pzelasko :: PR: #8197
+- Fix CPU Initialization and TP>1 for LoRA Merge Script by @cuichenx :: PR: #8199
+- Add support in Neural Typecheck to disable semantic checks by @titu1994 :: PR: #8212
+- Pin lhotse=1.19.2 in r1.23.0 by @pzelasko :: PR: #8303
+- Multimodal r1.23.0 bug fix by @yaoyu-33 :: PR: #8315
+- MCore dataset compatibility for tokenizers by @vysarge :: PR: #8390
+- Update NFA video download link by @erastorgueva-nv :: PR: #8406
+- Update MM Dataprep Tutorial by @cuichenx :: PR: #8410
+- Fix dreambooth data sampler issue by @yaoyu-33 :: PR: #8400
+- Fix a bug in CTM line processing function for multi-speaker data simulations by @tango4j :: PR: #8416
+- Akoumparouli/mistral bugfix by @akoumpa :: PR: #8353
+- pin to 0.5.0 by @ericharper :: PR: #8465
+- Update NeMo Multimodal Requirements by @yaoyu-33 :: PR: #8515
+- Fix link in multimodal dataprep tutorial by @cuichenx :: PR: #8517
+
+
\ No newline at end of file