New Features
- Distributed Inference Support for ESM2 and Geneformer
  - Enables inference throughput to scale linearly as the number of GPUs increases
  - See the ESM2 inference notebook and use the --num-gpus parameter.
Updates & Improvements
- Fixed a prior Geneformer inference accuracy regression on H100.
- Base image updated to nvcr.io/nvidia/pytorch:24.12-py3; Python updated to 3.12, among other core dependency upgrades (see the base container release notes).
Changes
- Distributed Inference Support for ESM2/Geneformer by @farhadrgh in #482
- Flexible memory management to avoid fragmentation-related CUDA OOM by @farhadrgh in #524
- Update nightly Docker image tag by @tshimko-nv in #539
- set UV_NO_CACHE by @pstjohn in #529
- RowFeatureIndex Optimization by @polinabinder1 in #531
- Updates to NvFaidx, Fasta Noodles, and Sequence Accessor by @skothenhill-nv in #532
- Fix csv dataset by @holgerroth in #543
- Run all pytests even if submodules fail by @pstjohn in #545
- xFail known bad tests on H100 and fix CVEs by @gagank1 in #549
- Fully Integrate SCDL into Geneformer by @savitha-eng in #480
- Fix MLM loss ignore idx by @farhadrgh in #552
- Attempts to bump the base image to pytorch:24.07 by @pstjohn in #544
- Pstjohn/update base image 2410 by @pstjohn in #551
- [BUGFIX] fail when passed fastas with duplicate sequence ids by @skothenhill-nv in #555
- Update ddp config to improve ESM-2 15B MFU by @sichu2023 in #520
- add temporary mistune pin to fix docs build issue by @pstjohn in #559
- Bump 3rdparty/Megatron-LM from 99f23d2 to 2da43ef by @dependabot in #558
- Bump 3rdparty/NeMo from 06e6703 to 06a1491 by @dependabot in #538
- update base image to 24.12 by @pstjohn in #553
- Un-xfail geneformer on H100 test by @trvachov in #563
- update devcontainer for new ubuntu base image by @pstjohn in #566
- don't eagerly download esm2 checkpoints by @pstjohn in #567
- run pytest with or without docs and notebooks in run_pytest.sh by @dorotat-nv in #569
- Jwilber/bionemo example small updates by @jwilber in #561
- remove unused file from repo by @jwilber in #562
- add initial configs for perf testing on ESM2 in JET (bionemo2) by @dorotat-nv in #497
- Add pre-training page for ESM-2 by @pstjohn in #578
- Edits to README and CONTRIBUTING.md, moving some text around by @pstjohn in #577
- Refactor dockerfile for better caching and avoid pbss download in notebook test by @pstjohn in #573
- Bump 3rdparty/NeMo from 06a1491 to d44ed44 by @dependabot in #580
- Simplify ESM2 finetune test by @farhadrgh in #576
- default to overlap_param_gather by @sichu2023 in #582
- Bump 3rdparty/Megatron-LM from 2da43ef to 65720c8 by @dependabot in #579
- Add self-hosted azure runner workflows by @pstjohn in #587
- ARM docker build with 24.12 pytorch fw image by @trvachov in #581
- Add gpu target identificator to JET configs by @dorotat-nv in #586
- add codecov badge by @pstjohn in #588
- Add support for marking and skipping slow tests, temporarily mark pydantic tests as slow by @pstjohn in #589
- pin cdifflib version by @pstjohn in #593
- Remove outdated note on very large datasets in MultiEpochDataset by @pstjohn in #521
- Bump 3rdparty/NeMo from eb9848b to abd4bf7 by @dependabot in #597
- Revert "pin cdifflib version (#593)" by @pstjohn in #599
- fix esm2_pretrain.yaml by @dorotat-nv in #600
- add myself to ci by @nvdreidenbach in #594
- Bump virtualenv from 20.26.3 to 20.26.6 by @dependabot in #596
- Bump 3rdparty/Megatron-LM from 65720c8 to c76410a by @dependabot in #592
- move load calls, rename test for better readability by @pstjohn in #601
- Only run cleanup if tests ran, adds pytest marker config for slow tests by @pstjohn in #595
- only run trufflehog on diff by @pstjohn in #604
- run trufflehog on entire main branch on push action by @pstjohn in #605
- add comments to the unit-test.yaml file by @pstjohn in #606
- Remove v2.0 from README title by @pstjohn in #602
- ESM2 Finetuning refactor by @farhadrgh in #574
- fix image links in esm2 model card by @pstjohn in #584
- Release of v1.0 of BioNeMo Modular Co-Design (MoCo) by @nvdreidenbach in #575
- fix devcontainer paths in ubuntu 24 by @pstjohn in #610
- Bump rsync and other dockerfile lints by @pstjohn in #603
- Jm/codeowners revamp by @jomitchellnv in #617
- Update MoCo notebooks by @nvdreidenbach in #614
- set min seq len by default by @pstjohn in #621
- hotfix for some failing python tests due to NGC files being moved around by @pstjohn in #626
- Bump 3rdparty/Megatron-LM from c76410a to 4fb4c3d by @dependabot in #624
- revert ESM2 Finetuning refactor (#574) by @farhadrgh in #628
New Contributors
- @holgerroth made their first contribution in #543
- @nvdreidenbach made their first contribution in #594
Full Changelog: v2.2...v2.3