Bump Dockerfile.ci (2024-09-09) #10423

Merged: pablo-garay merged 19 commits into main from bump-ci-container-2024-09-09 on Sep 11, 2024

ko3n1g (Collaborator) commented Sep 9, 2024

🚀 PR to Bump Dockerfile.ci.

📝 Please remember the following to-dos before merge:

  • Verify the presubmit CI

🙏 Please merge this PR only if the CI workflow completed successfully.

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
github-actions bot added the CI label Sep 9, 2024
akoumpa force-pushed the bump-ci-container-2024-09-09 branch from 424fd5a to 269b6fd on September 9, 2024 19:05
akoumpa removed the Run CICD label Sep 9, 2024
github-actions bot removed the CI label Sep 9, 2024
akoumpa force-pushed the bump-ci-container-2024-09-09 branch from c0581a6 to 269b6fd on September 9, 2024 19:36
github-actions bot removed the CI label Sep 9, 2024
fix sed typo.

Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
akoumpa and others added 2 commits September 9, 2024 19:07
…am_gather to ddp config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
akoumpa removed the Run CICD label Sep 10, 2024
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
pablo-garay (Collaborator) commented Sep 11, 2024

pablo-garay merged commit 1163e1e into main on Sep 11, 2024
149 of 156 checks passed
pablo-garay deleted the bump-ci-container-2024-09-09 branch on September 11, 2024 19:07
adityavavre pushed a commit to adityavavre/NeMo that referenced this pull request Sep 15, 2024
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 8307fcd !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* update TE import paths

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Update parallelisms.rst

fix sed typo.

Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* fix for mcore dist opt refactor: move overlap_grad_reduce/overlap_param_gather to ddp config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* remove overlap_grad_reduce overlap_param_gather from autoconfig

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass TransformerConfig because megatronmodule expects it to have fp8 attr

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert change; Use ModelParallelConfig & add fp8

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix, set NVTE_APPLY_QK_LAYER_SCALING=1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: adityavavre <aditya.vavre@gmail.com>
ko3n1g (Collaborator, Author) commented Sep 23, 2024

Hey @akoumpa, some of these changes are definitely missing from the release branch r2.0.0 (specifically, adding the attribute fp8 to ExampleConfig in tests/collections/llm/test_mnist_model_nemo2.py).

Can you help me figure out whether this is the only change we should cherry-pick, or if it's safer to cherry-pick the whole PR?
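
For context, here is a rough sketch of why the missing fp8 attribute matters. The commit list for this PR notes that the Megatron module expects its config to have an fp8 attribute, so a config object without it fails on attribute access. Every name below (`LegacyConfig`, `PatchedConfig`, `uses_fp8`, `hidden_size`) is a hypothetical stand-in for illustration, not NeMo or Megatron code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LegacyConfig:
    """Hypothetical config as it might look without the fp8 attribute."""

    hidden_size: int = 8  # placeholder field, chosen only for illustration


@dataclass
class PatchedConfig(LegacyConfig):
    """Same hypothetical config with the fp8 attribute added (default: disabled)."""

    fp8: Optional[str] = None


def uses_fp8(config) -> bool:
    """Stand-in for a consumer that reads config.fp8 directly.

    Raises AttributeError when the config object has no fp8 attribute.
    """
    return config.fp8 is not None


if __name__ == "__main__":
    print(uses_fp8(PatchedConfig()))  # attribute present, fp8 disabled
    try:
        uses_fp8(LegacyConfig())
    except AttributeError as err:
        print(f"legacy config fails: {err}")
```

Under these assumptions, adding the single attribute with a `None` default is enough to satisfy the attribute lookup without changing behavior. Whether the real release branch needs further changes from the PR is exactly the open question in the comment above.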

akoumpa added a commit that referenced this pull request Sep 23, 2024
akoumpa added a commit that referenced this pull request Sep 23, 2024
akoumpa added a commit that referenced this pull request Sep 23, 2024
akoumpa added a commit that referenced this pull request Sep 23, 2024
akoumpa added a commit that referenced this pull request Sep 23, 2024
akoumpa added a commit that referenced this pull request Sep 23, 2024
akoumpa added a commit that referenced this pull request Sep 23, 2024
ko3n1g added a commit that referenced this pull request Sep 24, 2024
* Bump `Dockerfile.ci` (2024-09-09) (#10423)

* use /tmp/lora_tuning_tp2_sp instead of /home/TestData/nlp/lora_tuning_tp2_sp1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
maanug-nv pushed a commit that referenced this pull request Oct 2, 2024
akoumpa added a commit that referenced this pull request Oct 2, 2024
pablo-garay added a commit that referenced this pull request Oct 3, 2024
* [NeMo-UX] Add token drop callback and optimize mixtral configs (#10361)

* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove run

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rm

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* length fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update pretrain_recipe_performance param dir -> ckpt_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Akoumparouli/nemo ux update param name (#10441)

* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMoLogger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>

* pass ckpt_dir to log_dir for the default_log

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* param rename

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Bump `Dockerfile.ci` (2024-09-09) (#10423)

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 8307fcd !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* update TE import paths

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Update parallelisms.rst

fix sed typo.

Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* fix for mcore dist opt refactor: move overlap_grad_reduce/overlap_param_gather to ddp config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* remove overlap_grad_reduce overlap_param_gather from autoconfig

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass TransformerConfig because megatronmodule expects it to have fp8 attr

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert change; Use ModelParallelConfig & add fp8

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix, set NVTE_APPLY_QK_LAYER_SCALING=1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* remove hf_resume for mixtral-8x3b

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update mistral recipe

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* comment tests for non-merged recipes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* NeMoLogger uses log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* more fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* more fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix param

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix dockerfile build order

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
monica-sekoyan pushed a commit that referenced this pull request Oct 14, 2024
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 8307fcd !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* update TE import paths

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Update parallelisms.rst

fix sed typo.

Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* fix for mcore dist opt refactor: move overlap_grad_reduce/overlap_param_gather to ddp config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* remove overlap_grad_reduce overlap_param_gather from autoconfig

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass TransformerConfig because MegatronModule expects it to have fp8 attr

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert change; Use ModelParallelConfig & add fp8

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix, set NVTE_APPLY_QK_LAYER_SCALING=1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request Oct 25, 2024
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 8307fcd !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* update TE import paths

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Update parallelisms.rst

fix sed typo.

Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* fix for mcore dist opt refactor: move overlap_grad_reduce/overlap_param_gather to ddp config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* remove overlap_grad_reduce overlap_param_gather from autoconfig

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass TransformerConfig because MegatronModule expects it to have fp8 attr

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert change; Use ModelParallelConfig & add fp8

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix, set NVTE_APPLY_QK_LAYER_SCALING=1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 8307fcd !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* update TE import paths

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Update parallelisms.rst

fix sed typo.

Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* fix for mcore dist opt refactor: move overlap_grad_reduce/overlap_param_gather to ddp config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* remove overlap_grad_reduce overlap_param_gather from autoconfig

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass TransformerConfig because MegatronModule expects it to have fp8 attr

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert change; Use ModelParallelConfig & add fp8

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix, set NVTE_APPLY_QK_LAYER_SCALING=1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>