Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NeMo-UX] Add token drop callback and optimize mixtral configs #10361

Merged
merged 21 commits into from
Sep 16, 2024

Conversation

JimmyZhang12
Copy link
Contributor

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

akoumpa
akoumpa previously approved these changes Sep 5, 2024
Copy link
Member

@akoumpa akoumpa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@JimmyZhang12 JimmyZhang12 force-pushed the jiemingz/moe_token_drop branch from a381fc9 to f6818e0 Compare September 5, 2024 20:35
@JimmyZhang12 JimmyZhang12 changed the title Add token drop plugin and optimize mixtral configs Draft: Add token drop plugin and optimize mixtral configs Sep 5, 2024
@JimmyZhang12 JimmyZhang12 self-assigned this Sep 5, 2024
@JimmyZhang12 JimmyZhang12 changed the title Draft: Add token drop plugin and optimize mixtral configs Draft: [NeMo-UX] Add token drop plugin and optimize mixtral configs Sep 5, 2024
@gdengk
Copy link
Contributor

gdengk commented Sep 5, 2024

LGTM. Thanks!

@akoumpa akoumpa changed the title Draft: [NeMo-UX] Add token drop plugin and optimize mixtral configs Draft: [NeMo-UX] Add token drop callback and optimize mixtral configs Sep 5, 2024
@JimmyZhang12 JimmyZhang12 force-pushed the jiemingz/moe_token_drop branch from 480281f to 28a50ff Compare September 6, 2024 06:51
@github-actions github-actions bot added core Changes to NeMo Core NLP CI Multi Modal labels Sep 6, 2024
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@JimmyZhang12 JimmyZhang12 force-pushed the jiemingz/moe_token_drop branch 2 times, most recently from f0168cf to 889c91a Compare September 6, 2024 07:07
@github-actions github-actions bot removed core Changes to NeMo Core NLP CI Multi Modal labels Sep 6, 2024
@akoumpa akoumpa added Run CICD and removed Run CICD labels Sep 6, 2024
akoumpa
akoumpa previously approved these changes Sep 6, 2024
@akoumpa akoumpa added Run CICD and removed Run CICD labels Sep 6, 2024
@JimmyZhang12 JimmyZhang12 changed the title Draft: [NeMo-UX] Add token drop callback and optimize mixtral configs [NeMo-UX] Add token drop callback and optimize mixtral configs Sep 6, 2024
@akoumpa akoumpa added Run CICD and removed Run CICD labels Sep 11, 2024
@akoumpa akoumpa merged commit cc494c9 into main Sep 16, 2024
150 of 156 checks passed
@akoumpa akoumpa deleted the jiemingz/moe_token_drop branch September 16, 2024 16:47
akoumpa added a commit that referenced this pull request Sep 18, 2024
* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
akoumpa added a commit that referenced this pull request Sep 18, 2024
* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
akoumpa added a commit that referenced this pull request Sep 18, 2024
* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
akoumpa added a commit that referenced this pull request Sep 19, 2024
* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
gwarmstrong pushed a commit to gwarmstrong/NeMo that referenced this pull request Sep 19, 2024
…A#10361)

* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
akoumpa added a commit that referenced this pull request Sep 23, 2024
* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
akoumpa added a commit that referenced this pull request Sep 24, 2024
* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
maanug-nv pushed a commit that referenced this pull request Oct 2, 2024
* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
akoumpa added a commit that referenced this pull request Oct 2, 2024
* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
pablo-garay added a commit that referenced this pull request Oct 3, 2024
* [NeMo-UX] Add token drop callback and optimize mixtral configs (#10361)

* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove run

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rm

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* length fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update pretrain_recipe_performance param dir -> ckpt_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Akoumparouli/nemo ux update param name (#10441)

* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMologger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>

* pass ckpt_dir to log_dir for the default_log

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* param rename

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Bump `Dockerfile.ci` (2024-09-09) (#10423)

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 8307fcd !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* update TE import paths

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Update parallelisms.rst

fix sed typo.

Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* fix for mcore dist opt refactor: move overlap_grad_reduce/overlap_param_gather to ddp config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* remove overlap_grad_reduce overlap_param_gather from autoconfig

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass TransformerConfig because megatronmodule expects it to have fp8 attr

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert change; Use ModelParallelConfig & add fp8

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix, set NVTE_APPLY_QK_LAYER_SCALIN=1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* remove hf_resume for mixtral-8x3b

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update mistral recipe

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* comment tests for non-merged recipes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* NeMoLogger uses log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* more fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* more fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix param

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix dockerfile build order

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
monica-sekoyan pushed a commit that referenced this pull request Oct 14, 2024
* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request Oct 25, 2024
…A#10361)

* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request Oct 25, 2024
…A#10361)

* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
…A#10361)

* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants