Akoumparouli/nemo ux update param name #10441

Merged
akoumpa merged 6 commits into main from akoumparouli/nemo_ux_update_param_name
Sep 16, 2024

Conversation

akoumpa
Member

@akoumpa akoumpa commented Sep 10, 2024

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa self-assigned this Sep 10, 2024
Collaborator

@ashors1 ashors1 left a comment

I am wondering what the reason for this change is. It will be a breaking change and will require us to update the user guide. Also, the use of dir is consistent with NeMo 1.

[edit] I saw there were concerns that the current naming could lead to silent bugs in the future, and this makes sense to me. Better to change it now than to run into bugs moving forward. Thanks for making this change.
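
The "silent bugs" concern is not spelled out in this thread. One plausible reading, offered here as an assumption, is that a parameter named dir shares its name with Python's builtin dir(), so code that forgets to define or pass it still runs instead of failing. The sketch below illustrates that with hypothetical helpers (checkpoint_path_old, checkpoint_path_new) and a stand-in ExampleLogger dataclass; none of these are NeMo APIs, and the real NeMoLogger signature may differ.

```python
from dataclasses import dataclass
from typing import Optional


def checkpoint_path_old(name: str) -> str:
    # Hypothetical bug: `dir` was never defined in this scope, so Python
    # resolves it to the builtin function and no error is raised.
    return f"{dir}/{name}"


def checkpoint_path_new(name: str) -> str:
    # Same mistake with the new name: `log_dir` is not a builtin,
    # so the call fails immediately with NameError.
    return f"{log_dir}/{name}"  # noqa: F821 (intentionally undefined)


@dataclass
class ExampleLogger:
    """Hypothetical stand-in for a logger config; not the real NeMoLogger."""

    name: str = "default"
    log_dir: Optional[str] = None


def try_call(fn, *args, **kwargs):
    """Return the result of fn, or the name of the exception it raised."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        return type(exc).__name__


# The old name produces a nonsense path without any error.
old_result = try_call(checkpoint_path_old, "step_10.ckpt")
# The new name turns the same mistake into a loud failure.
new_result = try_call(checkpoint_path_new, "step_10.ckpt")
# Callers still passing the old keyword break after the rename.
legacy_kwarg = try_call(ExampleLogger, dir="/tmp/logs")
renamed_kwarg = try_call(ExampleLogger, log_dir="/tmp/logs")

print(old_result)
print(new_result)
print(legacy_kwarg)
print(renamed_kwarg)
```

The last two calls also show why the rename is a breaking change: in this sketch, a caller that still passes the old keyword gets a TypeError until it is updated to log_dir.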

@akoumpa akoumpa added Run CICD and removed Run CICD labels Sep 10, 2024
@ashors1 ashors1 added Run CICD and removed Run CICD labels Sep 11, 2024
@akoumpa akoumpa added Run CICD and removed Run CICD labels Sep 11, 2024
@akoumpa akoumpa added Run CICD and removed Run CICD labels Sep 11, 2024
@akoumpa akoumpa merged commit 62deef0 into main Sep 16, 2024
149 of 156 checks passed
@akoumpa akoumpa deleted the akoumparouli/nemo_ux_update_param_name branch September 16, 2024 16:42
gwarmstrong pushed a commit to gwarmstrong/NeMo that referenced this pull request Sep 19, 2024
* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMologger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
maanug-nv pushed a commit that referenced this pull request Oct 2, 2024
* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMologger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
akoumpa added a commit that referenced this pull request Oct 2, 2024
* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMologger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
pablo-garay added a commit that referenced this pull request Oct 3, 2024
* [NeMo-UX] Add token drop callback and optimize mixtral configs (#10361)

* add token drop plugin

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add checks

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add expert parallel configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* amend comment

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add comm overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix rebase errors

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add test configs

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove run

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rm

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* length fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update pretrain_recipe_performance param dir -> ckpt_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Akoumparouli/nemo ux update param name (#10441)

* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMologger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>

* pass ckpt_dir to log_dir for the default_log

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* param rename

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Bump `Dockerfile.ci` (2024-09-09) (#10423)

* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 8307fcd !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* update TE import paths

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Update parallelisms.rst

fix sed typo.

Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* fix for mcore dist opt refactor: move overlap_grad_reduce/overlap_param_gather to ddp config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* remove overlap_grad_reduce overlap_param_gather from autoconfig

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass TransformerConfig because megatronmodule expects it to have fp8 attr

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert change; Use ModelParallelConfig & add fp8

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix, set NVTE_APPLY_QK_LAYER_SCALIN=1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

* remove hf_resume for mixtral-8x3b

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update mistral recipe

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* comment tests for non-merged recipes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* NeMoLogger uses log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* more fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* more fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix param

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix dockerfile build order

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
monica-sekoyan pushed a commit that referenced this pull request Oct 14, 2024
* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMologger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request Oct 25, 2024
* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMologger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request Oct 25, 2024
* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMologger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
* NeMoLogger: update dir to log_dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMologger: update calls

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
4 participants