Integrate FlashAttention into Megatron-LM #267

Merged (2 commits) on Jan 11, 2023

@tridao (Contributor) commented Dec 9, 2022

We add an option to use FlashAttention in Megatron-LM.
Cc @ekelsen

@bryancatanzaro

🎉

Thank you @tridao for sending this in!

@mayank31398

Can't wait :)

@jaredcasper (Collaborator)

Thanks for the PR! Sorry for the delay, I went on an extended holiday break. :) Reviewing this now.

@jaredcasper jaredcasper merged commit c92f10b into NVIDIA:main Jan 11, 2023
@bryancatanzaro

🎉

rraminen pushed a commit to rraminen/Megatron-LM that referenced this pull request Dec 12, 2023
* Enable universal ckpting

* Update run scripts

* Address PR feedback

* Remove line

* Fix white lines

* Remove redundant changes

* Apply to gpt_model only

* Code cleanup

* Code cleanup

* Update training.py

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* Update training.py

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* Log loss_scale only valid for fp16

* Add README and bf16 scripts

* Visualization docsts

---------

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
ko3n1g added a commit that referenced this pull request Oct 30, 2024
MoE Refactoring - Switch to mask-based routing for MoE

Closes #267

See merge request ADLR/megatron-lm!1915
4 participants