Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) #339

Merged: 4 commits into axolotl-ai-cloud:main on Aug 6, 2023

Conversation

@ssmi153 (Contributor) commented Aug 5, 2023

This is a fix for issue #338. It repairs the XFormers attention monkeypatch. I've tested it with Llama-2 70B and Llama-2 13B, and both start training without errors.

Notes:

  1. I haven't yet tested this with a complete finetune from start to finish; that will take several days.
  2. We still need to fix the FlashAttention monkeypatch as well. I've been working on it but have run into a number of errors, so I'm pushing this change out in the meantime.
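
For context on the change itself: the GQA handling borrowed from Transformers centers on a repeat_kv helper that expands the key/value heads to match the query head count before attention is computed. Below is a minimal, illustrative sketch of that helper (a reimplementation for explanation, not the exact patched monkeypatch code; the example shapes assume Llama-2 70B's 64 query heads and 8 key/value heads):

```python
import torch


def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Repeat key/value heads n_rep times so a GQA model's K/V tensors
    match the number of query heads.

    (batch, num_key_value_heads, seq_len, head_dim)
        -> (batch, num_key_value_heads * n_rep, seq_len, head_dim)
    """
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    if n_rep == 1:  # plain multi-head attention (7B/13B): nothing to do
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, seq_len, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)


# Illustrative shapes for Llama-2 70B: 64 query heads, 8 key/value heads.
key_states = torch.randn(1, 8, 16, 128)
key_states = repeat_kv(key_states, n_rep=64 // 8)
print(key_states.shape)  # torch.Size([1, 64, 16, 128])
```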

ssmi153 added 2 commits August 5, 2023 11:01
Updated XFormers MonkeyPatch to handle GQA as used in Llama-2 70B. All the updated code is taken directly from the Transformers library: huggingface/transformers@07360b6#diff-06392bad3b9e97be9ade60d4ac46f73b6809388f4d507c2ba1384ab872711c51 from their modeling_llama.py file.
@ssmi153 (Contributor, Author) commented Aug 5, 2023

Maybe hold off on merging this: I just found that while it runs Llama-2 13B without errors, the losses are completely different (starting at 8 rather than 1). I need to look more closely into why, as the patch shouldn't change anything for the 13B model.

ssmi153 added a commit (Whitespace bug fix): a command had accidentally been moved out of the if-else block.
@ssmi153 (Contributor, Author) commented Aug 5, 2023

OK, the most recent commit fixes the difference in losses. It was a whitespace issue (darn Python!). The patch now performs exactly as it did previously for 13B (and 7B) models without GQA, and also works for 70B models with GQA. It's ready for review and merging.
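
To illustrate the class of bug (a purely hypothetical Python example, not the actual patch): dedenting a statement out of an if/else silently changes when it runs, which is exactly the kind of slip that shifts a loss curve without raising an error.

```python
# Intended: the extra adjustment belongs only to the first branch.
def project_intended(x: float, tensor_parallel: bool) -> float:
    if tensor_parallel:
        x = x / 2   # shard-split path
        x = x + 10  # belongs only to this branch
    else:
        x = x * 3   # ordinary path
    return x


# Buggy: "x = x + 10" was dedented out of the if/else, so it now also runs
# on the ordinary path and every result is silently shifted.
def project_buggy(x: float, tensor_parallel: bool) -> float:
    if tensor_parallel:
        x = x / 2
    else:
        x = x * 3
    x = x + 10
    return x


print(project_intended(4.0, False))  # 12.0
print(project_buggy(4.0, False))     # 22.0
```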

@winglian (Collaborator) left a review comment


thank you!

@winglian (Collaborator) commented Aug 5, 2023

@ssmi153 it looks like there are some pre-commit formatting issues. Running pre-commit run --all-files should fix them.

@ssmi153 (Contributor, Author) commented Aug 6, 2023 via email

@winglian (Collaborator) commented Aug 6, 2023

@ssmi153 you could cherry-pick this commit 9793faf

@ssmi153 (Contributor, Author) commented Aug 6, 2023

Champion! I've pulled your changes into the pull request (probably in the most awkward way possible, sorry; it's been a few years since I last committed code, and our last setup used Mercurial, so I'm still getting my head around GitHub's workflows).

@winglian (Collaborator) commented Aug 6, 2023

Thanks @ssmi153. This is very much appreciated and wanted.

@winglian merged commit 10405b9 into axolotl-ai-cloud:main on Aug 6, 2023
mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request on Dec 15, 2023 (axolotl-ai-cloud#339):

* Fix XFormers attention for Llama-2 70B (GQA)

Updated XFormers MonkeyPatch to handle GQA as used in Llama-2 70B. All the updated code is taken directly from the Transformers library: huggingface/transformers@07360b6#diff-06392bad3b9e97be9ade60d4ac46f73b6809388f4d507c2ba1384ab872711c51 from their modeling_llama.py file.

* Catch configs without pretraining_tp

* Whitespace bug fix

Command had accidentally been moved out of the if-else block.

* pre-commit formatting fixes

Thanks to @winglian
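
As an aside on the "Catch configs without pretraining_tp" commit above: older or externally converted Llama configs may not define pretraining_tp, so the guard amounts to reading the attribute with a default of 1. A minimal sketch of the idea (hypothetical helper name, not the actual patch):

```python
from transformers import LlamaConfig


def get_pretraining_tp(config) -> int:
    """Return config.pretraining_tp, falling back to 1 (no tensor-parallel
    pretraining splits) when the config predates the attribute."""
    return getattr(config, "pretraining_tp", 1)


config = LlamaConfig()
print(get_pretraining_tp(config))  # 1, unless the config sets pretraining_tp
```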
djsaunde pushed a commit that referenced this pull request on Dec 17, 2024, with the same commit message as above.