Optim: APOLLO optimizer integration #36062
Conversation
eac6a62 to 26597ae
Hello @muellerzr, the automatic code checks have some failures above, but they seem to come from other code rather than ours.
Thanks for the PR! Just had a look, and the integration follows the GaLore implementation quite closely. Can you simplify the code so that it is not redundant?
Merge remote-tracking branch 'upstream/main' into apollo-integration
I have addressed your comments and updated the commit! Yes, I closely followed the GaLore implementation when adding APOLLO support, to ensure our code quality meets Hugging Face's standards (fixed the review comments and removed the redundancy). Thank you again for your time and help! Best wishes,
Thanks for iterating! Left a couple of comments. This is much better.
Hi @SunMarc, thanks so much for your feedback! I submitted two commits, with the first one adding the typing in the . For the second commit, I agree that our current description of APOLLO in could be improved. Thank you again for your time and help! Looking forward to your feedback! Best wishes,
Thanks a lot for iterating @zhuhanqing! LGTM!
Letting @ArthurZucker have a quick look if possible. Otherwise, I'll merge in 2-3 days!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @SunMarc, thank you so much for your help! By the way, APOLLO has just been accepted to MLSys 2025, announced today. I am eager to see APOLLO contribute more to the open-source community, especially in democratizing LLM training through the integration with the HF Trainer!
Congrats on your acceptance!
Nice!!! 🚀
What does this PR do?
This PR integrates APOLLO (Approximated Gradient Scaling for Memory-Efficient LLM Optimization) into Hugging Face's Transformers.
APOLLO is a memory-efficient optimizer designed for LLM pre-training and full-parameter fine-tuning, offering SGD-like memory cost with AdamW-level performance.
📜 Paper: https://arxiv.org/abs/2412.05270
💻 Code: https://github.com/zhuhanqing/APOLLO
Why APOLLO?
APOLLO introduces a new level of memory efficiency for LLM optimization:
✅ Ultra-Low Memory Usage → Achieves significant savings, even beyond GaLore, approaching SGD-level efficiency.
✅ Adam(W)-Level Performance → Maintains or surpasses Adam(W) performance, validated on LLaMA models up to 7B scale.
✅ No Expensive SVD Computation → Unlike GaLore, APOLLO leverages lightweight random projection, avoiding training stalls in large-scale LLM fine-tuning.
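To make the random-projection idea concrete, here is a hedged, illustrative sketch (not the official APOLLO code; the function name `apollo_like_scaling` and all defaults are hypothetical) of how per-channel gradient scaling factors can be estimated from AdamW-style moments kept in a random low-rank space, so that the optimizer states are only rank-by-columns instead of full size:

```python
import numpy as np

def apollo_like_scaling(grad, rank=4, beta1=0.9, beta2=0.999, eps=1e-8,
                        state=None, rng=None):
    """Illustrative sketch of APOLLO-style channel-wise gradient scaling.

    Adam-style moments are kept in a rank-`rank` space obtained by a
    fixed random projection (no SVD, unlike GaLore); the resulting
    per-column rescaling is then applied to the full-rank gradient.
    """
    rng = rng or np.random.default_rng(0)
    n, m = grad.shape
    if state is None:
        # Cheap random projection replaces GaLore's SVD-based projection.
        P = rng.standard_normal((rank, n)) / np.sqrt(rank)
        state = {"P": P, "m": np.zeros((rank, m)),
                 "v": np.zeros((rank, m)), "t": 0}
    g_low = state["P"] @ grad                    # project gradient to rank-r space
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_low
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_low ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected moments
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    update_low = m_hat / (np.sqrt(v_hat) + eps)      # AdamW-style step, low-rank
    # Per-channel scaling factor: how much Adam rescales each column,
    # measured in the low-rank space, applied to the full-rank gradient.
    scale = (np.linalg.norm(update_low, axis=0)
             / (np.linalg.norm(g_low, axis=0) + eps))
    return grad * scale, state
```

The memory saving comes from `m` and `v` being `(rank, m)` rather than `(n, m)`; the actual APOLLO optimizer adds further details (e.g. its rank-1 APOLLO-Mini variant) beyond this sketch.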
Third-Party Validation of APOLLO
APOLLO has been merged into LLaMA-Factory and FluxML, with its performance validated in the post.
With these validations, merging APOLLO into Transformers would offer Hugging Face users an efficient, memory-friendly optimizer for training LLMs—reducing GPU memory requirements and making large-scale model training more accessible! 🚀
Test of the integration
Following the approach in PR #29588 for GaLore, I have successfully integrated APOLLO into Hugging Face Transformers.
✅ Ensuring Correctness
To verify the integration, I have:
1️⃣ Added multiple unit tests in tests/trainer/test_trainer.py.
2️⃣ Manually tested the API using the following script:
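(The script itself did not survive the page scrape. For reference, a minimal sketch of what such a manual test could look like, assuming this PR exposes `optim="apollo_adamw"` analogously to GaLore's `"galore_adamw"` and that the `apollo-torch` package is installed; names and arguments may differ from the final merged API:)

```python
# Hypothetical manual-test sketch, NOT the author's actual script.
# Assumes: pip install transformers datasets apollo-torch, and that the
# integration registers optim="apollo_adamw" with GaLore-style
# optim_target_modules regex selection.
import datasets
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

train_dataset = datasets.load_dataset("imdb", split="train[:64]")
train_dataset = train_dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="./test-apollo",
    max_steps=20,
    per_device_train_batch_size=2,
    optim="apollo_adamw",                            # assumed optimizer name
    optim_target_modules=[r".*attn.*", r".*mlp.*"],  # modules APOLLO applies to
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```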
Who can review?
@muellerzr and @SunMarc – Would love your feedback on this integration! 😊
Let me know if any modifications are needed. Thanks! 🚀