Optimizer State Sharding #386
Sanger2000 asked this question in Ideas
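As far as I can tell, there is currently no open-source implementation of optimizer state sharding (ZeRO) in JAX. Adam and AdamW keep two extra values (first and second moment estimates) for every parameter, so the optimizer state is roughly twice the size of the parameters when stored at the same precision. Sharding that state across devices would make it much simpler to train large models with these optimizers.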
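To make the request concrete, here is a minimal, framework-agnostic sketch of what optimizer state sharding in the ZeRO stage 1 sense does. It is not JAX code and not an existing API: it simulates the workers in a single process with plain Python lists, and the names `adam_step`, `shard_bounds` and `sharded_adam_step` are hypothetical helpers made up for this illustration. It uses plain Adam (no weight decay) and assumes every worker holds a full copy of the parameters but only its own slice of the moment estimates.

```python
import math


def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One plain Adam update on flat lists; returns new (params, m, v)."""
    new_p, new_m, new_v = [], [], []
    for pi, gi, mi, vi in zip(p, g, m, v):
        mi = b1 * mi + (1 - b1) * gi          # first moment estimate
        vi = b2 * vi + (1 - b2) * gi * gi     # second moment estimate
        m_hat = mi / (1 - b1 ** t)            # bias correction
        v_hat = vi / (1 - b2 ** t)
        new_p.append(pi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_p, new_m, new_v


def shard_bounds(n, world, rank):
    """Half-open [start, end) slice of a length-n vector owned by `rank`."""
    per = -(-n // world)  # ceiling division
    return min(rank * per, n), min((rank + 1) * per, n)


def sharded_adam_step(params, per_worker_grads, m_shards, v_shards, t):
    """Simulated ZeRO stage 1 step: each rank updates only the slice it owns."""
    world, n = len(per_worker_grads), len(params)
    new_params, new_m_shards, new_v_shards = [], [], []
    for rank in range(world):
        lo, hi = shard_bounds(n, world, rank)
        # Stand-in for reduce-scatter: this rank gets the mean gradient
        # for its own slice only.
        g = [sum(gw[i] for gw in per_worker_grads) / world for i in range(lo, hi)]
        p, m, v = adam_step(params[lo:hi], g, m_shards[rank], v_shards[rank], t)
        # Stand-in for all-gather: updated slices are concatenated so every
        # worker ends up with the full parameter vector again.
        new_params.extend(p)
        new_m_shards.append(m)
        new_v_shards.append(v)
    return new_params, new_m_shards, new_v_shards


# Toy problem: 5 parameters, 2 simulated workers with different local gradients.
params = [0.5, -1.0, 2.0, 0.0, 1.5]
grads = [[0.1, 0.2, -0.3, 0.4, 0.0],
         [0.3, -0.2, 0.1, 0.0, 0.2]]
world, n = len(grads), len(params)
sizes = [hi - lo for lo, hi in (shard_bounds(n, world, r) for r in range(world))]
m_shards = [[0.0] * s for s in sizes]
v_shards = [[0.0] * s for s in sizes]

# Reference: ordinary Adam holding the full state on the mean gradient.
sharded_params = list(params)
ref_params, ref_m, ref_v = list(params), [0.0] * n, [0.0] * n
mean_grads = [sum(g[i] for g in grads) / world for i in range(n)]

for t in range(1, 4):
    sharded_params, m_shards, v_shards = sharded_adam_step(
        sharded_params, grads, m_shards, v_shards, t)
    ref_params, ref_m, ref_v = adam_step(ref_params, mean_grads, ref_m, ref_v, t)

max_diff = max(abs(a - b) for a, b in zip(sharded_params, ref_params))
print("moments held per worker:", sizes)
print("max difference vs. unsharded Adam:", max_diff)
```

The point of the comparison at the end is that the sharded update should give the same parameters as ordinary Adam, while each worker stores only about 1/N of the moment estimates. A real JAX version would replace the two simulated collectives with actual reduce-scatter and all-gather operations across devices.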