Add more documentation #63
Comments
We should also add guidance around picking optimal parameters. For example, with GPipe, the right value for the `chunks` parameter depends on the model size.
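As a toy illustration of the trade-off behind `chunks`: more micro-batches shrink the pipeline "bubble" (idle time while stages fill and drain) but raise per-micro-batch overhead. The sketch below uses the standard GPipe bubble estimate `(K-1)/(M+K-1)` for `K` stages and `M` chunks; `bubble_fraction` and `pick_chunks` are hypothetical helpers for intuition, not part of the fairscale API.

```python
def bubble_fraction(stages: int, chunks: int) -> float:
    # GPipe idle-time estimate: with K stages and M micro-batches,
    # the fraction of pipeline time spent filling/draining is (K-1)/(M+K-1).
    return (stages - 1) / (chunks + stages - 1)

def pick_chunks(stages: int, batch_size: int, max_bubble: float = 0.2) -> int:
    # Smallest chunks value whose bubble overhead is at most max_bubble,
    # capped at batch_size (each micro-batch needs at least one sample).
    for chunks in range(1, batch_size + 1):
        if bubble_fraction(stages, chunks) <= max_bubble:
            return chunks
    return batch_size

# With 4 stages and a single chunk, 3/4 of the schedule is bubble;
# pushing the bubble under 20% requires 12 micro-batches.
print(bubble_fraction(4, 1))      # 0.75
print(pick_chunks(4, 64, 0.2))    # 12
```

In practice the usable `chunks` range is also bounded by memory (activations per micro-batch) and by how small a micro-batch the kernels can run efficiently, which is exactly the model-size dependence the comment above points at.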
It would also be nice to add a discussion of when to use specific features, i.e. when to use pipelining and when to use tensor parallelism.
Some frameworks expose per-param-group FLOPs. That's an approximation (i.e. not all FLOPs take the same time to compute), but would that still be a useful heuristic, or should the size (in memory) be taken into account? Asking for my general understanding.
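To make the heuristic question concrete: for a dense layer the two quantities are easy to compute, and they don't always agree on which layer is "heavier". A minimal sketch, assuming fp32 parameters and counting 2 FLOPs per multiply-accumulate; `linear_flops` and `linear_param_bytes` are hypothetical helpers, not an existing framework API.

```python
def linear_flops(in_features: int, out_features: int, batch: int = 1) -> int:
    # Forward-pass FLOPs for a dense layer: one multiply-accumulate
    # per (input, output) pair, counted as 2 FLOPs each.
    return 2 * batch * in_features * out_features

def linear_param_bytes(in_features: int, out_features: int,
                       bytes_per_param: int = 4) -> int:
    # Parameter memory (weight matrix + bias vector), fp32 by default.
    return (in_features * out_features + out_features) * bytes_per_param

# A wide-and-shallow layer and a square layer with identical FLOPs
# can carry identical parameter memory too, but once batch size,
# activation memory, or kernel efficiency enter, the rankings diverge.
print(linear_flops(1024, 1024))        # 2097152
print(linear_param_bytes(1024, 1024))  # 4198400
```

FLOPs is a reasonable proxy for compute-bound balance; memory matters when a stage must also hold activations for the in-flight micro-batches, so a practical balancing guide would probably need to discuss both.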
Closed for now, to be revived if need be.
🚀 Feature
Motivation
Fairscale is hard to grok from the outside; better documentation would lower the barrier to entry.
Pitch
Easier to onboard people, easier to build traction, and a chance to clean things up along the way.
Alternatives
live in a cave
Additional context