Plot scaling laws of our baseline models #125

Open
slippylolo opened this issue Oct 5, 2021 · 2 comments

Labels: arch&scale (Architecture and Scaling Modeling Group), Good First Issue (Good for newcomers)

Comments

@slippylolo

For our three baselines trained on different datasets (OSCAR, C4, The Pile), we would like to plot scaling laws and fit their coefficients. Specifically, we are looking to reproduce Figure 1 of "Scaling Laws for Neural Language Models" (Kaplan et al., 2020).
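As a possible starting point for the fits, here is a minimal sketch assuming the parameter-count form used in that paper, L(N) = (N_c / N)^alpha_N, with one (parameter count, validation loss) pair per model. It fits alpha_N and N_c by ordinary least squares on log L against log N, which is a simplification (no irreducible-loss term, no weighting of points). `fit_power_law` is a hypothetical helper, not part of any existing BigScience tooling, and the numbers in the demo are synthetic.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Fit L(N) = (N_c / N) ** alpha_N by least squares in log-log space.

    Taking logs gives log L = alpha_N * log N_c - alpha_N * log N, i.e. a
    straight line with slope -alpha_N and intercept alpha_N * log N_c.
    Returns (alpha_N, N_c).
    """
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two (size, loss) pairs of equal length")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Ordinary least-squares slope and intercept of log L against log N.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("model sizes must not all be identical")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    alpha = -slope
    if alpha == 0:
        raise ValueError("loss does not vary with size; N_c is undefined")
    n_c = math.exp(intercept / alpha)
    return alpha, n_c


if __name__ == "__main__":
    # Hypothetical, synthetic numbers for illustration only (not real run results).
    sizes = [1e8, 3.5e8, 1.3e9]
    losses = [(1e14 / n) ** 0.08 for n in sizes]
    alpha, n_c = fit_power_law(sizes, losses)
    print(f"alpha_N = {alpha:.3f}, N_c = {n_c:.2e}")
```

The same function could be reused for the other panels of Figure 1 by passing compute or dataset size in place of N; a nonlinear fit (for example with an added constant term) may be preferable if the log-log points are visibly curved.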

The TensorBoard data for the baseline runs can be retrieved from the BigScience space on Hugging Face: these are the tr3 runs with "tensorboard" in their name. The naming scheme (tr3b, tr3c, etc.) is explained here.
For C4, we have an XL, L, and M model (tr3, tr3c, tr3c) with a short warm-up. For OSCAR and The Pile, we have an XL, L, M, and S model each (tr3d, tr3g, tr3h, tr3i and tr3, tr3j, tr3k, tr3l). For OSCAR, we should also add the 13B run (tr1-13B) to see whether the fits hold.

slippylolo added the Good First Issue (Good for newcomers) and arch&scale (Architecture and Scaling Modeling Group) labels Oct 5, 2021
@srulikbd

srulikbd commented Oct 6, 2021

So, just to make sure: is the loss taken from "lm-loss-validation/lm loss validation"? And is it the value at the last step, or the global minimum?
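Both candidate values can be computed from the same scalar series and compared before deciding. A minimal sketch, assuming the validation curve has already been exported as a list of (step, loss) pairs (for example from the "lm-loss-validation/lm loss validation" tag); `summarize_loss` is a hypothetical helper and does not read TensorBoard event files itself.

```python
from typing import Dict, Sequence, Tuple


def summarize_loss(series: Sequence[Tuple[int, float]]) -> Dict[str, float]:
    """Return the last-step and minimum validation loss from (step, loss) pairs."""
    if not series:
        raise ValueError("empty loss series")
    # Sort by step so "last" means the highest step, not the last item read.
    ordered = sorted(series, key=lambda point: point[0])
    last_step, last_loss = ordered[-1]
    # Global minimum over the whole run, with the step where it occurred.
    min_step, min_loss = min(ordered, key=lambda point: point[1])
    return {
        "last_step": last_step,
        "last_loss": last_loss,
        "min_step": min_step,
        "min_loss": min_loss,
    }
```

Sorting by step guards against event data arriving out of order; which of the two values goes into the fit is still the open question above.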

@thomasw21
Member

I've temporarily assigned @slippylolo; feel free to reassign.
