Need model size dumped at init #123
I think I can take this issue. However, I need to know: what do you do to get the diagnostics dump?
Also, does the dump happen when the workflows start?
Thank you for offering to work on this, @jtboing. We, the BS group, haven't added anything to this functionality yet, so it's entirely up to you how you do it. Please have a look at the various info logged during Meg-DS startup and add it where you feel is right. Probably the best place is where the model is created, since you can then easily query the params. I don't think it really matters where, as long as we can easily grep for something like:
Here is my cheatsheet, if it helps:
Hello. Sorry this hasn't been done sooner; I am trying to get through it now. I am looking for the Meg-DS startup script/process. Can you point me to the script/process that initiates the framework init?
We have already started sorting it out here: #204 (as a side effect of another need).
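The core of this issue is turning per-rank reports into one total. A minimal sketch of that aggregation, assuming each (tensor, pipeline) model-parallel rank already knows its local parameter count: data-parallel replicas hold identical copies of the weights, so the total is the sum over one replica's model-parallel ranks, not over every rank. The function name, the input layout, and the `duplicated_params` correction (for weights stored on more than one rank, e.g. tied embeddings) are hypothetical, not part of Meg-DS.

```python
from typing import Mapping, Tuple


def total_model_params(per_rank_counts: Mapping[Tuple[int, int], int],
                       duplicated_params: int = 0) -> int:
    """Return the parameter count of one full model replica (hypothetical helper).

    per_rank_counts maps (tensor_rank, pipeline_rank) to the number of
    parameters held on that model-parallel rank. Data-parallel replicas are
    deliberately excluded because they hold identical copies of the weights.
    duplicated_params is an assumed correction for weights stored on more
    than one model-parallel rank, which a plain sum would count twice.
    """
    if not per_rank_counts:
        raise ValueError("no per-rank counts given")
    tp_ranks = {tp for tp, _ in per_rank_counts}
    pp_ranks = {pp for _, pp in per_rank_counts}
    # Keys are unique, so this equality holds only for a complete TP x PP grid.
    if len(per_rank_counts) != len(tp_ranks) * len(pp_ranks):
        raise ValueError("counts do not cover the full tensor x pipeline grid")
    total = sum(per_rank_counts.values()) - duplicated_params
    if total < 0:
        raise ValueError("duplicated_params exceeds the summed count")
    return total
```

For example, with tensor parallelism 2 and pipeline parallelism 2, `total_model_params({(0, 0): 100, (1, 0): 100, (0, 1): 150, (1, 1): 150})` returns 500, regardless of how many data-parallel replicas exist.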
We need a diagnostic dump of the model size during the framework init. Currently we get a report per rank, not the total.
Later on, the ZeRO engine does dump the right number, but amongst multiple other numbers and repeated on each rank.
But ideally we just want a print like:
Just on rank 0.
Thanks.
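A sketch of the single, rank-0-only line requested above. It assumes the global rank can be read from the `RANK` environment variable, which launchers such as `torchrun` export (in a live run, `torch.distributed.get_rank()` would be the usual source). The `MODEL SIZE:` prefix and both function names are hypothetical, chosen only so the line is easy to grep.

```python
import os
from typing import Optional


def format_model_size(total_params: int) -> str:
    # One fixed-format line; the "MODEL SIZE:" prefix is a hypothetical grep target.
    return f"MODEL SIZE: {total_params:,} params ({total_params / 1e9:.2f}B)"


def print_model_size_rank0(total_params: int, rank: Optional[int] = None) -> bool:
    """Print the summary only on global rank 0; return True if it was printed."""
    if rank is None:
        # Assumed: the launcher exports RANK; default to 0 for single-process runs.
        rank = int(os.environ.get("RANK", "0"))
    if rank != 0:
        return False
    print(format_model_size(total_params), flush=True)
    return True
```

Called once after the model is built, every rank can invoke it unconditionally and only rank 0 emits output, e.g. `MODEL SIZE: 1,300,000,000 params (1.30B)` for a 1.3B-parameter total.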