Need to log TFLOPs #157

Closed
stas00 opened this issue Oct 25, 2021 · 2 comments · Fixed by #210
Assignees
Labels
Good First Issue Good for newcomers

Comments

@stas00
Contributor

stas00 commented Oct 25, 2021

We need to log TFLOPs to TensorBoard (TB) and to the logs.

The code already exists in https://github.com/microsoft/DeepSpeedExamples/ and we just need to backport it to our repo. Here is what's needed:

./Megatron-LM-v1.1.5-ZeRO3/megatron/training.py:from megatron.utils import report_memory, flops_calculator
./Megatron-LM-v1.1.5-ZeRO3/megatron/training.py:        flops_calculator(model, args, elapsed_time)
./Megatron-LM-v1.1.5-ZeRO3/megatron/utils.py:def flops_calculator(model, args, iteration_time):
./Megatron-LM-v1.1.5-ZeRO3/megatron/utils.py:    giga_flops_per_model_per_train_step = approx_parameters_in_billions * args.batch_size * args.seq_length * 2.0 * 4.0
./Megatron-LM-v1.1.5-ZeRO3/megatron/utils.py:    effective_tera_flops_per_gpu = giga_flops_per_model_per_train_step / (iteration_time * 1000.0 * gpus_per_model)
./Megatron-LM-v1.1.5-ZeRO3/megatron/utils.py:    print_rank_0(f"Effective Tera Flops per GPU: {round(effective_tera_flops_per_gpu, 2)} and total parameters {round(approx_parameters_in_billions, 3)} B")
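The grep output above shows only the arithmetic, so here is a minimal standalone sketch of that calculation, not the actual backport. The function name `effective_tflops_per_gpu`, its explicit parameters, and the constant names are hypothetical. In the original, `approx_parameters_in_billions` and `gpus_per_model` are computed inside `flops_calculator` by code the grep output does not show, so this sketch takes them as plain arguments. The reading of the `2.0` and `4.0` factors given in the comments is an assumption, since the source does not explain them.

```python
# Assumed interpretation of the constants in the grepped formula:
# 2.0 ~ FLOPs per multiply-accumulate, 4.0 ~ forward + backward (+ recompute).
# The source only shows the raw numbers, not their meaning.
FLOPS_PER_MAC = 2.0
PASSES_FACTOR = 4.0


def effective_tflops_per_gpu(params_in_billions, batch_size, seq_length,
                             iteration_time_s, gpus_per_model):
    """Estimate effective TFLOPs per GPU for one training step.

    Mirrors the two formulas from the grep output; all inputs are passed
    explicitly instead of being read from `model` and `args`.
    """
    if iteration_time_s <= 0 or gpus_per_model <= 0:
        raise ValueError("iteration_time_s and gpus_per_model must be positive")

    # Parameters are in billions, so this product is in giga-FLOPs per step.
    giga_flops_per_step = (params_in_billions * batch_size * seq_length
                           * FLOPS_PER_MAC * PASSES_FACTOR)

    # Divide by 1000 to go from giga to tera, by the step time to get a rate,
    # and by the number of GPUs that jointly hold one model replica.
    return giga_flops_per_step / (iteration_time_s * 1000.0 * gpus_per_model)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this project.
    tflops = effective_tflops_per_gpu(
        params_in_billions=1.3, batch_size=512, seq_length=2048,
        iteration_time_s=30.0, gpus_per_model=8)
    print(f"Effective Tera Flops per GPU: {round(tflops, 2)}")
```

The returned value could then be written to TensorBoard and printed in the same place the other per-iteration metrics are logged.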
stas00 added the "Good First Issue" label on Oct 25, 2021
jaketae self-assigned this on Nov 23, 2021
@jaketae
Member

jaketae commented Nov 24, 2021

I will be looking at this issue once #204 is sorted!

@stas00
Contributor Author

stas00 commented Nov 27, 2021

@jaketae, I was working on a different set of graphs, and TFLOPs turned out to be just a multiple of one of the graphs I added, so this issue will be closed once #210 is merged.
