
Large memory usage on MATH #80

Closed · lewtun opened this issue Mar 2, 2024 · 3 comments · Fixed by #83
Labels: bug (Something isn't working)

@lewtun (Member) commented Mar 2, 2024

Is the MATH benchmark expected to run for anything beyond batch_size=1?

Running the following command for a small model gives OOM on a single node of H100s, which is a bit surprising to me:

accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
    --tasks="lighteval|math:algebra|5|0" \
    --output_dir "./scratch/evals" \
    --model_args "pretrained=Qwen/Qwen1.5-0.5B" \
    --override_batch_size 2

Strangely enough, bumping up the batch size for Mistral 7B is fine:

accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
    --tasks="lighteval|math:algebra|5|0" \
    --output_dir "./scratch/evals" \
    --model_args "pretrained=mistralai/Mistral-7B-v0.1" \
    --override_batch_size 2

Perhaps there's some sort of unbounded generation occurring which is causing the memory to explode for certain models like Qwen?

@clefourrier (Member) commented:

Hi,
Thanks for the issue!

I can confirm that the generation size is unbounded, which you can see in the task description

{"name":"math:algebra","suite":["lighteval","math"],"prompt_function":"math","hf_repo":"lighteval\/MATH","hf_subset":"algebra","hf_avail_splits":["train","test","validation"],"evaluation_splits":["test"],"few_shots_split":null,"few_shots_select":null,"generation_size":null,"metric":["quasi_exact_match_math"],"stop_sequence":["\n"],"output_regex":null,"frozen":false}

When generation_size is null, there is no bound except the model's max context length (which should be around 8K for both of these models, though).

I'll check whether the paper defines a maximum expected generation size; otherwise I'll fix the bound to the maximum answer size + 10%, maybe?
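
(For reference, a minimal sketch, not part of lighteval or this thread, of how to check the context length that bounds generation when generation_size is null. It assumes the standard transformers AutoConfig API and that both architectures expose max_position_embeddings.)

from transformers import AutoConfig

# Print the context length that bounds generation when generation_size is null.
for model_id in ("Qwen/Qwen1.5-0.5B", "mistralai/Mistral-7B-v0.1"):
    config = AutoConfig.from_pretrained(model_id)
    print(model_id, getattr(config, "max_position_embeddings", None))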

@lewtun (Member, Author) commented Mar 2, 2024

> I'll check whether the paper defines a maximum expected generation size; otherwise I'll fix the bound to the maximum answer size + 10%, maybe?

Yes, alternatively we could set the max gen size to something like 1024 or 2048 tokens, since if a model cannot answer in that span it is likely incorrect. You can see here that the authors chose 1024 tokens for models other than gpt2-xl, so 2048 seems like a safe bet.

@clefourrier (Member) commented:

Sounds perfect, will use this rn!

clefourrier added a commit that referenced this issue Mar 4, 2024
clefourrier self-assigned this Mar 4, 2024
NathanHB added the bug label Mar 4, 2024
clefourrier added a commit that referenced this issue Mar 4, 2024:
    Caps it at 2048 even for models with a much longer context size. Should fix #80
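
(A minimal sketch of the capping logic described in the commit above, not the actual patch in #83; the function name and constant below are hypothetical.)

# Cap agreed on in this issue: bound tasks with no generation_size at 2048 new tokens.
DEFAULT_MAX_GENERATION_SIZE = 2048

def resolve_max_new_tokens(task_generation_size, model_max_length):
    """Pick the generation bound for a request."""
    if task_generation_size is not None:
        # A task-level generation_size takes precedence when set.
        return task_generation_size
    # Otherwise cap at 2048, even for models with a much longer context size.
    return min(model_max_length, DEFAULT_MAX_GENERATION_SIZE)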