tune down batch-size for res2net to avoid OOM (#122977)
Summary:
The batch size for this model was previously 64. Later we changed it to 256, which causes OOM in the cudagraphs setting. This PR tunes the batch size down to 128.

Sharing more logs from my local run:
```
cuda,res2net101_26w_4s,128,1.603578,110.273572,335.263494,1.042566,11.469964,11.001666,807,2,7,6,0,0
cuda,res2net101_26w_4s,256,1.714980,207.986155,344.013071,1.058278,22.260176,21.034332,807,2,7,6,0,0
```

The log shows that torch.compile uses about 11 GB for batch size 128 and about 21 GB for batch size 256. I suspect the benchmark script has extra overhead that causes the model to OOM at batch size 256 in the dashboard run.
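For reference, the memory numbers above can be pulled straight out of the quoted CSV lines. The exact column layout of the benchmark output is an assumption here (inferred from the commit message: field 2 is batch size, fields 7 and 8 look like eager and torch.compile peak memory in GB); `parse_rows` is a hypothetical helper, not part of the benchmark harness:

```python
# Hypothetical helper: parse the benchmark CSV lines quoted above.
# Assumed column layout (not confirmed by the harness source):
#   0: device, 1: model, 2: batch_size, ...,
#   7: eager peak memory (GB), 8: torch.compile peak memory (GB)
LOG = """\
cuda,res2net101_26w_4s,128,1.603578,110.273572,335.263494,1.042566,11.469964,11.001666,807,2,7,6,0,0
cuda,res2net101_26w_4s,256,1.714980,207.986155,344.013071,1.058278,22.260176,21.034332,807,2,7,6,0,0
"""

def parse_rows(log):
    rows = []
    for line in log.strip().splitlines():
        f = line.split(",")
        rows.append({
            "model": f[1],
            "batch_size": int(f[2]),
            "eager_mem_gb": float(f[7]),      # assumed column meaning
            "compiled_mem_gb": float(f[8]),   # assumed column meaning
        })
    return rows

for r in parse_rows(LOG):
    print(f"{r['model']} bs={r['batch_size']}: "
          f"compiled peak ~{r['compiled_mem_gb']:.1f} GB")
```

Under that assumed layout, the two rows yield roughly 11 GB and 21 GB of compiled peak memory, matching the numbers cited above.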

X-link: pytorch/pytorch#122977
Approved by: https://github.com/Chillee

Reviewed By: atalman

Differential Revision: D55561255

Pulled By: shunting314

fbshipit-source-id: 9863e86776d8ed30397806bda330f53c9815f61e
shunting314 authored and facebook-github-bot committed Apr 1, 2024
1 parent 756ea35 commit 95f31c1
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion userbenchmark/dynamo/dynamobench/timm_models_list.txt
```diff
@@ -39,7 +39,7 @@ pnasnet5large 32
 poolformer_m36 128
 regnety_002 1024
 repvgg_a2 128
-res2net101_26w_4s 256
+res2net101_26w_4s 128
 res2net50_14w_8s 128
 res2next50 128
 resmlp_12_224 128
```
