Can the learning rate found by DP using more than one GPU be used in DDP? #4878
-
When using pytorch_lightning.tuner.lr_finder.lr_find, DDP raises an error, so I switched to DP with 4 GPUs. Can the learning rate found with DP be used with DDP? Both use the same number of GPUs.
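For reference, here is roughly how such a run might look. This is only a sketch assuming the Lightning 1.x Trainer/tuner API; the toy `LitModel`, its random data, and the `gpus=4, accelerator="dp"` settings are placeholders, not the asker's actual code.

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    """Toy stand-in for the asker's model (illustrative only)."""

    def __init__(self, learning_rate=1e-3):
        super().__init__()
        self.learning_rate = learning_rate
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self(x), y)

    def train_dataloader(self):
        # Random data just so the LR sweep has something to iterate over
        x = torch.randn(256, 32)
        y = torch.randint(0, 2, (256,))
        dataset = torch.utils.data.TensorDataset(x, y)
        return torch.utils.data.DataLoader(dataset, batch_size=32)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.learning_rate)


# Run the LR finder under DP on 4 GPUs, since it errors out under DDP
model = LitModel()
trainer = pl.Trainer(gpus=4, accelerator="dp")
lr_finder = trainer.tuner.lr_find(model)
print("Suggested LR (DP, 4 GPUs):", lr_finder.suggestion())
```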
Replies: 2 comments
-
I think the answer is no.
DP doesn't change your effective batch size while DDP does (in your case, with one node and 4 GPUs, the effective batch size is 4 times bigger with DDP). You can find more info about the effective batch size in the "multi-GPU" section of Lightning's documentation here.
As a consequence, you should probably increase your learning rate. The rule of thumb is to scale it linearly (so by 4), but there is more to it than just that. Have a look at this paper: https://arxiv.org/pdf/1706.02677.pdf
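To make the scaling concrete, here is a rough sketch of the linear scaling rule applied to the DP-found learning rate. The `suggested_lr` value, the 4-GPU single-node setup, and the `LitModel` from the sketch above are all illustrative assumptions, not values from the original post.

```python
import pytorch_lightning as pl

# Placeholder for the value lr_find suggested under DP (see sketch above)
suggested_lr = 1e-3
num_gpus = 4

# Under DP the dataloader batch is split across the 4 GPUs, so the effective batch
# size stays the same. Under DDP each of the 4 processes loads a full batch, so the
# effective batch size is 4x larger; scale the LR linearly to compensate
# (Goyal et al., 2017, https://arxiv.org/pdf/1706.02677.pdf).
ddp_lr = suggested_lr * num_gpus

model = LitModel(learning_rate=ddp_lr)  # LitModel from the sketch above (illustrative)
trainer = pl.Trainer(gpus=num_gpus, accelerator="ddp")
trainer.fit(model)
```

Note that linear scaling is only the starting point; the paper also recommends a warmup period for the scaled learning rate.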
-
OK, thanks a lot.