Can a learning rate found with DP on more than one GPU be used with DDP? #4878

I think the answer is no.

DP doesn't change your effective batch size, while DDP does: in your case, with one node and 4 GPUs, the effective batch size is 4 times larger with DDP. You can find more about the effective batch size in the "multi-GPU" section of Lightning's documentation.
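To make the difference concrete, here is a minimal sketch of the arithmetic, assuming `batch_size` is the value passed to your `DataLoader`. The helper `effective_batch_size` and its `strategy` argument are hypothetical names for illustration, not part of Lightning's API.

```python
def effective_batch_size(batch_size: int, gpus: int, num_nodes: int = 1,
                         strategy: str = "ddp") -> int:
    """Return the number of samples contributing to one optimizer step.

    Hypothetical helper for illustration only (not a Lightning function).
    """
    if strategy == "dp":
        # DP splits a single batch across the GPUs, so the total is unchanged.
        return batch_size
    if strategy == "ddp":
        # DDP runs one process per GPU, each loading its own batch.
        return batch_size * gpus * num_nodes
    raise ValueError(f"unknown strategy: {strategy!r}")


print(effective_batch_size(32, gpus=4, strategy="dp"))   # 32
print(effective_batch_size(32, gpus=4, strategy="ddp"))  # 128
```

So with the same `DataLoader` batch size of 32, the DP run steps on 32 samples while the DDP run steps on 128.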

As a consequence, you should probably increase your learning rate. The rule of thumb is to scale it linearly with the effective batch size (so by 4 in your case), but there is more to it than that. Have a look at this paper: https://arxiv.org/pdf/1706.02677.pdf
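As a rough sketch of that rule of thumb, the learning rate can be multiplied by the ratio of the new effective batch size to the one it was tuned for. `scale_lr_linearly` is a hypothetical helper and the numbers are made up for illustration; treat the result as a starting point to validate, not a guaranteed setting.

```python
def scale_lr_linearly(base_lr: float, base_batch_size: int,
                      new_batch_size: int) -> float:
    """Scale a learning rate in proportion to the effective batch size.

    Hypothetical helper for illustration only.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Assumed example: LR tuned under DP at an effective batch size of 32,
# then moved to DDP on 4 GPUs (effective batch size 4 * 32 = 128).
ddp_lr = scale_lr_linearly(base_lr=0.1, base_batch_size=32, new_batch_size=128)
print(ddp_lr)
```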

Answer selected by Borda
This discussion was converted from issue #4878 on December 23, 2020 20:08.