DDP freezes because of torchsummary.summary #20599
Unanswered
mehran66
asked this question in
DDP / multi-GPU / multi-node
Replies: 0 comments
Sharing some learning with Distributed Data Parallel (DDP): my training code was freezing at some point during the training loop, and it took me a long time to find the cause. Before the training loop, I was calling torchsummary.summary(model, (3, input_size, input_size)) to print the model summary for my log. This function is not DDP-aware and creates a deadlock.
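One workaround is to guard the summary call so it runs on the main process only, and to call it on the unwrapped model before it goes into DistributedDataParallel. Below is a minimal sketch of that pattern; it assumes a torchrun-style launch (which sets the `RANK` environment variable per process), and `is_main_process` / `log_model_summary` are illustrative helper names, not part of any library:

```python
import os

def is_main_process() -> bool:
    # torchrun/torch.distributed.launch set RANK for every spawned process;
    # rank 0 is conventionally the one that handles logging. When RANK is
    # unset (single-process run), treat the process as the main one.
    return int(os.environ.get("RANK", "0")) == 0

def log_model_summary(model, input_size: int) -> None:
    # torchsummary.summary runs a forward pass that is not DDP-aware, so
    # calling it on every rank (or on a DDP-wrapped model) can deadlock.
    # Guard it to rank 0 and pass the plain, unwrapped model.
    if not is_main_process():
        return
    from torchsummary import summary  # imported lazily; other ranks never need it
    summary(model, (3, input_size, input_size))
```

Call `log_model_summary(model, input_size)` before wrapping the model in `DistributedDataParallel`. If the other ranks must not race ahead while rank 0 prints, a `torch.distributed.barrier()` right after the call keeps them in sync.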