Fix parallel.do with batch norm #8186
Conversation
paddle/operators/parallel_do_op.cc
Outdated
@@ -248,6 +248,8 @@ class ParallelDoGradOp : public framework::OperatorBase {
       const std::vector<framework::Scope *> &sub_scopes,
       const platform::PlaceList &places) const {
     for (auto &s : Outputs(framework::GradVarName(kParameters))) {
+      VLOG(10) << "Accumulating " << s;
+      if (s == framework::kEmptyVarName) continue;
Do not accumulate @EMPTY@: a gradient name equal to framework::kEmptyVarName means there is no gradient variable to sum, so the accumulation loop should skip it.
    PADDLE_ENFORCE(ctx->HasOutputs(framework::GradVarName(kParameters)));
    ctx->SetOutputsDim(framework::GradVarName(kParameters),
                       ctx->GetInputsDim(kParameters));
    auto p_dims = ctx->GetInputsDim(kParameters);
If parameter gradient is empty, do not infer shape.
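To make the two comments above concrete, here is a small, self-contained C++ sketch of the skip-empty pattern. It deliberately avoids the real Paddle API: `kEmptyVarName`, `DDim`, and the parameter/gradient names below are stand-ins invented for illustration, not the code that was merged.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Stand-ins for the framework pieces referenced in the diff above
// (not the real Paddle definitions).
const std::string kEmptyVarName = "@EMPTY@";
struct DDim { int h, w; };

int main() {
  // Each parameter's dims, paired with its gradient output name. The second
  // entry plays the role of a variable that receives no gradient, so its
  // gradient slot holds the empty marker.
  std::vector<DDim> p_dims = {{128, 64}, {64, 1}, {64, 64}};
  std::vector<std::string> pg_names = {"w@GRAD", kEmptyVarName, "b@GRAD"};

  for (size_t i = 0; i < pg_names.size(); ++i) {
    // Skip @EMPTY@ entries: there is no gradient variable whose shape could
    // be inferred, and there is nothing to accumulate either.
    if (pg_names[i] == kEmptyVarName) continue;
    std::cout << "infer shape of " << pg_names[i] << " as " << p_dims[i].h
              << "x" << p_dims[i].w << "\n";
  }
  return 0;
}
```

Both reviewer comments describe this same idea: skip @EMPTY@ names when accumulating gradients and when inferring parameter-gradient shapes.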
@@ -274,21 +274,20 @@ def get_parameters(self):
         parent_block = self.parent_block()

+        local_inputs = set()
The previous logic could not find parameters that are both used and updated by the same operator (which is how batch norm handles its moving statistics).
For the purpose of this PR, please ignore the NCCL error. Update: it turns out that driver version 199 is not enough; @helinwang is updating it.
… feature/parallel_do_and_batch_norm
Merged by #8249
Related issue #8153