[Eager] Optimize Grad by prune useless branch #47827

Merged 48 commits on Dec 2, 2022

Changes from all commits

Commits (48)
79dba98
[Eager] Fix paddle.grad interface
veyron95 May 24, 2022
0bd885a
[Eager] Support minimum SubGraph for GeneralGrad
veyron95 May 24, 2022
8964f2c
Add needed_nodes to prune grad graph more thoroughly
veyron95 May 24, 2022
b75d5a0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
veyron95 May 29, 2022
b2ea9e3
[Eager] Add grad_node_trans_mapping_ to record which grad_node has be…
veyron95 May 29, 2022
ba7257a
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
veyron95 Jun 10, 2022
f7fe8cb
[Eager] Fix paddle.grad interface
veyron95 Jun 10, 2022
38e9f15
Merge commit 'refs/pull/42964/head' of https://github.com/PaddlePaddl…
veyron95 Jun 10, 2022
26a3b72
Polish code
veyron95 Jun 11, 2022
17b46fb
remove potential_stop_node
veyron95 Jun 11, 2022
c60a78b
Add endding_nodes to enhance genSugraph logic
veyron95 Jun 13, 2022
2fd219f
clear endding_nodes_
veyron95 Jun 13, 2022
f56b4a2
polish code
veyron95 Jun 13, 2022
0d802b2
rename endding_nodes to endding_nades_
veyron95 Jun 13, 2022
06b9614
Refactor grad interface
veyron95 Jun 15, 2022
f5beac0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
veyron95 Jun 15, 2022
bd9e9f9
Add register_hook case to fix coverage-ci
veyron95 Jun 16, 2022
1370fe3
Merge branch 'PaddlePaddle:develop' into gen_subgraph_for_grad
veyron95 Jun 21, 2022
6e7e5da
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
veyron95 Jun 22, 2022
08221c1
Fix code format
veyron95 Jun 22, 2022
34f3ae1
Refactor general_grad
veyron95 Jun 23, 2022
a1c0f92
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
veyron95 Jun 23, 2022
dc5cdea
Merge branch 'gen_subgraph_for_grad' of github.com:veyron95/Paddle in…
veyron95 Jun 23, 2022
bbcbfe0
Add more code comments
veyron95 Jun 24, 2022
90ea5ba
call clear directly to release GradSlotMeta
veyron95 Jun 25, 2022
3eb2e92
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
veyron95 Jul 7, 2022
e6060db
fix a mistake
veyron95 Jul 8, 2022
37436f7
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
veyron95 Jul 11, 2022
e798216
fix matmul/ multiply kernel logic and optional input in yaml, fill ze…
veyron95 Jul 13, 2022
a367ebc
fix batch_norm_double_grad yaml optional config
veyron95 Jul 16, 2022
dc747f8
fix tanh_triple_grad yaml and kernels
veyron95 Jul 18, 2022
d7bea87
fix MultiplyTripleGradKernel optional logic
veyron95 Jul 18, 2022
c7a2150
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
veyron95 Aug 26, 2022
3b81a37
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
veyron95 Aug 28, 2022
824ece9
fix merge mistake
veyron95 Aug 28, 2022
fcfa195
merge develop
JiabinYang Nov 9, 2022
b7def24
fix compile error
JiabinYang Nov 10, 2022
7ff7af5
remove legacy attr for bn
JiabinYang Nov 10, 2022
a0f4cb2
polish code
JiabinYang Nov 24, 2022
a8e4e80
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
JiabinYang Nov 24, 2022
56eb62c
fix some kernel
JiabinYang Nov 25, 2022
fc208dc
merge develop
JiabinYang Nov 28, 2022
8d582c8
fix error
JiabinYang Nov 28, 2022
801a930
remote log
JiabinYang Nov 28, 2022
0477634
fix kernel with full like
JiabinYang Nov 29, 2022
e3ba0f5
hide value log behind
JiabinYang Nov 29, 2022
6b5b669
hide value log behind
JiabinYang Nov 29, 2022
2bf04be
fix matmul_triple grad
JiabinYang Nov 30, 2022
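The commits above center on building a minimal backward sub-graph for paddle.grad and pruning branches that cannot reach the requested inputs. A small hedged sketch of the kind of call that benefits (the tensors and values are illustrative, not taken from the PR):

```python
import paddle

x = paddle.to_tensor([1.0], stop_gradient=False)
y = paddle.to_tensor([2.0], stop_gradient=False)

# Two branches feed `out`, but only x's gradient is requested below, so the
# sin(y) branch of the grad graph is useless for this call and can be pruned
# rather than executed.
out = paddle.tanh(x) + paddle.sin(y)

(dx,) = paddle.grad(outputs=[out], inputs=[x])
print(dx)  # 1 - tanh(1.0) ** 2
```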
Files changed
@@ -34,6 +34,7 @@
"multiply_triple_grad",
"conv2d_grad_grad",
"batch_norm_double_grad",
"tanh_grad",
"tanh_double_grad",
"tanh_triple_grad",
"sin_double_grad",
@@ -224,7 +224,7 @@ class {} : public egr::GradNodeBase {{

AFTER_LOG_PRINT_TEMPLATE = """
if(VLOG_IS_ON(4)){{
const char* INPUT_PRINT_TEMPLATE = \"{{ Input: [%s], Output: [%s] }} \";
const char* INPUT_PRINT_TEMPLATE = \"{{ Input: [%s], \\n Output: [%s] }} \";
{}
VLOG(4) << paddle::string::Sprintf(INPUT_PRINT_TEMPLATE, input_str, output_str);
}}
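The only change in the hunk above is a line break inserted between the input and output lists of the generated VLOG(4) message. Roughly, in Python terms (this mimics the format string only, not the real generated C++):

```python
# The generated log keeps the same data; only its layout changes.
new_template = "{ Input: [%s], \n Output: [%s] } "
print(new_template % ("x: Tensor(...)", "out: Tensor(...)"))
# { Input: [x: Tensor(...)],
#  Output: [out: Tensor(...)] }
```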
28 changes: 9 additions & 19 deletions paddle/fluid/eager/backward.cc
@@ -173,9 +173,10 @@ std::vector<paddle::experimental::Tensor> RunBackward(
node_input_buffers_dict[grad_node] =
std::make_unique<GradTensorHolder>(grad_node->InputMeta());
}
bool copy_from_grad_t =
grad_tensors.size() > 0 && grad_tensors[i].initialized();
if (copy_from_grad_t) {

// copy grad tensor since we should totally run grad without affect forward
// value
if (grad_tensors.size() > 0 && grad_tensors[i].initialized()) {
PADDLE_ENFORCE(
grad_tensors.size() == tensors.size(),
paddle::platform::errors::Fatal(
@@ -357,22 +358,11 @@ std::vector<paddle::experimental::Tensor> RunBackward(
"Node's in-degree cannot be negative.",
next_node->name()));

if (is_general_grad) {
if (node_in_degree_map[next_node] == 0 &&
GeneralGrad::Instance().IsNeededNodes(next_node)) {
if (dynamic_cast<egr::GradNodeAccumulation*>(next_node)) {
queue.push_front(std::move(next_node));
} else {
queue.push_back(std::move(next_node));
}
}
} else {
if (node_in_degree_map[next_node] == 0) {
if (dynamic_cast<egr::GradNodeAccumulation*>(next_node)) {
queue.push_front(std::move(next_node));
} else {
queue.push_back(std::move(next_node));
}
if (node_in_degree_map[next_node] == 0) {
if (dynamic_cast<egr::GradNodeAccumulation*>(next_node)) {
queue.push_front(std::move(next_node));
} else {
queue.push_back(std::move(next_node));
}
}
}
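With the is_general_grad special case removed, the traversal above is a single Kahn-style pass over grad nodes: a node is enqueued once its in-degree drops to zero, and leaf accumulation nodes jump the queue. A simplified Python sketch of that pattern (the bookkeeping containers are illustrative, not the real C++ types):

```python
from collections import deque

def run_backward(startup_nodes, next_nodes, in_degree, is_accumulation):
    """in_degree: node -> number of not-yet-executed producers
    next_nodes: node -> downstream grad nodes
    is_accumulation: set of leaf accumulation nodes run with priority."""
    queue = deque(startup_nodes)
    while queue:
        node = queue.popleft()
        # ... run node's grad kernel and fill its input buffers here ...
        for nxt in next_nodes.get(node, []):
            in_degree[nxt] -= 1
            assert in_degree[nxt] >= 0, "Node's in-degree cannot be negative."
            if in_degree[nxt] == 0:
                if nxt in is_accumulation:
                    queue.appendleft(nxt)  # leaf grads accumulate first
                else:
                    queue.append(nxt)
```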
49 changes: 33 additions & 16 deletions paddle/fluid/eager/general_grad.h
@@ -51,6 +51,10 @@ class GeneralGrad {
for (size_t i = 0; i < num_inputs; i++) {
AutogradMeta* auto_grad_meta =
EagerUtils::unsafe_autograd_meta(inputs[i]);
PADDLE_ENFORCE_NOT_NULL(
auto_grad_meta,
paddle::platform::errors::Fatal(
"We got %s:[%d] 's autograd meta is NULL.", msg, i));
auto* target_node = auto_grad_meta->GetMutableGradNode().get();

if (orig_to_copied_node_map_.count(target_node)) {
@@ -82,10 +86,13 @@
// input_target_nodes
void PurifyPotentialStartUpNodes() {
VLOG(6) << "Running in PurifyPotentialStartUpNodes";
if (input_target_nodes_inputmeta_map_.empty()) return;
if (input_target_nodes_inputmeta_map_.empty()) {
VLOG(6) << "No input target nodes found, skip.";
return;
}
std::unordered_set<GradNodeBase*> potential_startup_nodes_to_be_erased;
for (auto startup_op : potential_startup_nodes_) {
auto iter = input_target_nodes_inputmeta_map_.find(startup_op);
for (auto startup_node : potential_startup_nodes_) {
auto iter = input_target_nodes_inputmeta_map_.find(startup_node);
if (iter != input_target_nodes_inputmeta_map_.end()) {
potential_startup_nodes_to_be_erased.emplace(iter->first);
}
@@ -157,11 +164,11 @@
potential_startup_nodes_.erase(node);
}
}
}
} // TODO(jiabin): May we need some check here.
}

// Get Graph Info Betweent input target GradNode and outputs,
// record depending_nodes_, potential_startup_nodes_
// record depending_nodes_
void GetGraphInfoBetweenTargets(const std::deque<GradNodeBase*>& init_queue) {
VLOG(6) << "Runing In GetGraphInfoBetweenTargets";

@@ -227,7 +234,7 @@
std::make_shared<paddle::experimental::Tensor>(target_result);
}
}
}
} // TODO(jiabin): Some check here.
}

void SetResultForEnddingNodes(
@@ -319,21 +326,22 @@
void SetNodeToAccumulationNode(GradNodeBase* node) {
if (dynamic_cast<egr::GradNodeAccumulation*>(node)) return;
if (!(depending_nodes_)[node].empty()) {
// Find precedding_nodes of current node.
auto precedding_nodes = (depending_nodes_)[node];
for (auto pre_nodes : precedding_nodes) {
paddle::small_vector<std::vector<GradSlotMeta>, kSlotSmallVectorSize>&
pre_nodes_edges = pre_nodes->MutableOutputMeta();
for (size_t i = 0; i < pre_nodes_edges.size(); i++) {
for (size_t j = 0; j < pre_nodes_edges[i].size(); j++) {
auto edge_ = pre_nodes_edges[i][j].GetEdge();
const auto& edge_ = pre_nodes_edges[i][j].GetEdge();
if (edge_.GetGradNode() == node) {
auto autograd_meta = egr::AutogradMeta(edge_);
Edge& pre_node_edge = pre_nodes_edges[i][j].GetMutableEdge();

if (copied_node_to_endding_node_map_.count(node)) {
pre_node_edge.SetGradNode(
copied_node_to_endding_node_map_[node]);
} else {
auto autograd_meta = egr::AutogradMeta(edge_);
std::shared_ptr<GradNodeBase> shared_grad_node_accumulation =
std::make_shared<egr::GradNodeAccumulation>(&autograd_meta);
pre_node_edge.SetGradNode(shared_grad_node_accumulation);
Expand Down Expand Up @@ -361,7 +369,7 @@ class GeneralGrad {
grad_node->SetGradientHookFuntions(
node->GetGradientHookFuntions());
}
}
} // or this node has no need to change
}
}
}
@@ -381,11 +389,9 @@
}
visited.insert(node);

if (IsInputTargetNodes(node)) {
if (IsEnddingNodes(node)) {
SetNodeToAccumulationNode(node);
continue;
}
if (IsInputTargetNodes(node) && IsEnddingNodes(node)) {
SetNodeToAccumulationNode(node);
continue;
}

paddle::small_vector<std::vector<GradSlotMeta>, kSlotSmallVectorSize>&
@@ -411,7 +417,17 @@
continue;
}

// TODO(weilong): support prune logic deeper
if (meta.size() != 1 && IsNeededNodes(node) &&
!IsNeededNodes(next_node.get()) && !IsEnddingNodes(node)) {
VLOG(3) << "Get stop edge from grad_node: " << node->name() << " : "
<< node << " to:" << next_node->name() << ", "
<< next_node.get() << " with output rank info: " << i
<< ", " << j;
// No need to compute grad from needed Nodes to no need Nodes
meta[i][j].SetStopGradient(true);
edge.Clear();
continue;
}

// Update BFS queue
queue_.push_back(next_node.get());
@@ -502,7 +518,8 @@
// Save node and update mapping
orig_to_copied_node_map_[orig_node.get()] = copied_node;
copied_grad_nodes_.push_back(copied_node);

VLOG(3) << "Copied Node: " << orig_node->name() << " ptr: " << orig_node
<< " to ptr: " << copied_node;
return copied_node.get();
}

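The new stop-edge logic above cuts edges that would leave the needed sub-graph, so backward never wanders into branches that cannot reach the requested inputs. A rough Python sketch of the idea, using plain dicts and sets in place of GradNodeBase edges (names are illustrative):

```python
from collections import deque

def prune_unneeded_branches(start_nodes, out_edges, needed, ending):
    """out_edges: node -> list of downstream nodes (pruned in place)
    needed: nodes lying on a path from outputs to the requested inputs
    ending: nodes that terminate the minimal sub-graph."""
    queue, visited = deque(start_nodes), set()
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        kept = []
        for nxt in out_edges.get(node, []):
            # Edge from a needed node to one outside the needed set:
            # drop it (the real code marks it stop-gradient and clears it).
            if node in needed and nxt not in needed and node not in ending:
                continue
            kept.append(nxt)
            queue.append(nxt)
        out_edges[node] = kept
    return out_edges
```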
5 changes: 5 additions & 0 deletions paddle/fluid/eager/grad_tensor_holder.cc
@@ -99,6 +99,11 @@ void GradTensorHolder::add(size_t slot_id,
size_t rank,
const paddle::experimental::Tensor& t,
bool create_graph) {
if (!t.initialized()) {
VLOG(3) << "No need to do accumulate for uninitialized t.";
return;
} // TODO(jiabin): Remove this when we fix all kernel.

PADDLE_ENFORCE(slot_id < buffer_.size(),
paddle::platform::errors::Fatal(
"Invalid slot_id for GradTensorHolder::add() "
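The early return above lets the holder tolerate uninitialized incoming gradients instead of accumulating them; the TODO notes it is a stop-gap until every kernel is fixed. A minimal Python sketch of the guard, with None standing in for an uninitialized tensor:

```python
def holder_add(buffer, slot_id, rank, t):
    """Accumulate t into buffer[slot_id][rank], skipping uninitialized input."""
    if t is None:  # stands in for !t.initialized()
        return     # nothing to accumulate
    if slot_id >= len(buffer):
        raise IndexError("Invalid slot_id for GradTensorHolder::add()")
    if buffer[slot_id][rank] is None:
        buffer[slot_id][rank] = t
    else:
        buffer[slot_id][rank] = buffer[slot_id][rank] + t
```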
53 changes: 52 additions & 1 deletion paddle/fluid/eager/utils.h
@@ -277,7 +277,58 @@ class EagerUtils {
} else {
tensor_info_str += "Unknown";
}
if (VLOG_IS_ON(6)) {
if (VLOG_IS_ON(11)) {
const char* TENSOR_PRINT_TEMPLATE =
"{Name: %s, Initialized: %d, Ptr: %d "
"TensorInfo: [ %s ], Value:[ %s ], ADInfo:[ %s ]}";
auto* ad_meta = nullable_autograd_meta(t);
if (ad_meta && (ad_meta->WeakGrad().lock().get())) {
std::string ad_info_str = "";
const char* AD_INFO_TEMPLATE =
"Grad: [ %s ], GradNode: [ %s ], StopGradient: [ %d ]";
ad_info_str += paddle::string::Sprintf(AD_INFO_TEMPLATE,
TensorStr(ad_meta->Grad()),
GradNodeStr(t),
ad_meta->StopGradient());
auto* data_ptr = dynamic_cast<phi::DenseTensor*>(t.impl().get());
if (t.is_initialized() && data_ptr) {
return paddle::string::Sprintf(TENSOR_PRINT_TEMPLATE,
tensor_name_str,
t.initialized(),
t.impl(),
tensor_info_str,
*data_ptr,
ad_info_str);
} else {
return paddle::string::Sprintf(TENSOR_PRINT_TEMPLATE,
tensor_name_str,
t.initialized(),
t.impl(),
tensor_info_str,
"None",
ad_info_str);
}
} else {
auto* data_ptr = dynamic_cast<phi::DenseTensor*>(t.impl().get());
if (t.is_initialized() && data_ptr) {
return paddle::string::Sprintf(TENSOR_PRINT_TEMPLATE,
tensor_name_str,
t.initialized(),
t.impl(),
tensor_info_str,
*data_ptr,
"None");
} else {
return paddle::string::Sprintf(TENSOR_PRINT_TEMPLATE,
tensor_name_str,
t.initialized(),
t.impl(),
tensor_info_str,
"None",
"None");
}
}
} else if (VLOG_IS_ON(6)) {
const char* TENSOR_PRINT_TEMPLATE =
"{Name: %s, Initialized: %d, Ptr: %d "
"TensorInfo: [ %s ], ADInfo:[ %s ]}";
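The branch added above prints tensor values only at verbosity 11, keeping the cheaper VLOG(6) output in the else-if path. Assuming Paddle's usual GLOG_v environment variable controls the vlog level, and that it must be set before paddle is imported, enabling the verbose output looks roughly like this:

```python
import os

# Assumption: GLOG_v is read when paddle/glog initializes, so set it first.
os.environ["GLOG_v"] = "11"

import paddle  # noqa: E402

x = paddle.to_tensor([1.0], stop_gradient=False)
y = paddle.tanh(x)
(dx,) = paddle.grad(outputs=[y], inputs=[x])  # logs now include tensor values
```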
5 changes: 5 additions & 0 deletions paddle/phi/api/yaml/backward.yaml
@@ -187,6 +187,7 @@
param : [x, x]
kernel :
func : cos_double_grad
optional: grad_out
backward : cos_triple_grad
inplace : (grad_x_grad -> grad_out_grad)

@@ -211,6 +212,7 @@
param : [x, x, grad_x_grad_forward]
kernel :
func : cos_triple_grad
optional: grad_out_forward, grad_x_grad_forward, grad_out_grad_grad
inplace : (grad_x_grad_forward -> grad_out_forward_grad)

- backward_op : cosh_grad
@@ -797,6 +799,7 @@
param : [x, x]
kernel :
func : sin_double_grad
optional: grad_out
backward : sin_triple_grad
inplace : (grad_x_grad -> grad_out_grad)

@@ -821,6 +824,7 @@
param : [x, x, grad_x_grad_forward]
kernel :
func : sin_triple_grad
optional: grad_out_forward, grad_x_grad_forward, grad_out_grad_grad
inplace : (grad_x_grad_forward -> grad_out_forward_grad)

- backward_op : sinh_grad
@@ -979,6 +983,7 @@
kernel :
func : tanh_triple_grad
inplace : (grad_x_grad_forward -> grad_out_forward_grad)
optional : grad_out_new_grad, grad_out_grad_grad

- backward_op : thresholded_relu_grad
forward : thresholded_relu (Tensor x, float threshold) -> Tensor(out)
8 changes: 4 additions & 4 deletions paddle/phi/api/yaml/legacy_backward.yaml
@@ -124,7 +124,7 @@
kernel :
func : batch_norm_grad_grad
data_type : x
optional : out_mean, out_variance
optional : out_mean, out_variance, grad_x_grad, grad_scale_grad, grad_bias_grad
inplace : (grad_out -> grad_out_grad)

- backward_op : batch_norm_grad
@@ -903,7 +903,7 @@
param : [x, y, fwd_grad_out, fwd_grad_grad_x, fwd_grad_grad_y]
kernel :
func : matmul_triple_grad
optional : grad_x_grad, grad_y_grad, grad_grad_out_grad
optional : fwd_grad_grad_x, fwd_grad_grad_y, grad_x_grad, grad_y_grad, grad_grad_out_grad

- backward_op : matrix_power_grad
forward : matrix_power (Tensor x, int n) -> Tensor(out)
@@ -1091,10 +1091,10 @@
output : Tensor(x_grad), Tensor(y_grad), Tensor(fwd_grad_out_grad), Tensor(fwd_grad_grad_x_grad), Tensor(fwd_grad_grad_y_grad)
infer_meta :
func : GeneralQuinaryGradInferMeta
param : [x, y, fwd_grad_out, x, y]
param : [x, y, fwd_grad_out, fwd_grad_grad_x, fwd_grad_grad_y]
kernel :
func : multiply_triple_grad
optional : fwd_grad_grad_x, fwd_grad_grad_y, grad_grad_out_grad
optional : fwd_grad_grad_x, fwd_grad_grad_y, grad_x_grad, grad_y_grad, grad_grad_out_grad

- backward_op : nearest_interp_grad
forward : nearest_interp (Tensor x, Tensor out_size, Tensor[] size_tensor, Tensor scale_tensor, str data_layout, int out_d, int out_h, int out_w, float[] scale, str interp_method, bool align_corners, int align_mode) -> Tensor(output)
20 changes: 10 additions & 10 deletions paddle/phi/kernels/activation_grad_kernel.h
@@ -83,15 +83,15 @@ void ReluDoubleGradKernel(const Context& dev_ctx,
template <typename T, typename Context>
void SinDoubleGradKernel(const Context& dev_ctx,
const DenseTensor& x,
const DenseTensor& dout,
const paddle::optional<DenseTensor>& dout,
const DenseTensor& ddx,
DenseTensor* dx,
DenseTensor* ddout);

template <typename T, typename Context>
void CosDoubleGradKernel(const Context& dev_ctx,
const DenseTensor& x,
const DenseTensor& dout,
const paddle::optional<DenseTensor>& dout,
const DenseTensor& ddx,
DenseTensor* dx,
DenseTensor* ddout);
@@ -109,30 +109,30 @@ void TanhTripleGradKernel(const Context& dev_ctx,
const DenseTensor& out,
const DenseTensor& dout,
const DenseTensor& ddx,
const DenseTensor& d_dout_new,
const DenseTensor& d_ddout,
const paddle::optional<DenseTensor>& d_dout_new,
const paddle::optional<DenseTensor>& d_ddout,
DenseTensor* d_out_new,
DenseTensor* d_dout,
DenseTensor* d_ddx);

template <typename T, typename Context>
void SinTripleGradKernel(const Context& dev_ctx,
const DenseTensor& x,
const DenseTensor& dout,
const DenseTensor& ddx,
const paddle::optional<DenseTensor>& dout,
const paddle::optional<DenseTensor>& ddx,
const DenseTensor& d_dx_new,
const DenseTensor& d_ddout,
const paddle::optional<DenseTensor>& d_ddout,
DenseTensor* d_x_new,
DenseTensor* d_dout,
DenseTensor* d_ddx);

template <typename T, typename Context>
void CosTripleGradKernel(const Context& dev_ctx,
const DenseTensor& x,
const DenseTensor& dout,
const DenseTensor& ddx,
const paddle::optional<DenseTensor>& dout,
const paddle::optional<DenseTensor>& ddx,
const DenseTensor& d_dx_new,
const DenseTensor& d_ddout,
const paddle::optional<DenseTensor>& d_ddout,
DenseTensor* d_x_new,
DenseTensor* d_dout,
DenseTensor* d_ddx);
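Several kernel signatures above switch plain DenseTensor inputs to paddle::optional, matching the optional: entries added in the yaml files, so higher-order grads still work when an upstream gradient is absent. A small hedged example that exercises the sin double-grad path through nested paddle.grad calls (values are illustrative):

```python
import paddle

x = paddle.to_tensor([0.5], stop_gradient=False)
y = paddle.sin(x)

(dx,) = paddle.grad(outputs=[y], inputs=[x], create_graph=True)  # cos(x)
(ddx,) = paddle.grad(outputs=[dx], inputs=[x])                   # -sin(x)
print(ddx)  # expected: -sin(0.5)
```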