
[Semi-Auto] LayerNorm Parallel Rule #55130

Merged
3 commits merged into PaddlePaddle:develop on Jul 6, 2023

Conversation

zhiqiu (Contributor) commented Jul 4, 2023

PR types

New features

PR changes

OPs

Description

Pcard-70448

add spmd rule for layer_norm
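
For readers outside the auto-parallel workgroup, the review below leans on Paddle's dims_mapping notation. A minimal sketch of the convention (illustrative only, not code from this PR):

#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // "ijk[1, 0, -1]": tensor axis i is sharded on mesh dim 1, axis j on
  // mesh dim 0, and axis k is replicated (mapped to -1).
  std::vector<int64_t> x_dims_mapping = {1, 0, -1};
  for (int64_t mesh_dim : x_dims_mapping) std::cout << mesh_dim << ' ';
  std::cout << '\n';  // prints: 1 0 -1
  return 0;
}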

paddle-bot bot commented Jul 4, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

zhiqiu requested a review from JZ-LIANG on July 5, 2023 at 06:55
std::pair<std::vector<TensorDistAttr>, std::vector<TensorDistAttr>>
LayerNormSPMDRule::InferForward(const std::vector<DistTensorSpec>& input_specs,
const paddle::framework::AttributeMap& attrs) {
// step0: verify input args based on matmul logic

JZ-LIANG (Contributor):

layer_norm

zhiqiu (Author):

done
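
For context, a sketch of the kind of check step0 performs (hypothetical helper; the PR's actual verification lives in the rule itself):

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each input's rank must match the length of its dims_mapping: a rank-n
// tensor needs exactly n entries, one mesh dim (or -1) per tensor axis.
void VerifyInputRanks(const std::vector<std::vector<int64_t>>& shapes,
                      const std::vector<std::vector<int64_t>>& dims_mappings) {
  assert(shapes.size() == dims_mappings.size());
  for (size_t i = 0; i < shapes.size(); ++i) {
    assert(shapes[i].size() == dims_mappings[i].size());
  }
}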

}
std::string out_axes = x_axes;

VLOG(4) << "LayerNormSPMDRule build Einsum notation (x,scale,bias->out): ["

JZ-LIANG (Contributor):

should include all layer_norm outputs (x,scale,bias->out, mean, var)

zhiqiu (Author):

done, thx
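
A standalone sketch of the revised notation construction, assuming a rank-4 x and begin_norm_axis=2 (illustrative; not the PR's exact code):

#include <iostream>
#include <string>

int main() {
  const std::string alphabet = "ijklmnopqrstuvwxyz";
  const int ndim = 4;             // rank of x
  const int begin_norm_axis = 2;  // axes [2, ndim) are normalized

  std::string x_axes = alphabet.substr(0, ndim);              // "ijkl"
  std::string scale_axes = x_axes.substr(begin_norm_axis);    // "kl" (bias: same)
  std::string mean_axes = x_axes.substr(0, begin_norm_axis);  // "ij" (variance: same)
  std::string out_axes = x_axes;                              // out matches x

  std::cout << x_axes << ',' << scale_axes << ',' << scale_axes << "->"
            << out_axes << ',' << mean_axes << ',' << mean_axes << '\n';
  // prints: ijkl,kl,kl->ijkl,ij,ij
  return 0;
}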

input_dist_attrs.emplace_back(ReplicatedOnMesh(input_specs[2].dist_attr()));

// Step2.4. handle input and out tensor partial
std::vector<int64_t> partial_on_dims;

JZ-LIANG (Contributor):

LayerNorm activation output would not be partial

zhiqiu (Author):

done, thx
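
The reasoning behind this fix: in a contraction rule such as matmul, sharding the reduced axis k leaves the output partial (pending an allreduce) on that mesh dim; layer_norm's activation output keeps every axis of x, so nothing is reduced away across a sharded mesh dim. A minimal sketch of the resulting state (mirroring the snippet above, not the exact diff):

#include <cstdint>
#include <vector>

// For layer_norm, the activation output is never partial, so no mesh dims
// are recorded here.
std::vector<int64_t> partial_on_dims;  // intentionally left empty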

<< "]; out dims_mapping: [" << str_join(out_dims_mapping)
<< "], partial_on_dims: [" << str_join(partial_on_dims) << "]";

return {input_dist_attrs, {output_dist_attr_dst}};

JZ-LIANG (Contributor):

should infer the distattr of output (variance and mean)

zhiqiu (Author):

done thx
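
A sketch of the added inference for mean and variance (hypothetical helper name, not the PR's exact code): both statistics are reduced over the normalized axes, so they keep only the batch-axis prefix of x's dims_mapping.

#include <cstdint>
#include <vector>

std::vector<int64_t> InferMeanVarianceDimsMapping(
    const std::vector<int64_t>& x_dims_mapping, int begin_norm_axis) {
  // e.g. x = ijkl[0, -1, -1, -1] with begin_norm_axis=2
  // gives mean/variance = ij[0, -1].
  return {x_dims_mapping.begin(),
          x_dims_mapping.begin() + begin_norm_axis};
}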

namespace distributed {
namespace auto_parallel {

TensorDistAttr GetInferedDistAttr(

JZ-LIANG (Contributor):

remove this

zhiqiu (Author):

done

JZ-LIANG (Contributor) left a comment:

LGTM


SPMDRuleBase* layer_norm_rule = SPMDRuleMap::Instance().Get("layer_norm");

// ijk[1, -1, -1],k[-1],k[-1] --> ijk[1, -1, -1] partial[1]

JZ-LIANG (Contributor):

no partial

zhiqiu (Author):

I will fix it in the next PR.

VLOG(4) << "test1 done.";

// ijk[1, 0, -1],k[0],k[0] --> ijk[1, 0, -1]
x_dist_tensor_spec.set_dims_mapping({1, 0, -1});

JZ-LIANG (Contributor) commented Jul 6, 2023:

GOOD test case!
There are two kinds of errors in the case:

  1. multiple batch axes (i, j) are sharded in the input activation, which is not supported for now.
  2. the same mesh dimension (0) is sharding two different tensor axes (j & k).

It also suggests we are missing one important precondition check in InferForward (for when sharding on bias is supported in the future):
in [ijkl,y(kl),y(kl)->ijkl,x(ij),x(ij)] (x,scale,bias->out,mean,variance, begin_norm_axis=2, x=ij, y=kl),
though "y" and "kl" are written as different characters, they denote the same tensor axes and should be sharded by the same mesh dims.
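
A sketch of the precondition check described above (hypothetical helper, not part of this PR): scale and bias axes alias the trailing axes of x, so their mappings must agree entry by entry even when the notation writes them with different characters.

#include <cstddef>
#include <cstdint>
#include <vector>

// True iff scale/bias sharding is consistent with x on the aliased axes.
// Assumes begin_norm_axis + scale rank does not exceed the rank of x.
bool ScaleShardingConsistentWithX(
    const std::vector<int64_t>& x_dims_mapping,
    const std::vector<int64_t>& scale_dims_mapping,
    int begin_norm_axis) {
  for (size_t i = 0; i < scale_dims_mapping.size(); ++i) {
    // scale axis i aliases x axis (begin_norm_axis + i): the same tensor
    // axis must map to the same mesh dim (or both be -1 / replicated).
    if (scale_dims_mapping[i] != x_dims_mapping[begin_norm_axis + i]) {
      return false;
    }
  }
  return true;
}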

attrs));
VLOG(4) << "test2 done.";

// ijk[0, -1, -1],z[-1],z[1] --> ijk[0, 1, -1, -1], z=jk

JZ-LIANG (Contributor):

ijk[0, -1, -1], k[-1], k[1] --> ijk[0, -1, -1], x[0], x[0], x=ij

zhiqiu (Author):

I will fix it in the next PR.

zhiqiu merged commit 4d1b9f0 into PaddlePaddle:develop on Jul 6, 2023
cqulilujia pushed a commit to cqulilujia/Paddle that referenced this pull request Jul 24, 2023
* add layernorm spmd rule

* add ut

* follow comments