[AutoParallel] Custom op support auto parallel #58553
Conversation
LGTM
    'dist_custom_relu_op.cc',
    'dist_custom_relu_op_dup.cc',
    'dist_custom_relu_op.cu',
Are these kernel files any different from the original implementation? Can we reuse the original kernel code instead? It should be reachable via a relative path.
OK, I'll give it a try~ Thanks!
I tried it and it works. Thanks!!
- const std::vector<std::pair<size_t, size_t>>& InputRange();
- const std::vector<std::pair<size_t, size_t>>& OutputRange();
+ const std::vector<std::pair<size_t, size_t>>& InputRange() const;
+ const std::vector<std::pair<size_t, size_t>>& OutputRange() const;
Thanks to Huan for the const guarantee here.
You're welcome~
std::vector<Tensor>* all_inputs = ctx.AllMutableInput();

#ifdef PADDLE_WITH_DISTRIBUTE
It would be better to extract the code inside #ifdef PADDLE_WITH_DISTRIBUTE into a separate function.
Ah, this was done to keep the format consistent with PHI's api.cc~
I still think it needs to be factored out. The coupling between the distributed logic and the custom-op logic is too heavy here; if this ever needs debugging, the maintenance cost will likely be high.
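For illustration, a minimal sketch of the suggested split. The helper name PrepareCtxForAutoParallel is hypothetical, and the real signatures in custom_operator.cc may differ:

```cpp
#ifdef PADDLE_WITH_DISTRIBUTE
// Hypothetical helper: all auto-parallel preprocessing lives here,
// instead of being inlined into the generic custom-op run path.
static void PrepareCtxForAutoParallel(paddle::CustomOpKernelContext* ctx) {
  std::vector<Tensor>* all_inputs = ctx->AllMutableInput();
  // ... dist-tensor conversion / spmd inference on *all_inputs ...
}
#endif

void run_custom_op_impl(const paddle::OpMetaInfo& op_info,
                        bool is_forward,
                        bool is_double_grad,
                        paddle::CustomOpKernelContext& ctx) {  // NOLINT
#ifdef PADDLE_WITH_DISTRIBUTE
  PrepareCtxForAutoParallel(&ctx);  // single call site keeps coupling low
#endif
  // ... common execution path shared with the non-distributed build ...
}
```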
void run_custom_op_impl(paddle::OpMetaInfo op_info,
                        bool is_forward,
                        bool is_double_grad,
                        paddle::CustomOpKernelContext& ctx) {  // NOLINT
For variables that are written to, the convention in the custom-op codebase is to pass them as pointers:
paddle::CustomOpKernelContext* ctx
paddle::OpMetaInfo op_info is also written to, isn't it? It should be changed to a pointer as well.
I'll change paddle::OpMetaInfo op_info to const paddle::OpMetaInfo& op_info~
ctx still has to be modified inside run_custom_op_impl, but I find heavy pointer usage inconvenient, so I added the NOLINT.
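For reference, the signature shape agreed on in this thread would look like this (a sketch mirroring the discussion above, not necessarily the final merged code):

```cpp
// op_info is only read after the change, so it becomes a const reference;
// ctx is still mutated inside the function, so it stays a mutable
// reference (with NOLINT), since pervasive pointer syntax was judged
// inconvenient here.
void run_custom_op_impl(const paddle::OpMetaInfo& op_info,
                        bool is_forward,
                        bool is_double_grad,
                        paddle::CustomOpKernelContext& ctx);  // NOLINT
```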
} else {
  for (size_t j = pair.first; j < pair.second; j++) {
    *(ctx.MutableOutputAt(j)) = BuildEmptyDistPaddleTensor(
        current_process_mesh, out_dim[0], out_dtype[0]);
Aren't the subscripts here wrong? Shouldn't they be j?
Thanks a lot! Otherwise this would have been a big pitfall!
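The agreed fix, sketched against the quoted diff: the hard-coded 0 subscripts become the loop index j.

```cpp
} else {
  for (size_t j = pair.first; j < pair.second; j++) {
    // Index with j so each output slot gets its own dims and dtype,
    // rather than always reusing those of the first output.
    *(ctx.MutableOutputAt(j)) = BuildEmptyDistPaddleTensor(
        current_process_mesh, out_dim[j], out_dtype[j]);
  }
}
```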
LGTM~ Great work!
LGTM
LGTM
current_process_mesh =
    paddle::holds_alternative<phi::distributed::TensorDistAttr>(
        spmd_info.first[0])
        ? paddle::get<0>(spmd_info.first[0]).process_mesh()
Using paddle::get directly requires a try/catch around it, or use the PADDLE_GET family of macros instead; otherwise, once it throws, the error is very hard to analyze. I suggest polishing this in a follow-up.
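A sketch of the two options named above. The exact PADDLE_GET_CONST form is assumed from its usual (Type, value) macro usage elsewhere in the codebase:

```cpp
// Option 1: wrap the raw paddle::get in a try/catch so that a wrong
// variant alternative yields an actionable message instead of an
// opaque bad_variant_access propagating up.
try {
  current_process_mesh =
      paddle::get<0>(spmd_info.first[0]).process_mesh();
} catch (const paddle::bad_variant_access& e) {
  PADDLE_THROW(phi::errors::InvalidArgument(
      "spmd_info.first[0] does not hold a TensorDistAttr: %s", e.what()));
}

// Option 2: let the PADDLE_GET_CONST macro perform the checked access.
current_process_mesh =
    PADDLE_GET_CONST(phi::distributed::TensorDistAttr, spmd_info.first[0])
        .process_mesh();
```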
LGTM
"have the same mesh.", | ||
input.name())); | ||
} else { | ||
PADDLE_ENFORCE_EQ( |
On my side, running a custom op that has an optional input fails. I traced it to this spot: input.impl().get() returns a null pointer. Does this code take custom ops' optional inputs into account?
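A sketch of the kind of guard being asked for, assuming an optional input that was not passed surfaces as a tensor with a null impl(); the placement is hypothetical and would belong wherever input.impl().get() is dereferenced:

```cpp
for (auto& input : *all_inputs) {
  // An absent optional input arrives as an uninitialized tensor;
  // skip it rather than dereferencing a null impl().
  if (!input.initialized() || input.impl() == nullptr) {
    continue;
  }
  // ... existing same-mesh check using input.impl().get() ...
}
```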
PR types
Others
PR changes
Others
Description
Custom operators now support AutoParallel.
Pcard-73145