[Semi-auto]Add Shard/Replicate/Partial in DistTensor #58930
Conversation
Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
paddle/fluid/pybind/eager_utils.cc
Outdated
#else
  PADDLE_THROW(platform::errors::Unavailable(
      "Placements to PyObject is not supported in the current "
      "PaddlePaddle, please recompile and installPaddlePaddle with the option "
install PaddlePaddle
done, thx
paddle/phi/common/reduce_type.h
Outdated
@@ -13,6 +13,7 @@
 // limitations under the License.

 #pragma once
+#include <ostream>
This header file is not actually used.
done thx
  ReduceType reduce_type_;
};

using Placements = std::vector<std::shared_ptr<Placement>>;
Could this be moved into the DistTensorMeta class?
done, thx
  }

  bool operator==(const Placement& other) const override {
    const Shard* other_shard = dynamic_cast<const Shard*>(&other);
Would it be better to add a shard_axis method to Placement and access it directly here?
This is specific to Shard, just like reduce_type_ in Partial.
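For context, a minimal self-contained sketch of the layout described in this reply: the sharded tensor dimension is state only Shard owns, just as the reduce type is state only Partial owns, so equality against an arbitrary Placement goes through dynamic_cast rather than a shard_axis accessor on the base class. The class and member names mirror the snippets in this review; the ReduceType stub and everything else are illustrative assumptions, not the PR's actual code.

```cpp
#include <cstdint>

// Stub for the real enum declared in paddle/phi/common/reduce_type.h.
enum class ReduceType { kRedSum };

class Placement {
 public:
  virtual ~Placement() = default;
  virtual bool operator==(const Placement& other) const { return false; }
};

class Shard : public Placement {
 public:
  explicit Shard(int64_t dim) : dim_(dim) {}

  bool operator==(const Placement& other) const override {
    // Only another Shard over the same tensor dimension compares equal.
    const Shard* other_shard = dynamic_cast<const Shard*>(&other);
    return other_shard != nullptr && other_shard->dim_ == dim_;
  }

 private:
  int64_t dim_;  // Shard-specific state.
};

class Partial : public Placement {
 public:
  explicit Partial(ReduceType reduce_type) : reduce_type_(reduce_type) {}

  bool operator==(const Placement& other) const override {
    // Only another Partial with the same reduction type compares equal.
    const Partial* other_partial = dynamic_cast<const Partial*>(&other);
    return other_partial != nullptr && other_partial->reduce_type_ == reduce_type_;
  }

 private:
  ReduceType reduce_type_;  // Partial-specific state.
};
```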
.def(py::init([](int64_t dim) {
  return std::make_shared<phi::distributed::Shard>(dim);
}))
.def("get_dim", &phi::distributed::Shard::get_dim)
Should the names used for tensor dimensions and mesh dimensions be distinguished? If both are called dim, it seems easy to confuse them, e.g. axis vs. dim.
The dim here refers only to the tensor's dim, so it is kept consistent with that.
  dist_attr.set_dims_mapping(dist_tensor_meta_.dim_mapping());
  dist_attr.mark_annotated("process_mesh");
  dist_attr.mark_annotated("dims_mapping");
  dist_attr_ = dist_attr;
It feels like only one of dist_tensor_meta_ and dist_attr_ should remain as a data member; otherwise there will be data-synchronization problems, and static semi-auto ran into quite a few pitfalls with this.
For example, dist_attr could always be a temporary variable, only returned through a function such as get_dist_attr(dist_tensor_meta_).
Yes, agreed. The plan is that DistTensor will eventually hold only a DistTensorMeta and a DenseTensor pointer, and everything else will be removed. But the change is large, so it is being split into separate PRs for now.
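A hedged sketch of the single-source-of-truth direction suggested here, assuming dist_tensor_meta_ stays the only stored state and TensorDistAttr is only ever materialized on demand: the set_dims_mapping and mark_annotated calls mirror the diff above, while set_process_mesh and the process_mesh()/dim_mapping() accessors are assumptions about the surrounding API rather than confirmed signatures.

```cpp
// Derive a temporary TensorDistAttr from the meta instead of caching it,
// so dist_attr can never drift out of sync with dist_tensor_meta_.
TensorDistAttr get_dist_attr(const DistTensorMeta& meta) {
  TensorDistAttr dist_attr;
  dist_attr.set_process_mesh(*meta.process_mesh());  // assumed accessor
  dist_attr.set_dims_mapping(meta.dim_mapping());
  dist_attr.mark_annotated("process_mesh");
  dist_attr.mark_annotated("dims_mapping");
  return dist_attr;  // always a temporary, never stored as a member
}
```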
@@ -121,12 +146,14 @@ class DistTensor final
  private:
   friend class ReshardFunction;

-  // The global dimensions(shape)
+  // The global dimensions(shape), will move to DistTensorMeta
It feels like tensor shape, mesh, and dims mapping will all use dims... This naming may need some discussion later.
Yes, the naming is not yet unified.
 private:
  std::shared_ptr<const ProcessMesh> process_mesh_;
  Placements placements_;
  std::shared_ptr<const DenseTensorMeta> tensor_meta_;
Does the entire DenseTensorMeta need to be stored here?
We need to keep dtype, strides, layout, and similar information to serve later as a cache for inference; it is basically the same as DenseTensorMeta.
self._mesh = dist.ProcessMesh([0, 1], dim_names=["x"])

def run_test_placements(self):
    self.placements = [core.Replicate(), core.Replicate()]
Should these classes be imported directly under paddle.distributed, instead of using the core prefix?
Yes, this unit test is currently a temporary one. It does not touch the API; the API will be reworked as a whole later, and this test will be updated then.
 public:
  virtual ~Placement() = default;

  virtual bool is_shard(std::optional<int> dim = std::nullopt) const {
Will the Placement base class ever be instantiated? If not, should these just be pure virtual functions?
No actual instances of the base class will exist.
LGTM
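For reference, a minimal sketch of the trade-off discussed in this thread, under the assumption (stated in the reply) that the base class is never instantiated on its own: is_shard keeps a non-pure default returning false so Replicate and Partial need no override, and only Shard overrides it, optionally checking the queried tensor dimension. Names follow the snippet above; the rest is illustrative.

```cpp
#include <optional>

class Placement {
 public:
  virtual ~Placement() = default;

  // Non-pure default: every non-Shard placement answers false without
  // having to provide its own override.
  virtual bool is_shard(std::optional<int> dim = std::nullopt) const {
    return false;
  }
};

class Shard : public Placement {
 public:
  explicit Shard(int dim) : dim_(dim) {}

  bool is_shard(std::optional<int> dim = std::nullopt) const override {
    // True for a generic query, or only when the queried dim matches.
    return !dim.has_value() || dim.value() == dim_;
  }

 private:
  int dim_;
};
```

The pure-virtual alternative raised in the question would force Replicate and Partial to add trivially false overrides of is_shard, which is the cost being weighed here.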
 private:
  std::shared_ptr<const ProcessMesh> process_mesh_;
  Placements placements_;
  std::shared_ptr<const DenseTensorMeta> tensor_meta_;
What is the reason for using shared_ptr here?
It saves one copy, and the tensor meta may be used elsewhere; its lifetime matches that of the dist tensor.
LGTM
LGTM for set_tests_properties(test_dist_tensor_api PROPERTIES LABELS "RUN_TYPE=EXCLUSIVE" TIMEOUT 100)
LGTM
PR types
New features
PR changes
Others
Description
DistTensor can now be constructed from Shard/Replicate/Partial, and conversion from SRP to dim_mapping is added. Constructing DistTensor via TensorDistAttr will be deprecated later. In dynamic semi-auto mode, DistAttr appears during inference and conversion, while SRP appears in Reshard.
Pcard-73145
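To make the Description concrete, here is a hedged sketch (not the PR's actual implementation) of the SRP-to-dim_mapping conversion it mentions: dims_mapping holds one entry per tensor dimension, -1 meaning replicated, and the entry for a sharded dimension recording the index of the mesh dimension that shards it. The helper name placements_to_dim_mapping is invented for illustration; Placement, Shard, is_shard, and get_dim follow the snippets in this review.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Assumes the phi::distributed::Placement/Shard classes shown earlier.
std::vector<int64_t> placements_to_dim_mapping(
    const std::vector<std::shared_ptr<phi::distributed::Placement>>& placements,
    int64_t tensor_ndim) {
  // Start fully replicated: no tensor dimension maps to any mesh dimension.
  std::vector<int64_t> dims_mapping(tensor_ndim, -1);
  for (size_t mesh_dim = 0; mesh_dim < placements.size(); ++mesh_dim) {
    const auto& placement = placements[mesh_dim];
    if (placement->is_shard()) {
      // Record which mesh dimension shards this tensor dimension.
      auto shard = std::dynamic_pointer_cast<phi::distributed::Shard>(placement);
      dims_mapping[shard->get_dim()] = static_cast<int64_t>(mesh_dim);
    }
    // Replicate and Partial leave the mapping untouched (-1).
  }
  return dims_mapping;
}
```

Partial placements do not contribute to dims_mapping at all; if TensorDistAttr tracks partial reductions separately (e.g. as a per-mesh-dimension partial status), that bookkeeping would live outside this helper.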