
auto parallel support pipeline scheduler with standalone executor #54727

Merged
merged 11 commits into PaddlePaddle:develop on Jun 25, 2023

Conversation

@zhaoyinglia zhaoyinglia (Contributor) commented Jun 19, 2023

PR types

Others

PR changes

Others

Description

Pcard-70448

  • Add a new_executor_micro_batching flag that controls whether the standalone executor or the fleet executor is used for pipeline parallelism. It exists only for debugging during the transition period and will be removed once the migration is complete (see the sketch after this list).
  • After a pipeline schedule is applied, merge the results returned by each micro_batch. Only return_numpy=True is supported for now; return_numpy=False will be supported later.
  • Align the results of the fleet executor and the standalone executor in the pp2 + gpt scenario.
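
A rough usage sketch of the first item above. The flag name comes from this description; reading it as the environment variable FLAGS_new_executor_micro_batching follows Paddle's usual FLAGS_<name> convention and is an assumption here, as is the surrounding code.

import os

# Transition-period toggle (sketch): choose the executor used for the
# pipeline schedule. The exact environment variable name is assumed.
os.environ.setdefault("FLAGS_new_executor_micro_batching", "True")

use_standalone_executor = os.environ.get(
    "FLAGS_new_executor_micro_batching", "True"
).lower() in ("1", "true")

# True  -> run the pipeline schedule on the standalone executor (new path)
# False -> keep running it on the fleet executor (old path)
print("use standalone executor:", use_standalone_executor)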

@paddle-bot paddle-bot bot commented Jun 19, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@@ -52,6 +52,10 @@ PADDLE_DEFINE_EXPORTED_bool(new_executor_use_local_scope,
true,
"Use local_scope in new executor(especially used "
"in UT), can turn off for better performance");
PADDLE_DEFINE_EXPORTED_bool(
Contributor

This environment variable controls Python-side code; it does not need to be declared on the C++ side.

Contributor Author

done.

@@ -368,7 +378,17 @@ def _apply_post_optimization(
[main_program], [startup_program], self._pass_context
)

if self._strategy.pipeline.enable:
new_executor_micro_batching = os.environ.get(
Contributor

The code here that decides whether to use the new executor has a similar counterpart in engine.py. Can the two pieces of code be consolidated into one place?

Contributor Author

done.


if self._strategy.pipeline.enable and use_new_executor:
main_program._pipeline_opt = {}
main_program._pipeline_opt["standalone_exe"] = {
Contributor

Using standalone_exe as a key of _pipeline_opt is easy to misread; consider renaming it to standalone_opt or another more suitable name.

Contributor Author

done. Renamed to standalone_opt.



def apply_pass(main_program, startup_program, pass_name, pass_attr={}):
from paddle.distributed.passes import PassContext, new_pass
Contributor

According to the Python coding conventions, import statements must go at the top of the file, after module comments and docstrings and before global variables and constants. Imports inside functions are not recommended.

Contributor Author

done.

@@ -653,8 +681,15 @@ def run(self, feed_names, return_numpy=True):
"""
tensors = self._new_exe.run(feed_names)._move_to_list()
if return_numpy:
return as_numpy(tensors, copy=True)
tensors = as_numpy(tensors, copy=True)
if self._plan.micro_batch_num() <= 1:
Contributor

Can _merge_tensors handle the micro_batch_num=1 case?

Contributor Author

Yes, it can. The two branches have been merged.

else:
if self._plan.micro_batch_num() > 1:
logging.warning(
Contributor

It would be better to raise an error directly here.

Contributor Author

done.
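
For intuition, a minimal sketch of what merging per-micro-batch fetch results can look like once every micro batch has been converted to numpy, as described in the PR summary. The helper name echoes _merge_tensors from the thread above, but its real signature and the concatenation axis are assumptions.

import numpy as np

def merge_micro_batch_results(per_micro_batch, micro_batch_num):
    # per_micro_batch: list of length micro_batch_num; each element holds
    # the numpy arrays fetched for that micro batch (one per fetch target).
    if micro_batch_num <= 1:
        # A single micro batch needs no merging; the same path can handle
        # it, which is why the <=1 branch could be folded in above.
        return per_micro_batch[0] if per_micro_batch else []
    num_fetches = len(per_micro_batch[0])
    # Assumption: results are concatenated along the batch axis (axis 0).
    return [
        np.concatenate(
            [per_micro_batch[mb][i] for mb in range(micro_batch_num)], axis=0
        )
        for i in range(num_fetches)
    ]

# Toy usage: two micro batches, one fetch target of shape (2, 3) each.
merged = merge_micro_batch_results([[np.zeros((2, 3))], [np.ones((2, 3))]], 2)
print(merged[0].shape)  # (4, 3)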

scope,
)
if pipeline_opt:
from paddle.distributed.passes.pipeline_scheduler_pass import (
Contributor

Imports inside functions are not recommended.

Contributor Author

If the import is not done here, a circular import problem occurs.
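
For readers unfamiliar with the pattern being defended in the reply above: a deferred (function-local) import breaks an import cycle because the imported module is only resolved when the function runs, after both modules have finished loading. A generic two-file sketch with made-up module names, not Paddle's actual layout:

# file: executor_stub.py
import scheduler_pass_stub

MICRO_BATCH_NUM = 2

def run(program):
    return scheduler_pass_stub.apply_schedule(program)

# file: scheduler_pass_stub.py
def apply_schedule(program):
    # A top-level "from executor_stub import MICRO_BATCH_NUM" here would
    # complete the cycle executor_stub -> scheduler_pass_stub -> executor_stub
    # and fail while executor_stub is still half-initialized. Importing inside
    # the function defers resolution until call time, when both modules are
    # fully loaded.
    from executor_stub import MICRO_BATCH_NUM
    return f"{program}: scheduled with {MICRO_BATCH_NUM} micro batches"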

@@ -1408,7 +1460,21 @@ def _run_impl(

fetch_list = self._check_fetch_list(fetch_list)

if isinstance(program, Program) and program._pipeline_opt:
new_executor_micro_batching = os.environ.get(
Contributor

Suggest unifying the FLAGS toggle check into a single place.

Contributor Author

done.
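
Since the same environment check originally appeared in both engine.py and executor.py (see the two review comments above asking to unify it), the fix amounts to one shared helper that both call sites consult. A minimal sketch of that idea; the helper name and the exact environment variable string are assumptions, not the merged code.

import os

def use_new_executor_for_pipeline():
    # Hypothetical shared helper: interpret the micro-batching toggle in one
    # place so engine.py and executor.py cannot drift apart.
    value = os.environ.get("FLAGS_new_executor_micro_batching", "True")
    return value.lower() in ("1", "true")

# Both call sites then reduce to something like:
# if strategy.pipeline.enable and use_new_executor_for_pipeline():
#     ...run the pipeline schedule on the standalone executor...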

@XieYunshen XieYunshen (Contributor) left a comment

LGTM for set_tests_properties(test_pipeline_scheduler_FThenB PROPERTIES LABELS "RUN_TYPE=EXCLUSIVE" TIMEOUT 50)

@zhiqiu zhiqiu (Contributor) left a comment

LGTM

@From00 From00 (Contributor) left a comment

LGTM

@From00 From00 merged commit a702e17 into PaddlePaddle:develop Jun 25, 2023