[Auto Parallel] Add paddle.distributed.to_static api #59682
Conversation
Your PR was submitted successfully. Thank you for your contribution to this open-source project!
Force-pushed from 0a9798f to daee140.
Force-pushed from daee140 to 228e8fa.
```python
def __call__(self, *args):
    if self._mode is None:
        raise ValueError("Please call train()/eval()/predict() first.")
```
The default mode should be train.
Done, the default mode is now set according to the init args.
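A minimal sketch (an assumption, not the PR's actual code) of how the default mode could be derived from the init args, consistent with the prepare logic quoted in the snippets below:

```python
def _init_default_mode(self, loss, optimizer):
    # hypothetical helper: derive a default mode from the init args so that
    # __call__ works without an explicit train()/eval()/predict() call
    if optimizer is not None and loss is not None:
        self._mode = "train"    # optimizer + loss: training is possible
    elif loss is not None:
        self._mode = "eval"     # loss only: evaluation
    else:
        self._mode = "predict"  # neither: prediction
```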
```python
# convert dygraph model to static model
batch_size = loader.batch_sampler.batch_size
inputs_spec, labels_spec = dist_model._engine._prepare_data_spec(
    loader.dataset, None, batch_size
)

if optimizer is not None and loss is not None:
    # get the static graph in train mode
    dist_model._engine.prepare(
        inputs_spec, labels_spec, mode="train", init_parameters=False
    )
if loss is not None:
    # get the static graph in eval mode
    dist_model._engine.prepare(
        inputs_spec, labels_spec, mode="eval", init_parameters=False
    )
# get the static graph in predict mode
dist_model._engine.prepare(
    inputs_spec, None, mode="predict", init_parameters=False
)

# get DistributedDataLoader for static mode auto-parallelism
batch_size = dist_model._engine._validate_batch_size(batch_size)
dist_loader = dist_model._engine._prepare_dataloader(
    loader.dataset, return_list=True, batch_size=batch_size
)
```
Better to move these lines into the `__init__` of `DistModel`.
Done.
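A sketch of the suggested refactor, folding the preparation steps above into `DistModel.__init__` (the `Engine` import path and helper names are assumptions based on the quoted snippets, not necessarily the PR's final code):

```python
from paddle.distributed.auto_parallel import Engine  # assumed import path

class DistModel:
    def __init__(self, layer, loader, loss=None, optimizer=None, strategy=None):
        self._engine = Engine(layer, loss, optimizer, strategy=strategy)
        batch_size = loader.batch_sampler.batch_size
        inputs_spec, labels_spec = self._engine._prepare_data_spec(
            loader.dataset, None, batch_size
        )
        # build the static graphs for the modes the init args allow
        if optimizer is not None and loss is not None:
            self._engine.prepare(
                inputs_spec, labels_spec, mode="train", init_parameters=False
            )
        if loss is not None:
            self._engine.prepare(
                inputs_spec, labels_spec, mode="eval", init_parameters=False
            )
        self._engine.prepare(
            inputs_spec, None, mode="predict", init_parameters=False
        )
        # the distributed data loader becomes an attribute of the model
        batch_size = self._engine._validate_batch_size(batch_size)
        self.dist_loader = self._engine._prepare_dataloader(
            loader.dataset, return_list=True, batch_size=batch_size
        )
```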
```python
inputs_var = dist_context.serial_feed_vars["inputs"]
labels_var = dist_context.serial_feed_vars["labels"]
```
What if the feed variables are not named `inputs` and `labels`?
`inputs` and `labels` are the keys of the dict `dist_context.serial_feed_vars`, not the names of the model's input and label variables.
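A small, self-contained illustration of the point (the variable names here are made up):

```python
# "inputs" and "labels" are fixed keys of the dict, independent of the
# names the user gave the feed variables themselves
serial_feed_vars = {
    "inputs": ["image"],   # feed vars the user happened to name "image" ...
    "labels": ["target"],  # ... and "target"
}
inputs_var = serial_feed_vars["inputs"]  # looked up by dict key, not by var name
labels_var = serial_feed_vars["labels"]
```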
LGTM
```python
for name, param in named_params.items():
    var = global_scope().var(name)
    dense_tensor = var.get_tensor()
```
This is missing a filter to skip sharing parameters that do not belong to the current rank under pipeline parallelism (PP).
Parameter initialization has now been moved to LayerHelper.init(), which includes this filter.
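A hedged sketch of what such a rank filter could look like (the `process_mesh`/`process_ids` attribute names and the helper function are assumptions, not the PR's actual LayerHelper.init() code):

```python
import paddle
from paddle.static import global_scope

def share_params_into_scope(named_params):
    # hypothetical sketch: skip parameters that are not placed on the
    # current rank, e.g. parameters of another pipeline-parallel stage
    cur_rank = paddle.distributed.get_rank()
    for name, param in named_params.items():
        mesh = getattr(param, "process_mesh", None)
        if mesh is not None and cur_rank not in mesh.process_ids:
            continue  # parameter lives on a different rank
        var = global_scope().var(name)
        dense_tensor = var.get_tensor()
        # ... share `param`'s storage into `dense_tensor` as in the PR code
```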
```python
...     )
>>> loss_fn = nn.MSELoss()

>>> dist_model, dist_loader = dist.static_decorate(
```
`dist.static_decorate` should be `dist.to_static`?
Done.
```python
):
    """
    Converts the model and data loader used in dygraph auto-parallelism to
    that in static mode auto-parallelism. static_decorate returns a DistModel
```
`static_decorate` should be `to_static`?
Done.
```python
dist_model._engine._has_prepared["eval"] = True
dist_model._engine._has_prepared["predict"] = True

# python -m paddle.distributed.launch --devices=0,1 semi_auto_parallel_static_decorate_api.py
```
`semi_auto_parallel_static_decorate_api.py` should be `semi_auto_parallel_dist_to_static_api.py`?
Done.
```python
np.testing.assert_allclose(dy_losses, dy2static_losses, rtol=1e-6)

# python -m paddle.distributed.launch --devices=0,1 semi_auto_parallel_static_decorate_mlp.py
```
`semi_auto_parallel_static_decorate_mlp.py` should be `semi_auto_parallel_dist_to_static_mlp.py`?
Done.
```python
def to_static(
    layer: paddle.nn.Layer,
    loader=None,
    loss=None,
    optimizer=None,
    strategy=None,
):
```
I saw in the design document that there is a `metrics` parameter. Do we need to implement `metrics`, which is not implemented here? If not, please explain the reason and update the design document.
`metrics` was not part of the original design and is not currently used, so it has been removed here. I will update the design document.
LGTM
LGTM
LGTM for `set_tests_properties(test_semi_auto_parallel_dist_to_static PROPERTIES LABELS "RUN_TYPE=EXCLUSIVE" TIMEOUT 300)`
""" | ||
DistModel is a wrapper of the network model for the use of static mode | ||
auto parallel. DistModel contains the distributed Graph of the model and | ||
offers the APIs for training, evaluation and prediction. |
Is it possible to make this docstring more understandable? For example, the term "static mode auto parallel" is very challenging to understand, even after googling the phrase.
I will rephrase the docstring and update it in the next PR.
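One possible rephrasing, as an assumption of what the follow-up wording could look like (not the final text):

```python
"""
DistModel wraps a paddle.nn.Layer for static-graph auto-parallel execution:
the dygraph model is converted into a static computation graph that is
automatically partitioned across devices. DistModel holds this distributed
graph and offers the APIs for training, evaluation and prediction.
"""
```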
```python
# Part2: DistTensor construction related APIs


def to_static(
    layer: paddle.nn.Layer,
```
I'd like to suggest not using type annotations for any of the arguments.
Will update in the next PR.
""" | ||
dist_model = DistModel(layer, loader, loss, optimizer, strategy) | ||
dist_loader = dist_model.dist_loader | ||
|
The API name `to_static` made me very confused, especially when trying to understand its relation to `paddle.jit.to_static`. From this two-line implementation, this API is basically a creator for `DistModel`, so I'd like to suggest a more intuitive name, e.g. `dist_model_creator`.
The purpose of this API is to convert a model whose parameters are distributed tensors (generated by `shard_tensor`) to static graph mode. I think `to_static` is more suitable for this function.
PR types: New features
PR changes: APIs

Description
Pcard-76459
Add the paddle.distributed.to_static API and its returned class DistModel, which convert a dygraph auto-parallel model to static mode.
paddle.distributed.to_static converts the model and data loader used in dygraph auto-parallelism to their static-mode counterparts. It returns a DistModel instance that provides APIs for static-mode auto-parallel training, evaluation and prediction, together with a DistributedDataLoader that generates the corresponding input data.
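A hedged usage sketch based on this description (`model`, `loader`, `loss_fn` and `opt` are placeholders the user defines; launch with `paddle.distributed.launch` as in the tests above):

```python
import paddle.distributed as dist

# convert the dygraph model and loader to their static-mode counterparts
dist_model, dist_loader = dist.to_static(model, loader, loss_fn, opt)

dist_model.train()
for batch_id, (image, label) in enumerate(dist_loader()):
    # one static-graph auto-parallel training step; returns the loss value
    loss = dist_model(image, label)
```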
Doc: PaddlePaddle/docs#6357