[AutoParallel] remove pyreader, use feed op in pipeline schedule #56511

Merged: 11 commits, merged on Aug 25, 2023

zhaoyinglia (Contributor) commented Aug 21, 2023

PR types

Others

PR changes

Others

Description

PCard-71568

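  • executor.py: in the auto-parallel pipeline scenario, switch to an iterable dataloader and make sure the _feed_data method supports splitting the local_batch into micro_batches (see "modify feed_data for dataloader in pipline parallel mode" #56453)
  • engine.py: modify the fit interface to handle the different feed_data requirements of the gradient_merge and pipeline scenarios
  • StandaloneExecutor.cc: modify the col attribute of the feed/fetch ops so that the executor gets the feed data for the correct micro_batch_id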
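The col change in StandaloneExecutor.cc can be pictured with a small Python sketch. It assumes the feed data for all micro-batches sits in one flat list laid out variable-major, so an op that originally reads column col reads slot col * micro_batch_num + micro_batch_id in the job for micro_batch_id. That formula, the Op class and the set_col_attr_for_feed_fetch_ops helper are illustrative assumptions, not code taken from this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-in for an op description; not Paddle's OpDesc.
@dataclass
class Op:
    type: str
    attrs: Dict[str, int] = field(default_factory=dict)


# Assumed set of op types whose "col" attribute indexes the feed/fetch list.
FEED_FETCH_OP_TYPES = {"feed", "fetch", "fetch_v2"}


def set_col_attr_for_feed_fetch_ops(
    ops: List[Op], micro_batch_num: int, micro_batch_id: int
) -> List[Op]:
    """Return a copy of `ops` whose feed/fetch "col" points at one micro-batch.

    Assumed variable-major layout: slot = col * micro_batch_num + micro_batch_id.
    """
    remapped = []
    for op in ops:
        attrs = dict(op.attrs)  # copy so the shared template program is untouched
        if op.type in FEED_FETCH_OP_TYPES:
            attrs["col"] = attrs["col"] * micro_batch_num + micro_batch_id
        remapped.append(Op(op.type, attrs))
    return remapped


# Usage: two feed variables (x, y), two micro-batches.
program = [
    Op("feed", {"col": 0}),
    Op("feed", {"col": 1}),
    Op("matmul"),
    Op("fetch_v2", {"col": 0}),
]
feed_list = ["x_mb0", "x_mb1", "y_mb0", "y_mb1"]  # variable-major layout
for mb in range(2):
    job_program = set_col_attr_for_feed_fetch_ops(program, 2, mb)
    fed = [feed_list[op.attrs["col"]] for op in job_program if op.type == "feed"]
    print(mb, fed)  # mb 0 reads x_mb0, y_mb0; mb 1 reads x_mb1, y_mb1
```

Returning copies rather than mutating in place keeps one template program reusable for every micro-batch job.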

paddle-bot commented Aug 21, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

zhaoyinglia changed the title from "Mv read" to "[AutoParallel] remove use read op, use feed op in pipeline schedule" on Aug 21, 2023
    def _validate_feed(self, feed):
        if feed is None:
            return [None]
        if self._strategy.pipeline.enable:
Contributor:

Why doesn't this need to be handled when pipeline.enable is set? Doesn't pipeline.enable literally mean that the pipeline is enabled?

Contributor Author:

strategy.pipeline.enable=True means scheduling optimization is enabled, and it must be used together with schedule_mode. If it is False, the program can still be partitioned into pipeline stages, but no scheduling optimization is applied.

        assert (
            batch_size % self._acc_steps == 0
        ), "Requires batch_size:[{}] to be divisible by acc_steps:[{}].".format(
            batch_size, self._acc_steps
        )
        return batch_size // self._acc_steps

    def _validate_feed(self, feed):
Contributor:

Suggest adding a comment with an example that shows how the data layout changes from feed to micro_feeds and how the splitting works.

Contributor Author:

Will fix in the next PR.
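For readers following the reviewer's request, here is a minimal sketch of the feed to micro_feeds transformation. It assumes a feed is a dict mapping variable names to a local batch (plain lists standing in for tensors) and that micro-batches are contiguous slices of the local batch; split_feed is a hypothetical helper, not the function in executor.py.

```python
from typing import Dict, List, Sequence


def split_feed(feed: Dict[str, Sequence], acc_steps: int) -> List[Dict[str, list]]:
    """Split one local batch into `acc_steps` micro-batch feeds.

    feed (local batch of 4, acc_steps=2):
        {"x": [x0, x1, x2, x3], "label": [l0, l1, l2, l3]}
    micro_feeds:
        [{"x": [x0, x1], "label": [l0, l1]},   # micro_batch_id 0
         {"x": [x2, x3], "label": [l2, l3]}]   # micro_batch_id 1
    """
    micro_feeds: List[Dict[str, list]] = [{} for _ in range(acc_steps)]
    for name, batch in feed.items():
        batch_size = len(batch)
        # Same requirement as the batch_size assertion quoted in this review.
        if batch_size % acc_steps != 0:
            raise ValueError(
                f"Requires batch_size:[{batch_size}] to be divisible by acc_steps:[{acc_steps}]."
            )
        micro_size = batch_size // acc_steps
        for i in range(acc_steps):
            # Contiguous slices: micro-batch i gets samples [i*m, (i+1)*m).
            micro_feeds[i][name] = list(batch[i * micro_size:(i + 1) * micro_size])
    return micro_feeds


print(split_feed({"x": [0, 1, 2, 3], "label": [10, 11, 12, 13]}, 2))
# [{'x': [0, 1], 'label': [10, 11]}, {'x': [2, 3], 'label': [12, 13]}]
```

The result is indexed by micro_batch_id, which is the index the executor needs when it runs the job for each micro-batch.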

Comment on lines +1078 to +1084
micro_feed = (
    _as_lodtensor(
        micro_cur_feed[i], self.place, var.dtype
    )
    if num_micro_batch > 1
    else micro_cur_feed[i]
)
Contributor:

Suggested change
 micro_feed = (
     _as_lodtensor(
         micro_cur_feed[i], self.place, var.dtype
     )
-    if num_micro_batch > 1
+    if not isinstance(micro_cur_feed[i], core.LoDTensor)
     else micro_cur_feed[i]
 )

Contributor Author:

Will fix in the next PR.
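One way to read the suggestion: the original condition ties conversion to the number of micro-batches, while the suggested one ties it to the type of the value. The sketch contrasts the two using hypothetical stand-ins FakeLoDTensor and fake_as_lodtensor (not Paddle APIs), assuming only that conversion wraps raw data in a tensor object.

```python
class FakeLoDTensor:
    """Hypothetical stand-in for core.LoDTensor."""

    def __init__(self, data):
        self.data = list(data)


def fake_as_lodtensor(value):
    """Hypothetical stand-in for _as_lodtensor: wraps raw data in a tensor."""
    return FakeLoDTensor(value)


def micro_feed_by_count(value, num_micro_batch):
    # Original condition: convert only when there is more than one micro-batch.
    return fake_as_lodtensor(value) if num_micro_batch > 1 else value


def micro_feed_by_type(value):
    # Suggested condition: convert only when the value is not already a tensor.
    return value if isinstance(value, FakeLoDTensor) else fake_as_lodtensor(value)


raw = [1, 2, 3]
print(type(micro_feed_by_count(raw, 1)).__name__)  # list: raw data passes through
print(type(micro_feed_by_type(raw)).__name__)  # FakeLoDTensor
t = FakeLoDTensor(raw)
print(micro_feed_by_type(t) is t)  # True: an existing tensor is not re-converted
```

With the type-based check, the output type no longer depends on num_micro_batch, and values that are already tensors pass through unchanged.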

void SetColAttrForFeedFetchOps(std::shared_ptr<ProgramDesc> program_desc,
                               const int64_t micro_batch_num,
                               const int64_t micro_batch_id) {
  const std::set<std::string>& valid_fetch_op_types = {
Contributor:

Suggested change
-  const std::set<std::string>& valid_fetch_op_types = {
+  const std::set<std::string>& valid_feed_fetch_op_types = {

Contributor Author:

Will fix in the next PR.

  const std::set<std::string>& valid_feed_fetch_op_types = {"fetch",
                                                            "fetch_v2"};

  const std::vector<int> all_op_ids = job.AllFetchOpIds();
Contributor:

Once this logic is removed, the related job interfaces ColAttrForFetchOp, AllFetchOpIds and SetColAttrForFetchOp should be removed as well.

Contributor Author:

Will delete them in the next PR.

From00 previously approved these changes Aug 23, 2023
zhaoyinglia changed the title from "[AutoParallel] remove use read op, use feed op in pipeline schedule" to "[AutoParallel] remove pyreader, use feed op in pipeline schedule" on Aug 23, 2023
zoooo0820 (Contributor) left a comment:

LGTM for fluid change

From00 merged commit 0012c8d into PaddlePaddle:develop on Aug 25, 2023
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
…dlePaddle#56511)

* modify feed_data for dataloader in pipline parallel mode

* add pre-commit

* remove read op, use feed op

* fix validate batch_size

* tiny fix

* support catch EOFException

* fix conflict

* fix conflict

* fix executor if cond

---------

Co-authored-by: Frida-a <2624653516@qq.com>

4 participants