
[PIR][AutoParallel] Support 1F1B/FThenB with PIR #58459

Merged · 8 commits · Oct 30, 2023

Conversation

zhiqiu (Contributor) commented on Oct 27, 2023

PR types

New features

PR changes

Others

Description

PCard-71568

From #58405

  • Add a SplitFeedTensor method for PIR that splits each feed tensor into micro-batch slices (a simplified sketch of the splitting logic follows below)
  • Add a shadow_output op for PIR to hold vars
  • Add a data op to construct inputs for PIR
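
For illustration, here is a minimal C++ sketch of the splitting idea behind SplitFeedTensor. This is hypothetical, simplified code: the real method operates on framework tensors and a per-micro-batch output structure, as the review snippets below show; std::vector<float> stands in for the feed tensor.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Split one flat feed buffer into micro_batch_num chunks along dim 0.
    // Ceil division ensures every element lands in some chunk even when
    // numel_size is not divisible by micro_batch_num.
    std::vector<std::vector<float>> SplitFeed(const std::vector<float>& feed,
                                              int64_t micro_batch_num) {
      int64_t numel_size = static_cast<int64_t>(feed.size());
      int64_t split_size = (numel_size + micro_batch_num - 1) / micro_batch_num;
      std::vector<std::vector<float>> out(micro_batch_num);
      for (int64_t j = 0; j < micro_batch_num; ++j) {
        int64_t begin = j * split_size;
        int64_t end = std::min(begin + split_size, numel_size);
        if (begin < end) {
          out[j].assign(feed.begin() + begin, feed.begin() + end);
        }
      }
      return out;
    }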

paddle-bot commented on Oct 27, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot added the 'contributor External developers' label on Oct 27, 2023
paddle-bot commented on Oct 27, 2023

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

zhiqiu changed the title from 'New ir' to '[PIR][AutoParallel] Support 1F1B/FThenB with PIR' on Oct 27, 2023
feed_name,
numel_size,
micro_batch_num));
int64_t split_size = (numel_size + micro_batch_num - 1) / micro_batch_num;
Contributor commented:

When numel_size % micro_batch_num == 0, (numel_size + micro_batch_num - 1) / micro_batch_num and numel_size / micro_batch_num are equivalent.

Contributor Author replied:

Fix it in the next PR
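
For context, a quick standalone check of the integer arithmetic discussed above (plain C++, not Paddle code):

    #include <cassert>
    #include <cstdint>

    int main() {
      int64_t numel_size = 12, micro_batch_num = 4;  // divisible case
      // When numel_size % micro_batch_num == 0 the two forms agree (both 3).
      assert((numel_size + micro_batch_num - 1) / micro_batch_num ==
             numel_size / micro_batch_num);
      numel_size = 13;  // non-divisible case: ceil division rounds up
      assert((numel_size + micro_batch_num - 1) / micro_batch_num == 4);
      assert(numel_size / micro_batch_num == 3);
      return 0;
    }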

VLOG(4) << "Split feed data:" << feed_name << ", dims:("
<< feed_tensor.dims() << "), micro_batch_num:" << micro_batch_num;
for (int64_t j = 0; j < micro_batch_num; ++j) {
(*out)[j].resize(i + 1);
Contributor commented:

Dynamically growing the vector with resize on every pass of the inner loop is inefficient; you can instead allocate (*out)[j] with a length of feed_names.size() up front.

Contributor Author replied:

Fix it in the next PR
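
A minimal sketch of the reviewer's suggestion (hypothetical helper and names; the real code lives in the SplitFeedTensor loop above):

    #include <cstddef>
    #include <vector>

    // Instead of calling (*out)[j].resize(i + 1) on every pass of the inner
    // loop, size each per-micro-batch vector to the feed count once.
    void PreSizeOutputs(std::vector<std::vector<int>>* out,
                        std::size_t micro_batch_num,
                        std::size_t feed_count) {
      out->resize(micro_batch_num);
      for (std::size_t j = 0; j < micro_batch_num; ++j) {
        (*out)[j].resize(feed_count);  // one allocation per micro batch
      }
    }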

@@ -278,6 +316,19 @@ def set_skip_gc_vars(num_micro_batches, type_to_program, jobs):
job.set_skip_gc_vars(skip_gc_vars)
suffixed_required_vars[micro_batch_id] |= required_vars

if get_flags("FLAGS_enable_new_ir_in_executor")[
Contributor commented:

This logic doesn't appear related to skip_gc_vars, so why is it written inside the set_skip_gc_vars function? Mixing the code that packs job_types and sub_programs into a dict into set_skip_gc_vars is also not recommended.

Contributor Author replied:

Fix it in the next PR

@@ -225,7 +225,44 @@ def var_can_be_deleted(var_name, block):
return var is not None and not var.persistable


def set_skip_gc_vars(num_micro_batches, type_to_program, jobs):
def prepare_ir_program(cur_prog, next_prog):
    set_output_names = set()
Contributor commented:

The 'set' in set_output_names is ambiguous to read: it can mean either the collection type or the verb 'to set'. Unless necessary, it is not recommended to embed basic type names such as list, array, or set in variable names.

Contributor Author replied:

Fix it in the next PR

From00 (Contributor) left a review:

LGTM overall. Some coding problems should be improved in the future.

zhiqiu merged commit 3b01b2a into PaddlePaddle:develop on Oct 30, 2023
paddle-bot removed the 'contributor External developers' label on Nov 3, 2023
zeroRains pushed a commit to zeroRains/Paddle that referenced this pull request Nov 8, 2023
* [PIR][AutoParallel] Support 1F1B/FThenB with PIR

* fix splitfeed

* fix include

* rm fetch

* fix unittest

* fix scope error

* program interpreter use local scope to avoid var conflict

* fix ut

---------

Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Nov 14, 2023
(same commit message as above)