Support priority scheduling for standalone executor #49275
Your PR was submitted successfully. Thank you for contributing to this open-source project!
class TestOpPriority(unittest.TestCase):
    def test_op_priority(self):
        if not paddle.fluid.core.is_compiled_with_cuda():
            return
Why skip CPU?
No need to skip CPU. The test has been updated, thanks.
async_work_queue_->AddTask(vec_instr.at(i).KernelType(),
                           [this, i] { RunInstructionAsync(i); });
if (FLAGS_new_executor_serial_run) {
  RunInstructionAsync(i);
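With this PR's refactoring of FLAGS_new_executor_serial_run, enabling the flag effectively runs every instruction on the main thread, right? The main difference from trace mode is then the scheduling order. Could we later consider unifying the execution interfaces for these two cases and abstracting out the logic that constructs the scheduling order?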
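This question needs a more in-depth discussion.
As I understand it, serial_run was developed for debugging: its main purpose is to make problems easier to locate than they are under multi-threaded concurrent scheduling in async mode. Trace mode was developed so that all operators are scheduled on the main thread: its main purpose is to address issues such as the cache unfriendliness introduced by thread switching in dynamic-to-static scenarios.
In terms of actual behavior, although the refactored serial_run also schedules all operators on the main thread, it still follows the async-mode scheduling logic (i.e. RunInstructionAsync). Its scheduling order is still decided at runtime, and it makes no attempt to construct a scheduling sequence in advance. Trace mode, by contrast, tries to construct a scheduling sequence ahead of time and then simply schedules it in a for-loop at runtime.
If we unify serial_run and trace mode, the execution logic has to be unified as well: either serial_run is converted to trace mode, or trace mode is converted to serial_run.
If serial_run is converted to trace mode, its runtime scheduling is replaced by trace mode's ahead-of-time construction and for-loop scheduling. serial_run would then no longer differ from the original async execution mode only in hiding the single-thread versus multi-thread difference; the entire scheduling code would be swapped out. In that case serial_run loses its value for debugging async mode, and we might as well remove the feature and enable trace mode whenever serial_run is needed. However, I think serial_run still has a reason to exist, and trace mode cannot fully replace it.
If trace mode is converted to serial_run, we first need to evaluate which of the two is better. Which scheduling order is superior has not been adequately argued or verified. The current async scheduling logic is merely friendly to multi-threaded concurrency, and its order is not necessarily optimal on a single thread (in fact, finding the optimal scheduling sequence is itself a very hard problem). Setting the difference in scheduling order aside, constructing the sequence ahead of time and executing it directly in a for-loop also has lower runtime overhead. Converting trace mode to serial_run would needlessly introduce overheads such as reference-count updates and runtime queue operations. In async mode these overheads are offset by the gains from multi-threaded concurrency, but on a single thread they are pure cost with no benefit, so I personally think this option is unnecessary too.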
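To make the contrast above concrete, here is a minimal Python sketch of the two styles. It is not Paddle's implementation: the function names (`run_dynamic`, `build_trace_order`, `run_trace`) and the `deps` dictionary are hypothetical, and the sketch assumes a simple dependency graph in which each op lists the ops it depends on. The dynamic version pays for dependency-count updates and queue operations on every run, while the trace version pays for them once and then only iterates.

```python
from collections import deque


def run_dynamic(deps, run_op):
    """Single-threaded runtime scheduling: the order is decided while running.

    `deps` maps each op name to the list of ops it depends on (hypothetical format).
    """
    # Count unfinished dependencies and record which ops wait on each op.
    pending = {op: len(parents) for op, parents in deps.items()}
    downstream = {op: [] for op in deps}
    for op, parents in deps.items():
        for parent in parents:
            downstream[parent].append(op)

    ready = deque(op for op, count in pending.items() if count == 0)
    while ready:
        op = ready.popleft()          # queue operation on every step
        run_op(op)
        for nxt in downstream[op]:
            pending[nxt] -= 1         # dependency-count update on every step
            if pending[nxt] == 0:
                ready.append(nxt)


def build_trace_order(deps):
    """Construct a valid execution order once, ahead of time.

    For brevity this reuses the dynamic scheduler to record an order; a real
    executor could apply a different ordering policy here.
    """
    order = []
    run_dynamic(deps, order.append)
    return order


def run_trace(order, run_op):
    """Trace-style execution: a plain for-loop over the precomputed order."""
    for op in order:
        run_op(op)
```

A possible usage: build the order once with `order = build_trace_order(deps)`, then call `run_trace(order, run_op)` on every iteration. `run_dynamic(deps, run_op)` repeats the bookkeeping on every call, which is the overhead discussed above.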
LGTM
PR types
New features
PR changes
Others
Describe
Support priority scheduling for the standalone executor. See test_standalone_op_priority for usage details.