Support Multi-Stream, Single-Thread in New Executor #35024
Thanks for your contribution!
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html#group__CUDART__EVENT_1g7b317e07ff385d85aa656204b971a042
According to the official CUDA documentation, initializing an Event with the following flag gives the best performance for our use case:
cudaEventDisableTiming: Specifies that the created event does not need to record timing data. Events created with this flag specified and the cudaEventBlockingSync flag not specified will provide the best performance when used with cudaStreamWaitEvent() and cudaEventQuery().
Should we change the flag used in CudaEvent? It is currently cudaEventDefault.
Yes, this is already handled. The relevant code is here:
auto cuda_event = std::make_shared<platform::CudaEvent>(
    platform::get_cuda_flags(false, false, false));
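To make the three `false` arguments easier to read, here is a minimal sketch of how a helper of this shape could map booleans to CUDA event flags. It is not the code from this PR: the function name `GetEventFlags`, the parameter order (`enable_timing`, `blocking`, `interprocess`), and the locally mirrored constants are assumptions for illustration. Real code would include `cuda_runtime.h` and use `cudaEventDisableTiming`, `cudaEventBlockingSync`, and `cudaEventInterprocess` directly.

```cpp
// Local mirrors of the CUDA event flag values, so the sketch compiles
// without the CUDA toolkit. In real code, use the cudaEvent* constants
// from cuda_runtime.h instead of these hypothetical names.
constexpr unsigned int kEventDefault = 0x00;
constexpr unsigned int kEventBlockingSync = 0x01;
constexpr unsigned int kEventDisableTiming = 0x02;
constexpr unsigned int kEventInterprocess = 0x04;

// Hypothetical helper: the parameter order is assumed, not taken from the PR.
// Timing is disabled unless the caller explicitly asks for it.
unsigned int GetEventFlags(bool enable_timing, bool blocking,
                           bool interprocess) {
  unsigned int flags = kEventDefault;
  if (blocking) flags |= kEventBlockingSync;
  if (!enable_timing) flags |= kEventDisableTiming;
  if (interprocess) flags |= kEventInterprocess;
  return flags;
}
```

Under these assumptions, `GetEventFlags(false, false, false)` yields only the disable-timing bit, which is the combination the quoted CUDA documentation recommends for events used with `cudaStreamWaitEvent()` and `cudaEventQuery()`.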
LGTM
LGTM for new c++ operators
PR types
New features
PR changes
Others
Describe
1. Description
Support Multi-Stream, Single-Thread in New Executor
For Program or Graph topology:
In this PR:
2. Why introduce the h2d/d2h operators?
This PR adds two fine-grained data-copy operators so that Ops can be managed and scheduled at a finer granularity.
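As a rough illustration of what explicit copy operators make possible, the sketch below walks a linear op sequence and inserts a separate copy op wherever adjacent ops run on different devices. Once the copies are ops in their own right, a scheduler can treat them like any other op, for example by placing them on a dedicated stream. Everything here is hypothetical: the `Op` struct, the `Place` enum, the op names `h2d` and `d2h`, and the assumption that each op consumes only the previous op's output. It does not reproduce the executor code in this PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical device tag for where an op's kernel runs.
enum class Place { kCPU, kGPU };

// Hypothetical minimal op record: a name plus its placement.
struct Op {
  std::string name;
  Place place;
};

// Insert an explicit copy op between adjacent ops on different devices.
// Assumes a linear chain in which each op reads the previous op's output.
std::vector<Op> InsertCopyOps(const std::vector<Op>& ops) {
  std::vector<Op> out;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (i > 0 && ops[i - 1].place != ops[i].place) {
      // CPU -> GPU needs a host-to-device copy; GPU -> CPU the reverse.
      if (ops[i].place == Place::kGPU) {
        out.push_back({"h2d", Place::kGPU});
      } else {
        out.push_back({"d2h", Place::kGPU});
      }
    }
    out.push_back(ops[i]);
  }
  return out;
}
```

For a chain such as a CPU `feed`, two GPU compute ops, and a CPU `fetch`, this yields one `h2d` op before the first GPU op and one `d2h` op before `fetch`, so every transfer is visible to the scheduler rather than hidden inside a compute op.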
What's Next?