[PTen] Compatible runtime performance optimization #36946
Thanks for your contribution!
… pten/compatible_phase_perf_improve
some comment
paddle/pten/core/compat_utils.h
}
};

class CompatibleDenseTensorMetaUtils {
It seems we don't need it.
done, thx
void ResetAllocation(std::shared_ptr<paddle::memory::Allocation> allocation,
                     size_t offset) {
  allocation_ = allocation;
  data_ = pten::Allocation(
This may cause a value error if the tensor is resized inside a kernel. Anyway, we can solve it in the future.
I'm aware of this; I will change the kernel output sharing rule in the next PR.
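The resize concern can be illustrated with a simplified model. This is a hypothetical sketch, not Paddle code: `Allocation`, `FluidTensor`, `PtenTensor`, `ShareAllocation`, `ResizeInKernel`, and `ShareBack` are stand-ins invented for illustration. It assumes the cached pten tensor shares the fluid tensor's buffer through a `shared_ptr`; if a kernel grows its output and allocates a new buffer, only the pten side sees it, and the fluid tensor keeps pointing at the old, smaller buffer until the result is shared back.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memory allocation (not paddle::memory::Allocation).
struct Allocation {
  explicit Allocation(std::size_t n) : buf(n) {}
  std::vector<float> buf;
};

// Hypothetical stand-ins for the two tensor types that coexist in compatible mode.
struct FluidTensor {
  std::shared_ptr<Allocation> holder;
};

struct PtenTensor {
  std::shared_ptr<Allocation> allocation;
  std::size_t numel = 0;
};

// Shares the fluid tensor's buffer with the cached pten tensor (no data copy).
void ShareAllocation(const FluidTensor& src, PtenTensor* dst) {
  dst->allocation = src.holder;
  dst->numel = src.holder ? src.holder->buf.size() : 0;
}

// A kernel that grows its output has to allocate a new, larger buffer.
void ResizeInKernel(PtenTensor* out, std::size_t new_numel) {
  if (!out->allocation || out->allocation->buf.size() < new_numel) {
    out->allocation = std::make_shared<Allocation>(new_numel);
  }
  out->numel = new_numel;
}

// Propagates the (possibly replaced) buffer back to the fluid tensor.
void ShareBack(const PtenTensor& src, FluidTensor* dst) {
  dst->holder = src.allocation;
}
```

In this model the explicit `ShareBack` step after the kernel runs is what keeps both views consistent; how Paddle actually changes its output sharing rule is left to the follow-up PR mentioned above and is not shown here.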
LGTM
LGTM for operator.h
PR types
Performance optimization
PR changes
Others
Describe
[PTen] Compatible runtime performance optimization
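Currently, running PTen in compatible mode degrades scheduling performance, because it introduces the construction and destruction of the pten KernelContext and its SmallVector members, as well as of pten::DenseTensor objects.
Test code:
Flame graph of the core execution function on the current develop branch: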
This PR therefore optimizes this path, mainly by caching the KernelContext and DenseTensor objects, which avoids most of the unnecessary overhead.
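A minimal sketch of the caching idea, under simplifying assumptions: `DenseTensor`, `KernelContext`, `UncachedOperator`, `CachedOperator`, and the `ClearData` method are hypothetical stand-ins, and the context holds its inputs and outputs in `std::vector` rather than pten's `SmallVector`. Instead of constructing and destroying a context on every `Run`, the cached operator keeps one for its lifetime and refills it per run; `std::vector::clear` keeps the existing capacity, so steady-state runs do not reallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Counts KernelContext constructions so the effect of caching is observable.
int g_context_constructions = 0;

// Hypothetical stand-in for pten::DenseTensor.
struct DenseTensor {
  std::size_t numel = 0;
};

// Hypothetical stand-in for pten's KernelContext (the real one uses SmallVector).
struct KernelContext {
  KernelContext() { ++g_context_constructions; }

  void EmplaceBackInput(DenseTensor* t) { inputs.push_back(t); }
  void EmplaceBackOutput(DenseTensor* t) { outputs.push_back(t); }

  // Drops the previous run's pointers; clear() keeps the vectors' capacity.
  void ClearData() {
    inputs.clear();
    outputs.clear();
  }

  std::vector<DenseTensor*> inputs;
  std::vector<DenseTensor*> outputs;
};

// Builds a fresh context on every run.
struct UncachedOperator {
  void Run(DenseTensor* x, DenseTensor* out) {
    KernelContext ctx;  // constructed and destroyed on every call
    ctx.EmplaceBackInput(x);
    ctx.EmplaceBackOutput(out);
    out->numel = x->numel;  // placeholder for the actual kernel call
  }
};

// Caches one context for the operator's lifetime and refills it per run.
struct CachedOperator {
  void Run(DenseTensor* x, DenseTensor* out) {
    if (!ctx_) ctx_ = std::make_unique<KernelContext>();  // built only once
    ctx_->ClearData();
    ctx_->EmplaceBackInput(x);
    ctx_->EmplaceBackOutput(out);
    out->numel = x->numel;  // placeholder for the actual kernel call
  }

  std::unique_ptr<KernelContext> ctx_;
};
```

The trade-off in this design is that the cached context lives as long as the operator, so its memory is held between runs in exchange for skipping per-run construction and destruction.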
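However, because compatible mode involves two tensor types (fluid Tensor and pten DenseTensor), copy-constructing and destroying Tensor members and shared_ptr objects still adds some overhead, which is hard to avoid at this stage. Flame graph of the core execution function after this PR: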
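Why some overhead survives caching: even when the pten tensor object is reused, each run still has to rebind it to the fluid Tensor's current allocation, which copies a `std::shared_ptr` (an atomic reference-count increment, plus a decrement when the old reference is released). The sketch below uses a hypothetical `CachedDenseTensor` whose `ResetAllocation(allocation, offset)` is modeled loosely on the method in the review diff above; that method's body is truncated on this page, so everything here beyond its signature is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for paddle::memory::Allocation.
struct Allocation {
  explicit Allocation(std::size_t bytes) : storage(bytes) {}
  std::vector<unsigned char> storage;
};

// Hypothetical cached tensor: the object is reused across runs and only its
// allocation binding changes.
class CachedDenseTensor {
 public:
  void ResetAllocation(std::shared_ptr<Allocation> allocation,
                       std::size_t offset) {
    // The by-value parameter was copied at the call site (one refcount
    // increment); moving it into the member avoids a second increment, and the
    // previously held reference, if any, is released here.
    allocation_ = std::move(allocation);
    offset_ = offset;
  }

  const std::shared_ptr<Allocation>& allocation() const { return allocation_; }
  std::size_t offset() const { return offset_; }

 private:
  std::shared_ptr<Allocation> allocation_;
  std::size_t offset_ = 0;
};
```

Each call to `ResetAllocation` in this sketch costs at least one atomic increment and, on rebinding, one decrement, which is the kind of residual `shared_ptr` cost the paragraph above describes.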
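tqdm test, with the loop in the test code changed to:
Data on the current develop branch:
Data with this PR:
Execution performance on the demo improves by about 27%.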