
[PTen] Compatible runtime performance optimization #36946

Merged

Conversation

Contributor @chenwhql commented Nov 2, 2021

PR types

Performance optimization

PR changes

Others

Describe

[PTen] Compatible runtime performance optimization

Currently, PTen's compatible execution mode introduces the construction and destruction of the pten KernelContext and its SmallVectors, as well as of pten::DenseTensor, which degrades dispatch performance.

Test code:

import paddle
import numpy as np
import yep

paddle.set_device("cpu")
x_data = np.random.uniform(0.1, 1, [10]).astype(np.float32)
y_data = np.random.uniform(1, 3, [10]).astype(np.float32)

x = paddle.to_tensor(x_data)
y = paddle.to_tensor(y_data)

yep.start("dot.prof")
for i in range(1000000):
    z = paddle.dot(x, y)
yep.stop()
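(For reference: yep writes a gperftools-compatible CPU profile, so dot.prof can be inspected with google-pprof, e.g. google-pprof --svg `which python` dot.prof, assuming gperftools is installed.)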

The flame graph of the core execution function on current develop is as follows:

  • Run: 24.73%, 3.73s

[flame graph: develop]

This PR therefore attempts to optimize this problem, mainly by caching the KernelContext and DenseTensors, which avoids most of the unnecessary overhead.
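To make the caching strategy concrete, here is a minimal hypothetical C++ sketch; every type and member name below is an illustrative stand-in, not this PR's actual code:

#include <vector>

// Simplified stand-ins for the pten types (illustration only).
struct DenseTensor { /* data pointer, dims, dtype ... */ };

struct KernelContext {
  std::vector<DenseTensor*> inputs;
  std::vector<DenseTensor*> outputs;
  void Clear() {
    inputs.clear();   // clear() keeps capacity, so the next run
    outputs.clear();  // refills without reallocating
  }
};

class CachedOpRunner {
 public:
  void Run(DenseTensor* x, DenseTensor* y, DenseTensor* out) {
    ctx_.Clear();           // reuse the cached context instead of
    ctx_.inputs = {x, y};   // constructing and destroying a fresh
    ctx_.outputs = {out};   // KernelContext on every invocation
    // ... dispatch the pten kernel with ctx_ ...
  }

 private:
  KernelContext ctx_;  // constructed once, reused across calls
};

Because the cached context and its vectors survive across iterations, the per-call constructor/destructor and heap traffic seen in the flame graph above largely disappears.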

However, because the compatible mode involves two kinds of Tensor (fluid Tensor and pten DenseTensor), the copy construction and destruction of the Tensor members and shared_ptrs is still introduced, which is hard to avoid at this stage. The flame graph of the core execution function after this PR's changes is as follows:

  • Run: 18.03%, 2.33s

[flame graph: this PR]

For the tqdm test, the loop in the test code is changed to:

from tqdm import tqdm

for i in tqdm(range(1000000)):
    z = paddle.dot(x, y)
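(tqdm reports the loop's throughput as iterations per second, so the speedup shows up directly in the it/s figures below.)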

Data on current develop:

[tqdm output: develop]

Data for this PR:

[tqdm output: this PR]

Execution performance on this demo improves by about 27%.

paddle-bot-old bot commented Nov 2, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Shixiaowei02 previously approved these changes Nov 9, 2021

Contributor @JiabinYang left a comment

some comment

}
};

class CompatibleDenseTensorMetaUtils {
Contributor

It seems we don't need it.

Contributor Author

done, thx

void ResetAllocation(std::shared_ptr<paddle::memory::Allocation> allocation,
                     size_t offset) {
  allocation_ = allocation;
  data_ = pten::Allocation(
Contributor

This may cause a value error if we resize the tensor in the kernel. Anyway, we can solve it in the future.

Contributor Author

I know this point; I will change the kernel output sharing rule in the next PR.

Contributor @JiabinYang left a comment

LGTM

Contributor @zhiqiu left a comment

LGTM for operator.h

@chenwhql chenwhql merged commit 76d2fd1 into PaddlePaddle:develop Nov 10, 2021