[Paddle-Inference] Matmul_int8_convert: tensor*tensor #37285
Conversation
Thanks for your contribution!
LGTM
LGTM
LGTM for PADDLE_ENFORCE
```cpp
    int32_t pos, nvinfer1::PluginTensorDesc const* inOut, int32_t nbInputs,
    int32_t nbOutputs) const TRT_NOEXCEPT {
  PADDLE_ENFORCE_EQ(nbInputs, 2,
                    platform::errors::InvalidArgument("Must have 2 inputs, "
```
I'd suggest the error message carry some context. As written, users may not know in which scenario, or where exactly, the 2 inputs are required. This can be filled in in a follow-up.
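For illustration, a more contextual message could look roughly like this (a hedged sketch; the wording is illustrative, not the final code):

```cpp
// Illustrative only: name the plugin and echo the received value so users
// can tell where the check fired and what was actually passed in.
PADDLE_ENFORCE_EQ(
    nbInputs, 2,
    platform::errors::InvalidArgument(
        "The matmul int8 TensorRT plugin expects exactly 2 inputs "
        "(the X and Y operands of the matmul), but received %d.",
        nbInputs));
```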
Sure, I'll add that in my next PR ~ thanks~
…7285) — squashed commits:

* matmul_convert_int8
* matmul_convert_int8
* matmulconvert_int8
* Matmul_int8_convert: tensor*tensor
* Matmul_int8_convert: tensor*tensor
* Matmul_int8_convert: tensor*tensor
PR types
Others
PR changes
Others
Describe
Add an inference op_convert and plugin for int8 quantized matmul:

* Speeds up the matrix multiply by using the Tensor Cores of NVIDIA GPUs; the plugin implements int8, fp16, and fp32 paths.
* Passes alpha into the plugin so the scale is computed together with the matmul, fusing matmul + scale and accelerating inference.
* Adds a dynload implementation for dynamically loading libcublasLt.so.
* Adds unit tests for the corresponding quantization.
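As a rough illustration of the core idea (an int8 matmul on Tensor Cores with the scale folded into alpha), here is a minimal standalone cuBLASLt sketch. It is not the PR's plugin code: it assumes CUDA 11+, column-major device buffers, and dimensions/leading dimensions that satisfy cuBLASLt's int8 alignment rules, and it uses an int32 alpha (the scale type cuBLASLt pairs with int32 output), while the PR's plugin works with the scale op's floating-point alpha.

```cpp
#include <cublasLt.h>
#include <cuda_runtime.h>

// Minimal sketch: C = alpha * (A * B), with A/B int8 and C int32.
// Folding the scale into alpha is what fuses matmul + scale in one kernel.
// Status checks are elided for brevity; real code must check every call.
void Int8MatmulWithScale(cublasLtHandle_t handle, cudaStream_t stream,
                         const int8_t* A, const int8_t* B, int32_t* C,
                         int m, int n, int k, int32_t alpha) {
  cublasLtMatmulDesc_t op_desc;
  // int8 inputs accumulate in int32; alpha/beta are int32 in this mode.
  cublasLtMatmulDescCreate(&op_desc, CUBLAS_COMPUTE_32I, CUDA_R_32I);

  cublasLtMatrixLayout_t a_desc, b_desc, c_desc;  // column-major layouts
  cublasLtMatrixLayoutCreate(&a_desc, CUDA_R_8I, m, k, m);
  cublasLtMatrixLayoutCreate(&b_desc, CUDA_R_8I, k, n, k);
  cublasLtMatrixLayoutCreate(&c_desc, CUDA_R_32I, m, n, m);

  const int32_t beta = 0;
  cublasLtMatmul(handle, op_desc, &alpha, A, a_desc, B, b_desc, &beta,
                 C, c_desc, C, c_desc, /*algo=*/nullptr,
                 /*workspace=*/nullptr, /*workspaceSizeInBytes=*/0, stream);

  cublasLtMatrixLayoutDestroy(c_desc);
  cublasLtMatrixLayoutDestroy(b_desc);
  cublasLtMatrixLayoutDestroy(a_desc);
  cublasLtMatmulDescDestroy(op_desc);
}
```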
Performance test: A(1, 28, 256, 1024) * B(1, 28, 1024, 256)
Kernel execution time (matmul fused with scale):
Execution time of a single-op network (matmul fused with scale): (int8 matmul has to rearrange its input data to support Tensor Cores, which actually adds overhead; the speedup from the matrix computation only shows once the matrices are very large. This op's implementation pre-analyzes the tensors and automatically picks whichever of the int8, fp16, and fp32 plugins performs best; see the dispatch sketch after the summary below.)
Kernel execution time:
Execution time of a single-op network:
Summary: when the matrices are large, the speedup from the matmul int8 op is clear; when a scale op is fused in, the speedup is also noticeable.
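The automatic precision selection mentioned above could be sketched roughly as follows; every name and threshold here is hypothetical, standing in for the plugin's actual pre-analysis:

```cpp
#include <cstdint>

// Hypothetical sketch of the precision-dispatch idea: pick the fastest
// plugin datatype based on a pre-analysis of the input tensors.
enum class MatmulPrecision { kInt8, kFp16, kFp32 };

MatmulPrecision ChooseMatmulPrecision(bool has_int8_scales, bool fp16_allowed,
                                      int64_t m, int64_t n, int64_t k) {
  // int8 pays an input-rearrangement cost (Tensor Core layouts), so it
  // only wins once the matrices are large enough to amortize it.
  const int64_t kInt8MinWorkload = 1LL << 24;  // illustrative threshold
  if (has_int8_scales && m * n * k >= kInt8MinWorkload)
    return MatmulPrecision::kInt8;
  return fp16_allowed ? MatmulPrecision::kFp16 : MatmulPrecision::kFp32;
}
```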
Also: matmul int8 slightly reduces GPU memory usage, by roughly 5%.
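On the dynload point from the description: loading libcublasLt.so lazily at runtime usually boils down to the dlopen/dlsym pattern below (a minimal sketch with hypothetical names, not Paddle's actual dynload wrappers):

```cpp
#include <dlfcn.h>
#include <cstdio>

// Minimal sketch of lazy dynamic loading: resolve a cuBLASLt symbol from
// libcublasLt.so at first use instead of linking against it at build time.
// Build with -ldl on Linux.
template <typename FuncPtr>
FuncPtr LoadCublasLtSymbol(const char* name) {
  static void* handle = dlopen("libcublasLt.so", RTLD_NOW | RTLD_GLOBAL);
  if (handle == nullptr) {
    std::fprintf(stderr, "failed to load libcublasLt.so: %s\n", dlerror());
    return nullptr;
  }
  return reinterpret_cast<FuncPtr>(dlsym(handle, name));
}

// Usage: resolve cublasLtCreate by name, then call through the pointer.
// using CublasLtCreateFn = cublasStatus_t (*)(cublasLtHandle_t*);
// auto create = LoadCublasLtSymbol<CublasLtCreateFn>("cublasLtCreate");
```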