
[XPU] XPU inference support int8 #57258

Merged: 16 commits from inference_xpu_support_int8 into PaddlePaddle:develop (Oct 26, 2023)

Conversation

@csy0225 (Contributor) commented Sep 13, 2023

PR types

New features

PR changes

Others

Description

Adds quantized inference support to the Paddle-Inference XPU backend: it can load and run Paddle-Slim quantized models in ONNX format, with int8 implementations for conv and fc.
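
Below is a minimal usage sketch of running such a quantized model on the XPU backend through the Paddle Inference C++ API. The model paths, the input shape, and the no-argument EnableXpu() call are illustrative assumptions; the exact EnableXpu options vary across Paddle versions.

    #include <vector>
    #include "paddle_inference_api.h"

    int main() {
      paddle_infer::Config config;
      // Hypothetical paths to a Paddle-Slim quantized model.
      config.SetModel("./quant_model/model.pdmodel",
                      "./quant_model/model.pdiparams");
      config.EnableXpu();  // select the XPU backend

      auto predictor = paddle_infer::CreatePredictor(config);

      // Feed a dummy input and run; quantized conv/fc ops use int8 kernels.
      std::vector<float> input(1 * 3 * 224 * 224, 0.0f);
      auto in = predictor->GetInputHandle(predictor->GetInputNames()[0]);
      in->Reshape({1, 3, 224, 224});
      in->CopyFromCpu(input.data());
      predictor->Run();
      return 0;
    }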

@paddle-bot commented Sep 13, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results. See the Paddle CI Manual for details.

@paddle-bot commented Sep 13, 2023

❌ This PR was not created using the PR template. You can refer to this Demo.
Please use the PR template; it saves our maintainers' time so that more developers can get help.

@csy0225 force-pushed the inference_xpu_support_int8 branch 2 times, most recently from 9e04e30 to aee3544 (September 19, 2023 08:25)
@csy0225 force-pushed the inference_xpu_support_int8 branch from aee3544 to 26e125d (September 19, 2023 09:14)
@paddle-ci-bot commented Sep 28, 2023

Sorry to inform you that commit 9483e72's CIs passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

@paddle-ci-bot commented Oct 17, 2023

Sorry to inform you that commit 3ab34c6's CIs passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

@csy0225 force-pushed the inference_xpu_support_int8 branch from 5712e97 to da5cb07 (October 20, 2023 02:02)
@csy0225 changed the title from "Inference xpu support int8" to "[XPU] XPU inference support int8" (October 23, 2023)
@csy0225 force-pushed the inference_xpu_support_int8 branch 3 times, most recently from 958c7bb to c5ec5d9 (October 23, 2023 11:41)
@csy0225 force-pushed the inference_xpu_support_int8 branch from c5ec5d9 to 5fea223 (October 23, 2023 11:54)
Comment on lines +33 to +34
    const paddle::optional<DenseTensor>& scale_max,
    const paddle::optional<DenseTensor>& out_max_in,
Contributor: Rename scale_max to w_max_per_channel; wouldn't that be easier to understand?

Contributor (author): This is consistent with the definition of the xdnn conv2d_fusion API.

Contributor: The xdnn conv2d_fusion API definition is itself rather hard to understand; I would not recommend aligning with it.

@csy0225 force-pushed the inference_xpu_support_int8 branch from f657129 to 9a3c539 (October 24, 2023 09:01)
Comment on lines 91 to 99
    template <typename Tcpu, typename Txpu>
    void PrepareWeight(Graph* graph,
                       Scope* scope,
                       BlockDesc* block,
                       Node* weight,
                       Node** quant_weight,
                       Node** quant_weight_max,
                       bool transpose,
                       const std::vector<float>& weight_scales);
Contributor: Could the two PrepareWeight overloads be merged into the following?

template <typename Tcpu, typename Txpu=int16_t>
void PrepareWeight(Graph* graph,
                   Scope* scope,
                   BlockDesc* block,
                   Node* src_w,
                   Node** dst_w,
                   Node** dst_w_max,
                   bool transpose,
                   const std::vector<float>& w_max={});
  • There is an upcoming need for float->float computation in int31, so a name without "quant" would be more general.
  • The "scale" carried in the model may actually mean "max"; that naming is confusing, so please name things according to their actual meaning in the code.
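
Hypothetical call sites for the merged declaration above (illustrative names, not the PR's actual code), showing how the defaults cover both the int16 and int8 paths:

    // Default Txpu = int16_t: no per-channel weight maxima required.
    PrepareWeight<float>(
        graph, scope, block, src_w, &dst_w, &dst_w_max, /*transpose=*/false);
    // Explicit int8 path for quantized models, passing per-channel maxima.
    PrepareWeight<float, int8_t>(
        graph, scope, block, src_w, &dst_w, &dst_w_max, /*transpose=*/false,
        w_max);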

Contributor (author): Done.

Comment on lines 284 to 287
    if (!weight_scales.empty()) {
      LOG(FATAL) << "Weight scales should be empty(), otherwise, check if your "
                    "model is quant model or not.";
    }
Contributor: Use PADDLE_ENFORCE instead of LOG(FATAL).
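
A sketch of what the suggested replacement could look like, assuming Paddle's PADDLE_ENFORCE_EQ macro and platform::errors (the actual change made in the PR may differ):

    PADDLE_ENFORCE_EQ(
        weight_scales.empty(),
        true,
        platform::errors::InvalidArgument(
            "Weight scales should be empty; otherwise, check whether your "
            "model is a quantized model."));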

Contributor (author): Done.

Comment on lines +276 to +279
    template <
        typename Tcpu,
        typename Txpu,
        typename std::enable_if<std::is_same<Tcpu, float>::value, Tcpu>::type* ptr>
Contributor: Shouldn't float16 also be supported?

Contributor (author): There is no need for float16 at the moment; float16 weights are converted to float32 internally before being quantized. It can be added later if needed.
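
For context, the pattern on lines +276 to +279 uses std::enable_if so that an overload participates in overload resolution only for a particular CPU-side weight type. A self-contained toy illustration of the same mechanism (hypothetical names, not Paddle code):

    #include <cstdint>
    #include <cstdio>
    #include <type_traits>

    // Selected only when the source weight type is float (quantizing path).
    template <typename Tcpu,
              typename Txpu,
              typename std::enable_if<std::is_same<Tcpu, float>::value,
                                      Tcpu>::type* = nullptr>
    void ConvertWeight() {
      std::printf("float source: quantize weights to the Txpu type\n");
    }

    // Selected for every non-float source type (pass-through path).
    template <typename Tcpu,
              typename Txpu,
              typename std::enable_if<!std::is_same<Tcpu, float>::value,
                                      Tcpu>::type* = nullptr>
    void ConvertWeight() {
      std::printf("non-float source: copy or cast without quantization\n");
    }

    int main() {
      ConvertWeight<float, int8_t>();   // picks the quantizing overload
      ConvertWeight<int8_t, int8_t>();  // picks the pass-through overload
      return 0;
    }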

@csy0225 force-pushed the inference_xpu_support_int8 branch 2 times, most recently from 0385c60 to 4cce3dc (October 24, 2023 11:31)
@csy0225 force-pushed the inference_xpu_support_int8 branch from 4cce3dc to 7c9255e (October 24, 2023 12:27)
@yuanlehome previously approved these changes (October 25, 2023)
@csy0225 force-pushed the inference_xpu_support_int8 branch from 7348be9 to 426c36b (October 25, 2023 07:20)
@qili93 (Contributor) left a comment:

LGTM for const_cast

@zhupengyang merged commit 57a14e2 into PaddlePaddle:develop on Oct 26, 2023.
@danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request on Nov 14, 2023.