
[XPU] XPU inference support int8 #57258

Merged: 16 commits from inference_xpu_support_int8 into PaddlePaddle:develop (Oct 26, 2023)

Conversation

@csy0225 (Contributor) commented Sep 13, 2023

PR types

New features

PR changes

Others

Description

Adds quantized inference support to the Paddle-Inference XPU backend: it can load and run Paddle-Slim quantized models in ONNX format, with int8 implementations for conv and fc.
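
Below is a minimal usage sketch of running such a quantized model on the XPU backend through the Paddle Inference C++ API. The model paths, the input shape, and the no-argument EnableXpu() call are illustrative assumptions; the exact EnableXpu options vary across Paddle versions.

    #include <vector>
    #include "paddle_inference_api.h"

    int main() {
      paddle_infer::Config config;
      // Hypothetical paths to a Paddle-Slim quantized model.
      config.SetModel("./quant_model/model.pdmodel",
                      "./quant_model/model.pdiparams");
      config.EnableXpu();  // select the XPU backend

      auto predictor = paddle_infer::CreatePredictor(config);

      // Feed a dummy input and run; quantized conv/fc ops use int8 kernels.
      std::vector<float> input(1 * 3 * 224 * 224, 0.0f);
      auto in = predictor->GetInputHandle(predictor->GetInputNames()[0]);
      in->Reshape({1, 3, 224, 224});
      in->CopyFromCpu(input.data());
      predictor->Run();
      return 0;
    }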

@paddle-bot commented Sep 13, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results. See the Paddle CI Manual for details.

@paddle-bot commented Sep 13, 2023

❌ This PR was not created using the PR template. You can refer to this Demo.
Please use the PR template; it saves our maintainers' time so that more developers can get help.

@csy0225 force-pushed the inference_xpu_support_int8 branch 2 times, most recently from 9e04e30 to aee3544 (September 19, 2023 08:25)
@csy0225 force-pushed the inference_xpu_support_int8 branch from aee3544 to 26e125d (September 19, 2023 09:14)
@paddle-ci-bot commented Sep 28, 2023

Sorry to inform you that commit 9483e72's CIs passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

@paddle-ci-bot commented Oct 17, 2023

Sorry to inform you that commit 3ab34c6's CIs passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

@csy0225 force-pushed the inference_xpu_support_int8 branch from 5712e97 to da5cb07 (October 20, 2023 02:02)
@csy0225 changed the title from "Inference xpu support int8" to "[XPU] XPU inference support int8" (October 23, 2023)
@csy0225 force-pushed the inference_xpu_support_int8 branch 3 times, most recently from 958c7bb to c5ec5d9 (October 23, 2023 11:41)
@csy0225 force-pushed the inference_xpu_support_int8 branch from c5ec5d9 to 5fea223 (October 23, 2023 11:54)
Comment on lines +33 to +34
    const paddle::optional<DenseTensor>& scale_max,
    const paddle::optional<DenseTensor>& out_max_in,
Contributor: Rename scale_max to w_max_per_channel; wouldn't that be easier to understand?

Contributor (author): This is consistent with the definition of the xdnn conv2d_fusion API.

Contributor: The xdnn conv2d_fusion API definition is itself rather hard to understand; I would not recommend aligning with it.

@csy0225 force-pushed the inference_xpu_support_int8 branch from f657129 to 9a3c539 (October 24, 2023 09:01)
Comment on lines 91 to 99
    template <typename Tcpu, typename Txpu>
    void PrepareWeight(Graph* graph,
                       Scope* scope,
                       BlockDesc* block,
                       Node* weight,
                       Node** quant_weight,
                       Node** quant_weight_max,
                       bool transpose,
                       const std::vector<float>& weight_scales);
Contributor: Could the two PrepareWeight overloads be merged into the following?

template <typename Tcpu, typename Txpu=int16_t>
void PrepareWeight(Graph* graph,
                   Scope* scope,
                   BlockDesc* block,
                   Node* src_w,
                   Node** dst_w,
                   Node** dst_w_max,
                   bool transpose,
                   const std::vector<float>& w_max={});
  • There is an upcoming need for float->float computation in int31, so a name without "quant" would be more general.
  • The "scale" carried in the model may actually mean "max"; that naming is confusing, so please name things according to their actual meaning in the code.
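
Hypothetical call sites for the merged declaration above (illustrative names, not the PR's actual code), showing how the defaults cover both the int16 and int8 paths:

    // Default Txpu = int16_t: no per-channel weight maxima required.
    PrepareWeight<float>(
        graph, scope, block, src_w, &dst_w, &dst_w_max, /*transpose=*/false);
    // Explicit int8 path for quantized models, passing per-channel maxima.
    PrepareWeight<float, int8_t>(
        graph, scope, block, src_w, &dst_w, &dst_w_max, /*transpose=*/false,
        w_max);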

Contributor (author): Done.

Comment on lines 284 to 287
    if (!weight_scales.empty()) {
      LOG(FATAL) << "Weight scales should be empty(), otherwise, check if your "
                    "model is quant model or not.";
    }
Contributor: Use PADDLE_ENFORCE instead of LOG(FATAL).
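
A sketch of what the suggested replacement could look like, assuming Paddle's PADDLE_ENFORCE_EQ macro and platform::errors (the actual change made in the PR may differ):

    PADDLE_ENFORCE_EQ(
        weight_scales.empty(),
        true,
        platform::errors::InvalidArgument(
            "Weight scales should be empty; otherwise, check whether your "
            "model is a quantized model."));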

Contributor (author): Done.

Comment on lines +276 to +279
    template <
        typename Tcpu,
        typename Txpu,
        typename std::enable_if<std::is_same<Tcpu, float>::value, Tcpu>::type* ptr>
Contributor: Shouldn't float16 also be supported?

Contributor (author): There is no need for float16 at the moment; float16 weights are converted to float32 internally before being quantized. It can be added later if needed.
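
For context, the pattern on lines +276 to +279 uses std::enable_if so that an overload participates in overload resolution only for a particular CPU-side weight type. A self-contained toy illustration of the same mechanism (hypothetical names, not Paddle code):

    #include <cstdint>
    #include <cstdio>
    #include <type_traits>

    // Selected only when the source weight type is float (quantizing path).
    template <typename Tcpu,
              typename Txpu,
              typename std::enable_if<std::is_same<Tcpu, float>::value,
                                      Tcpu>::type* = nullptr>
    void ConvertWeight() {
      std::printf("float source: quantize weights to the Txpu type\n");
    }

    // Selected for every non-float source type (pass-through path).
    template <typename Tcpu,
              typename Txpu,
              typename std::enable_if<!std::is_same<Tcpu, float>::value,
                                      Tcpu>::type* = nullptr>
    void ConvertWeight() {
      std::printf("non-float source: copy or cast without quantization\n");
    }

    int main() {
      ConvertWeight<float, int8_t>();   // picks the quantizing overload
      ConvertWeight<int8_t, int8_t>();  // picks the pass-through overload
      return 0;
    }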

@csy0225 force-pushed the inference_xpu_support_int8 branch 2 times, most recently from 0385c60 to 4cce3dc (October 24, 2023 11:31)
@csy0225 force-pushed the inference_xpu_support_int8 branch from 4cce3dc to 7c9255e (October 24, 2023 12:27)
@yuanlehome previously approved these changes (October 25, 2023)
@csy0225 force-pushed the inference_xpu_support_int8 branch from 7348be9 to 426c36b (October 25, 2023 07:20)
@qili93 (Contributor) left a comment:

LGTM for const_cast

@zhupengyang merged commit 57a14e2 into PaddlePaddle:develop on Oct 26, 2023.
@danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request on Nov 14, 2023.