[XPU] XPU inference support int8 #57258
Conversation
Your PR was submitted successfully. Thank you for your contribution to this open-source project!
❌ The PR was not created using the PR template. You can refer to this Demo.
Sorry to inform you that 9483e72's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Sorry to inform you that 3ab34c6's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
const paddle::optional<DenseTensor>& scale_max,
const paddle::optional<DenseTensor>& out_max_in,
scale_max -> w_max_per_channel; wouldn't that naming be easier to understand?
This matches the definition of the xdnn conv2d_fusion API.
The xdnn conv2d_fusion API definition is itself hard to understand; I don't recommend aligning with it.
template <typename Tcpu, typename Txpu>
void PrepareWeight(Graph* graph,
                   Scope* scope,
                   BlockDesc* block,
                   Node* weight,
                   Node** quant_weight,
                   Node** quant_weight_max,
                   bool transpose,
                   const std::vector<float>& weight_scales);
Could the two PrepareWeight overloads be merged into:
template <typename Tcpu, typename Txpu=int16_t>
void PrepareWeight(Graph* graph,
Scope* scope,
BlockDesc* block,
Node* src_w,
Node** dst_w,
Node** dst_w_max,
bool transpose,
const std::vector<float>& w_max={});
- There will later be a need for float->float int31 computation, so a name without "quant" is more general.
- The "scale" carried in the model may actually mean "max"; that naming is misleading. When writing code, please name things according to their actual meaning.
Fixed.
if (!weight_scales.empty()) {
  LOG(FATAL) << "Weight scales should be empty(), otherwise, check if your "
                "model is quant model or not.";
}
PADDLE_ENFORCE
Fixed.
template <
    typename Tcpu,
    typename Txpu,
    typename std::enable_if<std::is_same<Tcpu, float>::value, Tcpu>::type* ptr>
float16 should also be supported, right?
There is no need for float16 at the moment: float16 is internally converted to float32 before quantization. It can be added later if needed.
LGTM for const_cast
PR types
New features
PR changes
Others
Description
Add quantized-inference capability to the Paddle-Inference XPU backend: support loading and running quantized models in Paddle-Slim ONNX format, and support int8 implementations of conv and fc.