Wint8 gemm and gemv opt #59291
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
74589d4 to 1402d7c (Compare)
In my opinion, the upgrade points could be described in a bit more detail in the PR.
Genius!
@@ -5163,7 +5163,7 @@ void WeightQuantizeInferMeta(const MetaTensor& x,
      out->set_dtype(DataType::INT8);

      scale->set_dims(phi::make_ddim(dim_scale));
      scale->set_dtype(DataType::FLOAT32);
So scale will be fp16 from now on?
Yes. The scale here has been changed to bf16/fp16, which gives better performance, and accuracy should still be guaranteed. The scale computation in the weight_quant op and the scale weight initialization logic in the paddlenlp code have been updated accordingly.
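For context, a minimal sketch of per-channel abs-max int8 weight quantization that stores the scale tensor in half precision rather than float32 (illustrative only; the function name `QuantizeWeightPerChannel` and the layout are assumptions, not Paddle's actual weight_quant implementation):

```cuda
// Illustrative host-side sketch, NOT Paddle's weight_quant op:
// per-channel abs-max int8 quantization with the scale kept in fp16.
#include <cuda_fp16.h>
#include <cmath>
#include <cstdint>
#include <vector>

void QuantizeWeightPerChannel(const std::vector<float>& w,   // [k, n] row-major fp32 weight
                              int k, int n,
                              std::vector<int8_t>* w_int8,    // [k, n] quantized weight
                              std::vector<__half>* scale) {   // [n] per-channel scale in fp16
  w_int8->resize(static_cast<size_t>(k) * n);
  scale->resize(n);
  for (int col = 0; col < n; ++col) {
    // The abs-max over each column defines the per-channel scale.
    float max_abs = 1e-8f;
    for (int row = 0; row < k; ++row) {
      max_abs = std::fmax(max_abs, std::fabs(w[static_cast<size_t>(row) * n + col]));
    }
    float s = max_abs / 127.0f;
    (*scale)[col] = __float2half(s);  // stored as fp16; the GPU kernel dequantizes with it later
    for (int row = 0; row < k; ++row) {
      float q = std::round(w[static_cast<size_t>(row) * n + col] / s);
      (*w_int8)[static_cast<size_t>(row) * n + col] =
          static_cast<int8_t>(std::fmax(-127.0f, std::fmin(127.0f, q)));
    }
  }
}
```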
The upgrade points and test data have been added.
LGTM for const_cast
496a120
LGTM
PR types
Performance optimization
PR changes
OPs
Description
Pcard-71501
This PR optimizes the speed of the weight-only GEMM and GEMV GPU kernels.
To speed up the weight-only GEMM, the following features were adopted.
For GEMMs with the problem sizes found in llama13b, we obtain a 1.34x GEMM kernel speedup on an A100 80G.
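For readers unfamiliar with weight-only GEMV, a conceptual CUDA sketch of the dequantize-in-kernel idea is below. The kernel name, signature, and the naive one-thread-per-column loop are illustrative assumptions; the PR's actual kernels rely on vectorized loads, tiling, and warp-level reductions that are omitted here.

```cuda
// Conceptual weight-only int8 GEMV: y[n] = scale[n] * sum_k x[k] * W_int8[k][n].
// Weights stay in int8 in global memory and are dequantized on the fly,
// which cuts memory traffic for memory-bound GEMV shapes.
#include <cuda_fp16.h>
#include <cstdint>

__global__ void WeightOnlyInt8Gemv(const half* __restrict__ x,         // [k] activation
                                   const int8_t* __restrict__ w_int8,  // [k, n] row-major weight
                                   const half* __restrict__ scale,     // [n] per-channel scale
                                   half* __restrict__ y,               // [n] output
                                   int k, int n) {
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  if (col >= n) return;
  float acc = 0.0f;
  for (int row = 0; row < k; ++row) {
    // Dequantize the int8 weight on the fly; the fp16 scale is applied once at the end.
    float w = static_cast<float>(w_int8[row * n + col]);
    acc += __half2float(x[row]) * w;
  }
  y[col] = __float2half(acc * __half2float(scale[col]));
}
```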