Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quantize_activation_per_token_absmax
use general quant primitives #193quantize_activation_per_token_absmax
use general quant primitives #193Changes from all commits
0f3db8c
5759dcd
5686433
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkuzo @HDCharles do you know if this clampping before the scale is calculated is required for
quantize_activation_per_token_absmax
to work for our use cases?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need an epsilon to handle the case where max(abs(x)) is zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so can we do this after we divide by q_max? that's where we are doing the clamp typically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have this logic in
choose_qparams_affine
, I would imagine it's for the same purpose. I would recommend:max(abs(x)) == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkuzo Is
eps
best chosen to be (a multiple of / the) machine epsilon of the input tensor's dtype? Or is it a parameter that needs to be searched depending on input data distribution?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I'll add a test. will think about if we want to make it required, we didn't do clamping in some of the ops right now I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the case of
max(abs(x)) == 0
should be handled for every opI think that choosing eps based on the dtype makes sense, it should not be data dependent. My educated guess of how to choose this number would be "smallest number so that the resulting scale is finite", I haven't looked into whether expressing that in terms of machine epsilon would make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after a second thought I guess it makes sense to have an eps, otherwise we may have divide by zero during quantization, I'll change it in a separate PR
for eps, I think we can do torch.finfo(input_dtype).eps if it's not provided from user