-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more options in choose_qparams_affine for tinygemm op #227
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/227
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit a0458db with merge base f6d56ca (): This comment was automatically generated by Dr. CI and updates every 15 minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to compare with the existing numerics of choosing qparams and q/dq. If we're introducing error for no benefit then i don't thing we should deprecate the other primitives. At the very least we need to discuss the functional changes first.
also need to test whether these primitives also work with gptq, there are nuances that i'm not sure are being captured.
sounds good, updated the op to work with tinygemm numerics. for gptq and other types of e2e quant tests, we could establish a dashboard for perf and accuracy first, we can prioritize this in 0.3 |
8d76b57
to
40d5a75
Compare
updated the op to close the accuracy gap, please take a look again
(scale_ao, _) = get_group_qparams_symmetric(weight, n_bit, groupsize) | ||
torch.testing.assert_allclose(scale_obs, scale_ao, rtol=0, atol=0) | ||
(scale_ao, _) = get_group_qparams_symmetric(weight, n_bit, groupsize, precision=torch.float16) | ||
torch.testing.assert_close(scale_obs, scale_ao, rtol=0, atol=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean assert_close? I saw assert_allclose is deprecated, that's why I updated them
target_dtype (torch.dtype): dtype for target quantized Tensor | ||
quant_min (Optional[int]): minimum quantized value for target quantized Tensor | ||
quant_max (Optioanl[int]): maximum quantized value for target quantized Tensor | ||
eps (Optional[float]): minimum scale, if not provided, default to eps of input.dtype | ||
scale_dtype (torch.dtype): dtype for scale Tensor | ||
zero_point_dtype (torch.dtype): dtype for zero_point Tensor | ||
_is_zero_exactly_representable (bool): a private flag to indicate whether we need zero to be exactly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I'd expect symmetric without zero_point to imply this. Is that true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is talking about whether floating point value 0 is exactly representable by a quantized value or not, it's not related to symmetric/asymmetric quant. this is assumed by most of the existing quantized kernels
target_dtype (torch.dtype): dtype for target quantized Tensor | ||
quant_min (Optional[int]): minimum quantized value for target quantized Tensor | ||
quant_max (Optioanl[int]): maximum quantized value for target quantized Tensor | ||
eps (Optional[float]): minimum scale, if not provided, default to eps of input.dtype | ||
scale_dtype (torch.dtype): dtype for scale Tensor | ||
zero_point_dtype (torch.dtype): dtype for zero_point Tensor | ||
is_exact_zero (bool): a flag to indicate whether we need zero to be exactly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you also define what "zero being exactly representable" means for the outputs? You can add zero_padding
as an example of when this is useful. Also I'd remove the is_
part and choose preserve_zero
or some other verb. Since this is a kwarg to a function and not a property of a class it seems more consistent.
Summary: This is in preparation for replacing tinygemm q/dq ops with the unified quant primitive ops Test Plan: python test/quantization/test_quant_primitives.py -k test_tinygemm_get_groupwise_affine_qparams Reviewers: Subscribers: Tasks: Tags:
Summary: This is in preparation for replacing tinygemm q/dq ops with the unified quant primitive ops Test Plan: python test/quantization/test_quant_primitives.py -k test_tinygemm_get_groupwise_affine_qparams Reviewers: Subscribers: Tasks: Tags:
…ts with torchchat source code (pytorch#227)
Summary:
This is in preparation for replacing tinygemm q/dq ops with the unified quant primitive ops
tinygemm choose_qparams op (and also quantize/dequantize op): https://github.com/pytorch/ao/blob/main/torchao/quantization/quant_primitives.py#L36 is different from the other choose_qparams op in that it does not enforce zero_point to be exactly representable, meaning the floating point value 0 can't be exactly represented by an integer value in quantized tensor. This PR adds a flag to produce a zero_point value that can be adapted to be used by tinygemm kernels.
Test Plan:
python test/quantization/test_quant_primitives.py -k test_tinygemm_get_groupwise_affine_qparams
Reviewers:
Subscribers:
Tasks:
Tags: