Refactor tensor subclass API to also use parametrization #146
Conversation
Force-pushed from a5b6dda to a9e5563
Force-pushed from d0b9c23 to 25abb31
Force-pushed from 578b4f0 to a906c53
Force-pushed from 2efcc92 to c18e2f6
@@ -493,7 +493,7 @@ def quant_int8_dynamic_per_token_linear(
         x_vals_int8, x_scales, w_vals_int8_t, w_scales, out_dtype
     )
     if bias is not None:
-        mm_out += bias
+        mm_out = mm_out + bias
Why is this needed?
There is some issue with this in AOT Inductor, I think. cc @desertfire
I think @cpuhrsch's question is why we are rewriting "+=". I am not aware of any AOTI restriction that needs this rewrite.
@desertfire the error I'm getting with "+=" is this: https://gist.github.com/jerryzh168/d4ea2fb8138376cff903c38aaef8f5ef. Is this expected?
 ).reshape(w.shape[0], -1)


 def pack_tinygemm_scales_and_zeros(scales, zeros):
     assert scales.shape == zeros.shape
-    assert scales.dtype == torch.bfloat16
-    assert zeros.dtype == torch.bfloat16
+    assert scales.dtype == torch.bfloat16, f" got dtype: {scales.dtype}"
Will this also show what dtype was expected? It seems like an opportunity for a dtype guard decorator or some such, e.g.:
def guard_dtype_size(tensor_arg, arg_name, dtype=None, size=None):
    if dtype is not None and tensor_arg.dtype != dtype:
        raise ValueError(f"Expected Tensor argument {arg_name} to have dtype {dtype}, but got {tensor_arg.dtype} instead.")
    if size is not None and tensor_arg.size() != size:
        raise ValueError(f"Expected Tensor argument {arg_name} to have size {size}, but got {tensor_arg.size()} instead.")
guard_dtype_size(scales, "scales", torch.bfloat16, zeros.size())
guard_dtype_size(zeros, "zeros", torch.bfloat16)
See the ValueError reference manual for why I chose ValueError here.
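For the decorator idea mentioned above, here is a rough sketch of what such a guard could look like; guard_dtype is a hypothetical name, not an existing torchao or PyTorch API:

import functools
import inspect

import torch

def guard_dtype(arg_name, dtype):
    # Hypothetical decorator: check the dtype of one named Tensor argument
    # before calling the wrapped function.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = inspect.signature(fn).bind(*args, **kwargs)
            tensor_arg = bound.arguments[arg_name]
            if tensor_arg.dtype != dtype:
                raise ValueError(
                    f"Expected Tensor argument {arg_name} to have dtype {dtype}, "
                    f"but got {tensor_arg.dtype} instead."
                )
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@guard_dtype("scales", torch.bfloat16)
@guard_dtype("zeros", torch.bfloat16)
def pack_tinygemm_scales_and_zeros(scales, zeros):
    assert scales.shape == zeros.shape
    ...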
        self.kwargs = kwargs

    def forward(self, int_data, q_scales):
        return from_qtensor_components_int8dyn(int_data, q_scales, *self.args, **self.kwargs)
Can you use cls.__tensor_flatten__(*args) for this?
You mean __tensor_unflatten__? We can't use cls in forward because of pytorch/pytorch#124735 right now.
If you wrap it like

def create_parameterization_module(cls):
    class SubclassParameterization:
        [...]
        def forward(self, args):
            cls.[...](args)
    return SubclassParameterization

then cls is given as an argument to create_parameterization_module and you return an instance of SubclassParameterization where cls is that argument. Essentially a module factory function. These methods also shouldn't be static.
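A minimal sketch of this factory pattern, assuming the standard __tensor_flatten__/__tensor_unflatten__ subclass protocol; the class name, the inner-tensor names ("int_data", "q_scales"), and the extra metadata arguments are illustrative rather than the PR's actual implementation:

import torch

def create_parameterization_module(cls, meta=None, outer_size=None, outer_stride=None):
    # The subclass type is captured by closure, so forward() never has to
    # reference it as a class attribute (the dynamo limitation discussed below).
    class SubclassParameterization(torch.nn.Module):
        def forward(self, int_data, q_scales):
            # Rebuild the tensor subclass from its plain-tensor components.
            inner_tensors = {"int_data": int_data, "q_scales": q_scales}
            return cls.__tensor_unflatten__(inner_tensors, meta, outer_size, outer_stride)

        def right_inverse(self, subclass_instance):
            # Flatten back into plain tensors for the parametrization to store.
            attr_names, _ = subclass_instance.__tensor_flatten__()
            return tuple(getattr(subclass_instance, name) for name in attr_names)

    return SubclassParameterization()

The returned instance could then be passed to torch.nn.utils.parametrize.register_parametrization for the weight, similar to the quant_api change later in this thread.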
Isn't this using cls in forward? I tried this before, with @torch._dynamo.allow_in_graph for the constructor function, and it fails because we can't use a class variable in dynamo right now, I think.
Are you suggesting something like this: 25abb31#diff-bf4d50867e3d649de2d89146592bf47d2f258c4c19126c8acf0e120ee904b726R134 (but using cls instead of hardcoding the class)?
Yes, exactly, and using __tensor_unflatten__ instead of from_qtensor_components.
For reference, using cls in forward is not supported until pytorch/pytorch#123350 is landed, according to Brian.
torchao/quantization/subclass.py
Outdated
        return from_qtensor_components_int8dyn(int_data, q_scales, *self.args, **self.kwargs)

    def right_inverse(self, tensor_subclass_instance):
        return tensor_subclass_instance.int_data, tensor_subclass_instance.q_scales
Can you use return self.__tensor_flatten__ for this?
This works, thanks. I'll create a parent class to host __init__ and right_inverse.
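A hedged sketch of what such a shared parent class might look like; the class name is illustrative, and it assumes every subclass implements __tensor_flatten__:

import torch

class ConstructTensorSubclass(torch.nn.Module):
    # Hosts __init__ and the shared right_inverse; subclass-specific modules
    # then only need to override forward().
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.args = args
        self.kwargs = kwargs

    def right_inverse(self, tensor_subclass_instance):
        # __tensor_flatten__ reports the names of the inner plain tensors, so
        # this works for any subclass implementing the flatten protocol.
        attr_names, _ = tensor_subclass_instance.__tensor_flatten__()
        return tuple(getattr(tensor_subclass_instance, name) for name in attr_names)

The ConstructTensorSubclassInt8Dyn module from the diff below would then subclass this and keep only its forward.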
torchao/quantization/quant_api.py
Outdated
    if enable_parametrization:
        lin.weight = torch.nn.Parameter(cls.from_float(lin.weight), requires_grad=False)
        _, args = lin.weight.__tensor_flatten__()
        parametrize.register_parametrization(lin, "weight", getattr(cls, constructor)(cls, *args))
noob question - why do we want to enable this parameterization support?
This is for supporting export of the tensor subclass model; it's needed by aot_compile and also torch.export.export.
Tensor subclasses don't work with AOTI
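For context, a rough sketch (not the PR's actual test) of the export flow this parametrization is meant to unblock, assuming torchao's change_linear_weights_to_int8_dqtensors API and an available CUDA device:

import torch
from torchao.quantization.quant_api import change_linear_weights_to_int8_dqtensors

model = torch.nn.Sequential(torch.nn.Linear(32, 64)).eval().to("cuda")
example_inputs = (torch.randn(16, 32, device="cuda"),)

# Swap Linear weights for the int8 dynamically quantized tensor subclass.
# With parametrization enabled, the registered parameters are plain tensors,
# which is what torch.export and AOT Inductor can handle today.
change_linear_weights_to_int8_dqtensors(model)

# torch.export path (used by executorch) and AOT Inductor compilation.
exported_program = torch.export.export(model, example_inputs)
so_path = torch._export.aot_compile(model, example_inputs)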
torchao/quantization/subclass.py
Outdated
        **kwargs,
    )


class ConstructTensorSubclassInt8Dyn(torch.nn.Module):
Can this be made generic for all tensor subclasses?
Can't do it now because of pytorch/pytorch#124735; we should be able to do it after that is fixed.
Force-pushed from d7ba6af to a50fea5
test/integration/test_integration.py
Outdated
def wrapper(*args, **kwargs):
    if args[2] == "cuda" and not torch.cuda.is_available():
        assert len(args) >= 3, f"Not enough args. Expected more than or equal to 3, but got {len(args)}"
btw @cpuhrsch we need to use checks + skip test here, I think, otherwise this test would fail:
FAIL: test_aoti (__main__.TestAOTI)
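A hedged sketch of the checks-plus-skip approach; the wrapper name is illustrative, and args[2] being the device follows the helper shown above:

import unittest

import torch

def run_on_supported_devices(test_method):
    # Skip (rather than fail) when the parametrized device is "cuda" but no
    # GPU is available, so test_aoti does not show up as a failure.
    def wrapper(*args, **kwargs):
        assert len(args) >= 3, f"Not enough args. Expected more than or equal to 3, but got {len(args)}"
        if args[2] == "cuda" and not torch.cuda.is_available():
            raise unittest.SkipTest("Need CUDA available.")
        return test_method(*args, **kwargs)
    return wrapper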
@@ -141,11 +149,11 @@ def change_linear_weights_to_int8_dqtensors(model, filter_fn=None):
     )

     _replace_with_custom_fn_if_matches_filter(
-        model, _get_subclass_inserter(Int8DynamicallyQuantizedLinearWeight), filter_fn
+        model, _get_subclass_inserter(Int8DynamicallyQuantizedLinearWeight, enable_parametrization=TORCH_VERSION_AFTER_2_4, **kwargs), filter_fn
I would expect we use parametrization only for AOTI, as some kind of "pre-processing" there? Especially given that my understanding of the long-term plan is that AOTI will do this pre-processing itself, and we will be able to remove it from there.
We also need this for torch.export (used by executorch); I'll add a test in the next PR. Also, we want to have a consistent code path for all backends/runtimes, I think. Are there any problems with enabling this for all use cases?
@albanD do you think that, long term, we want export to do the pre-processing?
I think if that's the case, then we might just want to figure out that story now (it might be less work than getting dynamo to handle parametrizations).
The main contentious bit is probably just where this pre-processing should live. One possible answer is that it should happen transparently as part of torch.export.export(): automatically search the created state dict for subclasses and flatten them (although this might be a problem if the user expects the state dict of the ExportedProgram to alias the original model's state dict).
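To make the "search the state dict and flatten subclasses" idea concrete, a hypothetical sketch; the function is illustrative, not an existing torch.export feature:

import torch

def flatten_subclasses_in_state_dict(state_dict):
    # Replace each tensor-subclass entry with its plain inner tensors, keyed by
    # the names reported by __tensor_flatten__. Note this builds a new dict, so
    # it would not alias the original model's state dict.
    flat = {}
    for key, value in state_dict.items():
        if isinstance(value, torch.Tensor) and hasattr(value, "__tensor_flatten__"):
            attr_names, _ = value.__tensor_flatten__()
            for name in attr_names:
                flat[f"{key}.{name}"] = getattr(value, name)
        else:
            flat[key] = value
    return flat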
Force-pushed from 45543c0 to aff0c5b
* tiktoken integration, part 1 * update tests
Summary:
Also added tests for the tensor subclass API + AOTI compilation
Test Plan:
python test/integration/test_integration.py -k test_aoti
Two issues right now:
Reviewers:
Subscribers:
Tasks:
Tags: