
Add sparsify API to torchao #473

Merged · merged 2 commits · Jul 5, 2024
Conversation

jcaip (Contributor) commented Jul 3, 2024:

This PR removes the old `apply_sparse_semi_structured` API in favor of `sparsify`, so that we are more in line with the existing `quantize` API.

I also updated the README so that it's a bit more clear that sparsity doesn't require intrusive code changes and added a bit of example code.
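For illustration, here is a minimal sketch of how the new `sparsify` API is meant to be called (this example is not taken from the diff; it assumes a CUDA fp16 model whose weights have already been pruned to a 2:4 pattern):

```python
# Sketch only: usage of the sparsify API described in this PR.
import torch
from torch.sparse import to_sparse_semi_structured
from torchao.sparsity import sparsify

# Toy model; to_sparse_semi_structured expects fp16/bf16/int8 weights on CUDA,
# and the weights are assumed to already follow a 2:4 sparsity pattern
# (the unit tests in this PR use apply_fake_sparsity for that step).
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).half().cuda()

# Swap each Linear weight for a semi-structured sparse tensor, analogous to
# how the quantize API applies a weight transform.
model = sparsify(model, to_sparse_semi_structured)
```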

pytorch-bot (bot) commented Jul 3, 2024:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/473

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending

As of commit 950c476 with merge base a895699:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Jul 3, 2024
In some cases we rewrote popular GenAI models to be significantly faster in native PyTorch (as in, no C++/CUDA) to achieve at-the-time SOTA inference performance. These involve more intrusive code changes.

```python
from torchao.sparsity import sparsify
from torchao.sparsity.prototype.dynamic_quant_sparse import int8_dynamic_activation_int8_2x4_sparse_weight
```
Contributor:

What's the plan to move this out of prototype? Is this related to composing sparsity and quant properly?

jcaip (Author):

Yeah, I plan to move this out as part of 0.4, implementing a layout like the one here: https://github.com/pytorch/ao/compare/jcaip/affine-quantize-sparse?expand=1

This works, but I'm running into a performance regression, so I need to debug that first before we can merge.


#### With intrusive code changes

```python
m = sparsify(m, to_sparse_semi_structured)
```
Contributor:

Also, this one uses `to_sparse_semi_structured` while the other one uses `int8_dynamic_activation_int8_2x4_sparse_weight()`, which might be a bit confusing; I'd suggest just aligning the two.

jcaip (Author):

OK, I'll add a `semi_sparse_weight()` wrapper function.
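A rough sketch of what such a wrapper could look like (hypothetical, not the code that landed), so both sparsify options share the same constructor-style calling convention:

```python
# Hypothetical sketch of a semi_sparse_weight() wrapper (not the PR's code).
from torch.sparse import to_sparse_semi_structured

def semi_sparse_weight():
    # Return a weight-transform callable, mirroring the
    # int8_dynamic_activation_int8_2x4_sparse_weight() constructor style,
    # so both options can be passed to sparsify() the same way.
    def apply_sparse(weight):
        return to_sparse_semi_structured(weight)
    return apply_sparse
```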

msaroufim (Member) left a comment:

Cool! Minor questions above

```diff
@@ -37,7 +38,7 @@ def test_sparse(self):
         apply_fake_sparsity(model)
         dense_result = model(input)

-        apply_sparse_semi_structured(model)
+        model = sparsify(model, to_sparse_semi_structured)
```
Member:

Is `sparsify` an in-place op? This came up recently since `quantize` is in-place, and here it looks like the API used to be in-place but now it's not.

jcaip (Author):

Yeah, I think there's some discussion on whether we should use `quantize_` or `quantize` for the in-place op; I'm not sure if we came to a conclusion. cc @jerryzh168, do you have a preference for what to use here?

Contributor:

We can use the in-place version for now, I think.

Contributor:

@jcaip I think we can change this to sparsify_ to be consistent with quantization: #467
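For illustration, the distinction being discussed (the in-place `sparsify_` spelling is a proposal in this thread, not something this PR adds):

```python
# Sketch of the two spellings under discussion; sparsify_ is hypothetical here.
import torch
from torch.sparse import to_sparse_semi_structured
from torchao.sparsity import sparsify

# Toy fp16 CUDA model, weights assumed already pruned to a 2:4 pattern.
model = torch.nn.Sequential(torch.nn.Linear(128, 128)).half().cuda()

# Out-of-place spelling: the transformed model is returned and reassigned.
model = sparsify(model, to_sparse_semi_structured)

# Proposed in-place spelling, matching quantize_ from #467 (not part of this PR):
# sparsify_(model, to_sparse_semi_structured)
```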

In some cases we rewrote popular GenAI models to be significantly faster in native PyTorch (as in, no C++/CUDA) to achieve at-the-time SOTA inference performance. These involve more intrusive code changes.

```python
from torchao.sparsity import sparsify
from torchao.sparsity.prototype.dynamic_quant_sparse import int8_dynamic_activation_int8_2x4_sparse_weight
```
Member:

We might have briefly chatted about this when we were discussing the quantize API, but just thinking out loud here:

If I do `quantize(sparsify(m))`, is that different from `sparsify(quantize(m))`, and if so, in what order (if any) are the optimizations applied?

jcaip (Author):

Right now, the composition of int8 quantization and 2:4 sparsity is treated as its own distinct technique, so you can go either:

`quantize(int8 dynamic + 2:4 sparse)` or `sparsify(int8 dynamic + 2:4 sparse)`. Once we implement sparsity as an AQTLayout, we can add support for a "composable" API, where we go

quantize(int8 dynamic)
sparsify(to_sparse_semi_structured)

or vice versa.
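A minimal sketch of the two styles described above (names come from this thread; the composable form is hypothetical until the AQTLayout work lands):

```python
# Sketch only; function and config names follow this discussion, not a final API.
import torch
from torchao.sparsity import sparsify
from torchao.sparsity.prototype.dynamic_quant_sparse import (
    int8_dynamic_activation_int8_2x4_sparse_weight,
)

model = torch.nn.Sequential(torch.nn.Linear(256, 256)).half().cuda()

# Today: int8 dynamic quant + 2:4 sparsity is applied as one fused technique.
model = sparsify(model, int8_dynamic_activation_int8_2x4_sparse_weight())

# Planned composable form (hypothetical; placeholder config name):
# model = quantize(model, some_int8_dynamic_config())
# model = sparsify(model, to_sparse_semi_structured)
```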

Mathematically, the order in which you apply the optimizations will matter, but I think we should make it so the order doesn't matter for our API, for two reasons:

  1. Currently only quantize -> sparsify is supported; it would be extra work to support sparsify -> quantize.
  2. One of these orderings will be "better", and I can't really see a situation where the best order would differ across different layers. So we should always just default to the "best" one and not give users an option to shoot themselves in the foot.

```python
m = sparsify(m, to_sparse_semi_structured, filter_fn)

# for int8 dynamic quantization + 2:4 sparsity
from torchao.sparity.prototype import int8_dynamic_activation_int8_2x4_sparse_weight
```
Member:

Typo here: `sparity`.

```python
mod.weight = torch.nn.Parameter(to_sparse_semi_structured(mod.weight))
```

Currently, we support two options for sparsity:
- semi-sturctured (2:4) sparsity with `to_sparse_semi_structured`
- int8 dynamic quantization + 2:4 sparsity with `int8_dynamic_activation_int8_2x4_sparse_weight`, which is also available via the quantize API
Member:

This is where I'm a bit confused: reading the docstrings, it's not clear whether the quantize and sparsify APIs compose.

```python
if filter_fn(mod, name):
    mod.weight = torch.nn.Parameter(to_sparse_semi_structured(mod.weight))
```

Currently, we support two options for sparsity:
- semi-sturctured (2:4) sparsity with `to_sparse_semi_structured`
Member:

Typo here as well: `sturctured`.

msaroufim self-requested a review July 4, 2024

msaroufim (Member) commented:

The failure in nightly regression is a flake with smoothquant where it tries to download something from HF, so this is safe to merge

jcaip merged commit a35a1cd into main Jul 5, 2024
13 checks passed
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
* Add sparsify API to torchao

* fix typo
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024