Add doc for serialization/deserialization of torchao optimized models #524

Merged: 1 commit merged into pytorch:main on Jul 18, 2024

Conversation

jerryzh168
Contributor

Summary:
Addresses the following questions:

  1. What happens if I save a quantized model?
  2. What happens if I load a quantized model? This includes describing details such as assign=True.

Specifically:

  1. Do you need ao as a dependency when you're loading a quantized model?
  2. Is the saved quantized model smaller on disk than the unquantized one?
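
To make the on-disk size question concrete, here is a rough sketch that compares serialized sizes. It does not use torchao: the per-tensor symmetric int8 quantization below is a hand-written stand-in chosen for illustration, and `serialized_size` is a hypothetical helper. Actual torchao checkpoints may differ in layout and overhead.

```python
import io

import torch


def serialized_size(obj):
    """Return the number of bytes torch.save writes for obj (hypothetical helper)."""
    buffer = io.BytesIO()
    torch.save(obj, buffer)
    return buffer.getbuffer().nbytes


torch.manual_seed(0)
weight_fp32 = torch.randn(1024, 1024)  # about 4 MiB of float32 data

# Stand-in quantization (an assumption, not torchao's scheme):
# one scale for the whole tensor, values rounded into the int8 range.
scale = weight_fp32.abs().max() / 127.0
weight_int8 = torch.clamp(torch.round(weight_fp32 / scale), -128, 127).to(torch.int8)

fp32_bytes = serialized_size({"weight": weight_fp32})
int8_bytes = serialized_size({"weight": weight_int8, "scale": scale})

print(f"float32 checkpoint: {fp32_bytes} bytes")
print(f"int8 checkpoint:    {int8_bytes} bytes")
```

Under these assumptions the int8 payload is roughly a quarter of the float32 one, plus a small amount of metadata for the scale.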

Test Plan:
.

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot bot commented Jul 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/524

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4eec577 with merge base 6dd82d8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label Jul 17, 2024
@jerryzh168 requested a review from msaroufim July 17, 2024 21:49
docs/source/ser_deser.rst (6 outdated review threads, all resolved)

What happens when serializing an optimized model?
=================================================
To serialize an optimized model, we just need to call `torch.save(m.state_dict(), f)`. In torchao, we use tensor subclasses to represent different dtypes and to support optimization techniques such as quantization and sparsity, so after optimization only the weight tensors are updated and the model structure is unchanged. For example:
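
A minimal sketch of that save call, using plain PyTorch and a hypothetical `ToyLinearModel`. The torchao optimization step is only marked by a comment, because its exact API is not shown in this thread; the point is that the state dict keys follow the module structure, which the optimization leaves untouched.

```python
import os
import tempfile

import torch
import torch.nn as nn


class ToyLinearModel(nn.Module):
    """Hypothetical two-layer model used only for illustration."""

    def __init__(self, m=64, n=32, k=64):
        super().__init__()
        self.linear1 = nn.Linear(m, n, bias=False)
        self.linear2 = nn.Linear(n, k, bias=False)

    def forward(self, x):
        return self.linear2(self.linear1(x))


model = ToyLinearModel().eval()
# Assumption: the torchao optimization call would go here and would replace
# the weight tensors in place, leaving the modules themselves as they are.

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model_state_dict.pt")
    # Saving is the ordinary state_dict path; nothing model-specific is needed.
    torch.save(model.state_dict(), path)
    # Read the checkpoint back to inspect which entries were written.
    saved_keys = list(torch.load(path, weights_only=True).keys())

print(saved_keys)
```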
Member

The subclass point is not well explained. I think what you're trying to say, more plainly, is that at model save/load time we swap in the quantized weights.

That means we're instantiating the full-precision model first, so we probably also want to explain why people might want to instantiate a model on CPU and transfer it to GPU later.

Contributor Author

We recommend initializing the model on the meta device; this is explained in the deserialization section.

Contributor Author

Does the example help with the explanation? If not, let me know what else I should change.

Member

@msaroufim msaroufim left a comment

very nice!

@msaroufim msaroufim merged commit 891a588 into pytorch:main Jul 18, 2024
13 checks passed
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
…pytorch#524)
