Update README.md for float8 inference #896

Merged 1 commit on Sep 16, 2024
20 changes: 18 additions & 2 deletions torchao/quantization/README.md
@@ -97,7 +97,7 @@ change_linear_weights_to_int4_woqtensors(model)

Note: The quantization error incurred by applying int4 quantization to your model can be fairly significant, so using external techniques like GPTQ may be necessary to obtain a usable model.
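
To make the note above concrete, here is a minimal pure-Python sketch that compares the round-trip error of 4-bit and 8-bit quantization on synthetic weights. It assumes simple symmetric round-to-nearest quantization with a single shared scale, which is not necessarily the scheme torchao uses for int4; `fake_quantize` and `mean_abs_error` are hypothetical helper names introduced only for this illustration.

```python
import random

def fake_quantize(values, bits):
    """Quantize to signed `bits`-bit integers with one shared scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for int4, 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the symmetric range.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def mean_abs_error(values, bits):
    """Average absolute difference between the original and dequantized values."""
    dequantized = fake_quantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, dequantized)) / len(values)

random.seed(0)
# Synthetic stand-in for a weight tensor (assumption: roughly Gaussian weights).
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err_int4 = mean_abs_error(weights, bits=4)
err_int8 = mean_abs_error(weights, bits=8)
print(f"int4 mean abs error: {err_int4:.6f}")
print(f"int8 mean abs error: {err_int8:.6f}")
```

With only 15 usable levels, the int4 grid is far coarser than the int8 grid, so its round-trip error is correspondingly larger. This is the gap that calibration-based methods such as GPTQ try to reduce.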

-#### A16W8 WeightOnly Quantization
+#### A16W8 Int8 WeightOnly Quantization

```python
# for torch 2.4+
@@ -109,7 +109,7 @@ from torchao.quantization.quant_api import change_linear_weights_to_int8_woqtens
change_linear_weights_to_int8_woqtensors(model)
```

-#### A8W8 Dynamic Quantization
+#### A8W8 Int8 Dynamic Quantization

```python
# for torch 2.4+
@@ -121,6 +121,22 @@ from torchao.quantization.quant_api import change_linear_weights_to_int8_dqtenso
change_linear_weights_to_int8_dqtensors(model)
```

#### A16W8 Float8 WeightOnly Quantization

```python
# for torch 2.5+
from torchao.quantization import quantize_, float8_weight_only
quantize_(model, float8_weight_only())
```
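
As a rough back-of-the-envelope illustration of what "A16W8" buys in weight storage, the sketch below compares 16-bit and 8-bit weights for a single linear layer. The 4096x4096 layer size and the `weight_bytes` helper are assumptions made for this example, and the small overhead of the quantization scales is ignored.

```python
def weight_bytes(out_features, in_features, bits_per_weight):
    """Bytes needed to store a dense weight matrix at the given bit width."""
    return out_features * in_features * bits_per_weight // 8

# Hypothetical layer shape, chosen only for illustration.
out_features, in_features = 4096, 4096

bf16_bytes = weight_bytes(out_features, in_features, 16)  # 16-bit weights
fp8_bytes = weight_bytes(out_features, in_features, 8)    # 8-bit (float8) weights

print(f"16-bit weights: {bf16_bytes / 2**20:.1f} MiB")
print(f" 8-bit weights: {fp8_bytes / 2**20:.1f} MiB")
```

Activations stay in 16 bits in the weight-only setting, so the saving applies to weight storage and weight memory traffic rather than to activation memory.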

#### A8W8 Float8 Dynamic Quantization with Rowwise Scaling

```python
# for torch 2.5+
from torchao.quantization.quant_api import quantize_, PerRow, float8_dynamic_activation_float8_weight
quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerRow()))
```
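
The sketch below illustrates why per-row scaling can matter for float8. It is not torchao's implementation: it emulates rounding to the e4m3 format in pure Python (assuming 3 mantissa bits, a maximum finite value of 448, and a smallest normal exponent of -6), and `round_to_e4m3` and `fake_quantize_fp8` are hypothetical helpers written for this example only.

```python
import math

E4M3_MAX = 448.0      # largest finite value in float8 e4m3fn
E4M3_MIN_EXP = -6     # exponent of the smallest normal value
MANTISSA_BITS = 3

def round_to_e4m3(x):
    """Round x to the nearest e4m3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    _, e = math.frexp(mag)                   # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)           # clamp into the subnormal range
    step = 2.0 ** (exp - MANTISSA_BITS)      # spacing of representable values
    return sign * min(round(mag / step) * step, E4M3_MAX)

def fake_quantize_fp8(matrix, per_row):
    """Scale, round to e4m3, and rescale, using one scale per row or per tensor."""
    if per_row:
        amaxes = [max(abs(v) for v in row) for row in matrix]
    else:
        amaxes = [max(abs(v) for row in matrix for v in row)] * len(matrix)
    scales = [(a / E4M3_MAX) or 1.0 for a in amaxes]   # avoid dividing by zero
    return [[round_to_e4m3(v / s) * s for v in row]
            for row, s in zip(matrix, scales)]

# Toy matrix: one row of small values and one row of large values.
weights = [
    [0.001, -0.002, 0.0015, 0.0005],
    [500.0, -1000.0, 250.0, 750.0],
]

per_tensor = fake_quantize_fp8(weights, per_row=False)
per_row = fake_quantize_fp8(weights, per_row=True)
print("per-tensor:", per_tensor)
print("per-row:   ", per_row)
```

With a single tensor-wide scale, the large row dictates the scale and the small row underflows to zero. With one scale per row, each row is mapped onto the full float8 range and keeps a bounded relative error.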

#### A16W6 Floating Point WeightOnly Quantization

```python
# ... (remainder of this hunk is not shown)
```