Update README.md
mgoin authored Jun 19, 2024
1 parent 0d40b99 commit 64ed1c6
Showing 1 changed file with 8 additions and 1 deletion.
# AutoFP8

Open-source FP8 quantization library for producing compressed checkpoints that run in vLLM; see https://github.com/vllm-project/vllm/pull/4332 for details on the inference implementation. This library focuses on providing quantized weight, activation, and KV cache scales for FP8_E4M3 precision.
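The per-tensor scaling behind these checkpoints can be sketched in plain Python. This is an illustrative simplification, not AutoFP8's actual API: FP8_E4M3's largest finite value is 448, so a per-tensor scale maps a tensor's absolute maximum onto that range (real FP8 casting also rounds to the 3-bit mantissa grid, which is omitted here).

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8_E4M3

def compute_scale(values):
    """Per-tensor scale mapping the value range onto [-448, 448].

    Hypothetical helper for illustration; AutoFP8 computes scales on
    torch tensors, not Python lists.
    """
    abs_max = max(abs(v) for v in values)
    return abs_max / FP8_E4M3_MAX

def fake_quantize(values):
    """Simulate quantize -> dequantize: divide by scale, clamp to the
    FP8 range, multiply back. Mantissa rounding is intentionally omitted."""
    scale = compute_scale(values)
    clipped = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]
    return [c * scale for c in clipped], scale

dequantized, scale = fake_quantize([0.1, -2.0, 3.5])
# scale = 3.5 / 448.0 = 0.0078125 (a power of two, so this round trip is exact)
```

The stored checkpoint keeps the low-precision values plus one such scale per weight tensor (and, per the note above, per activation and KV cache tensor), which vLLM uses to dequantize during inference.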

[FP8 Model Collection from Neural Magic](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127) with many accurate (<1% accuracy drop) FP8 checkpoints ready for inference with vLLM.

> NOTE: AutoFP8 is in early beta and subject to change.

<p align="center">
<img src="https://github.com/neuralmagic/AutoFP8/assets/3195154/c6bb9ddb-1bc9-48df-bf5f-9d7916dbd1f9" width="40%" />
<img src="https://github.com/neuralmagic/AutoFP8/assets/3195154/2e30d4c0-340a-4527-8ff7-e8d48a8807ca" width="40%" />
</p>

## Installation

Clone this repo and install it from source:
