From 64ed1c67e846d8073b07fd813e72944b25ca45ba Mon Sep 17 00:00:00 2001
From: Michael Goin
Date: Wed, 19 Jun 2024 07:38:16 -0600
Subject: [PATCH] Update README.md

---
 README.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 34d3684..ca50d5c 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,16 @@
 # AutoFP8
 
-Open-source FP8 quantization library for producing compressed checkpoints for running in vLLM - see https://github.com/vllm-project/vllm/pull/4332 for details on the implementation for inference.
+Open-source FP8 quantization library for producing compressed checkpoints for running in vLLM - see https://github.com/vllm-project/vllm/pull/4332 for details on the implementation for inference. This library focuses on providing quantized weight, activation, and kv cache scales for FP8_E4M3 precision.
+
+[FP8 Model Collection from Neural Magic](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127) with many accurate (<1% accuracy drop) FP8 checkpoints ready for inference with vLLM.
 
 > NOTE: AutoFP8 is in early beta and subject to change
 
+
+
+
+
+
 ## Installation
 
 Clone this repo and install it from source:
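For reviewers who want to exercise what this README change describes (producing FP8_E4M3 weight, activation, and kv cache scales for vLLM), here is a minimal quantization sketch. It follows the usage pattern from AutoFP8's own examples, but the class names (`AutoFP8ForCausalLM`, `BaseQuantizeConfig`), the `activation_scheme="static"` option, and the model IDs are assumptions drawn from the project's documentation rather than part of this diff:

```python
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

# Assumed example model IDs, not part of this patch.
pretrained_model_dir = "meta-llama/Meta-Llama-3-8B-Instruct"
quantized_model_dir = "Meta-Llama-3-8B-Instruct-FP8"

# A small calibration set is used to compute static activation scales.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
examples = ["auto_fp8 is an easy-to-use model quantization library"]
examples = tokenizer(examples, return_tensors="pt").to("cuda")

# quant_method="fp8" targets the FP8_E4M3 precision the README describes;
# activation_scheme is assumed to accept "static" (calibrated) or "dynamic".
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

model = AutoFP8ForCausalLM.from_pretrained(
    pretrained_model_dir, quantize_config=quantize_config
)
model.quantize(examples)            # compute scales over the calibration batch
model.save_quantized(quantized_model_dir)  # write the compressed checkpoint
```

The saved directory is the compressed checkpoint this README refers to; it should then be loadable in vLLM, e.g. via `LLM(model="Meta-Llama-3-8B-Instruct-FP8")` from the `vllm` package, per the vLLM FP8 support linked above.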