From 08a95ac5767a18c6ddc22fb9aa0f02ee4b95e52f Mon Sep 17 00:00:00 2001
From: Thien Tran
Date: Tue, 14 May 2024 01:10:24 +0000
Subject: [PATCH] add note about FP6 kernel

---
 torchao/csrc/fp6_llm/README.md | 7 +++++++
 1 file changed, 7 insertions(+)
 create mode 100644 torchao/csrc/fp6_llm/README.md

diff --git a/torchao/csrc/fp6_llm/README.md b/torchao/csrc/fp6_llm/README.md
new file mode 100644
index 000000000..ff764cc27
--- /dev/null
+++ b/torchao/csrc/fp6_llm/README.md
@@ -0,0 +1,7 @@
+# FP6-LLM kernel
+
+This kernel is adapted from https://github.com/usyd-fsalab/fp6_llm. It performs the linear op (A @ W.T), where A is in FP16 and W is in FP6 (E3M2, with no infinities or NaNs).
+
+On most hardware, this kernel is faster than an FP16 linear for batch sizes from 1 to 128, and slower for batch sizes of 256 or larger. See https://github.com/usyd-fsalab/fp6_llm/issues/8 for a detailed discussion.
+
+See https://github.com/pytorch/ao/pull/223 for some benchmark results.
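
For illustration, here is a minimal PyTorch sketch of the semantics the README describes: decode the E3M2 codes to floats, then compute A @ W.T in FP16. This is a sketch under stated assumptions, not the kernel itself; the function names are made up, and it assumes one 6-bit code in the low bits of each uint8 element rather than the packed weight layout the CUDA kernel actually consumes.

```python
import torch

def fp6_e3m2_to_fp32(codes: torch.Tensor) -> torch.Tensor:
    """Decode E3M2 codes (1 sign, 3 exponent, 2 mantissa bits; no inf/NaN) to FP32."""
    bias = 3                                             # 2**(3-1) - 1
    sign = 1.0 - 2.0 * ((codes >> 5) & 0b1).float()      # sign bit -> +1.0 / -1.0
    exp = ((codes >> 2) & 0b111).float()
    man = (codes & 0b11).float()
    normal = (1.0 + man / 4.0) * torch.exp2(exp - bias)  # exp > 0: implicit leading 1
    subnormal = (man / 4.0) * 2.0 ** (1 - bias)          # exp == 0: subnormals and zero
    return sign * torch.where(exp > 0, normal, subnormal)

def fp6_linear_reference(A: torch.Tensor, W_fp6: torch.Tensor) -> torch.Tensor:
    """A: (M, K) FP16 activations; W_fp6: (N, K) uint8, one E3M2 code per element."""
    return A @ fp6_e3m2_to_fp32(W_fp6).half().T
```

Unlike this sketch, the actual CUDA kernel does not materialize the FP16 weight matrix: the FP6 weights are stored pre-packed and dequantized to FP16 on the fly inside the GEMM, which is presumably why it wins at small batch sizes, where the op is bound by weight memory traffic.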