add note about FP6 kernel
gau-nernst committed May 14, 2024
1 parent 7d3a5b1 commit 08a95ac
Showing 1 changed file with 7 additions and 0 deletions: torchao/csrc/fp6_llm/README.md
@@ -0,0 +1,7 @@
# FP6-LLM kernel

This kernel is adapted from https://github.com/usyd-fsalab/fp6_llm. It performs a linear operation (`A @ W.T`), where `A` is in FP16 and `W` is in FP6 (E3M2, with no encodings for infinities or NaN).
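
To make the E3M2 layout concrete, here is a minimal Python sketch that decodes one 6-bit pattern into a float. It assumes the bit order sign | 3 exponent bits | 2 mantissa bits and an exponent bias of 3 (the convention of the OCP MX FP6 E3M2 format); neither detail is stated above, and the kernel's packed in-memory weight layout is not modeled. `decode_fp6_e3m2` is a hypothetical helper, not a torchao API.

```python
def decode_fp6_e3m2(bits: int) -> float:
    """Decode a 6-bit FP6 E3M2 pattern into a Python float.

    Assumptions (illustrative only): layout is sign | exponent(3) | mantissa(2),
    exponent bias is 3, and no exponent code is reserved for inf/NaN.
    """
    if not 0 <= bits < 64:
        raise ValueError("FP6 value must fit in 6 bits")
    bias = 3
    sign = -1.0 if bits & 0b100000 else 1.0
    exponent = (bits >> 2) & 0b111
    mantissa = bits & 0b11
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * (mantissa / 4) * 2.0 ** (1 - bias)
    # Normal: implicit leading 1. Exponent 0b111 is an ordinary finite value
    # because the format has no infinities or NaN.
    return sign * (1 + mantissa / 4) * 2.0 ** (exponent - bias)


print(decode_fp6_e3m2(0b011111))  # largest positive value: 28.0
```

Because no exponent code is reserved for special values, all 64 patterns are finite; under these assumptions the largest magnitude is 28.0 and the smallest nonzero magnitude is 0.0625.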

On most hardware, this kernel is faster than an FP16 linear op for batch sizes from 1 to 128, and slower for batch sizes of 256 or larger. See https://github.com/usyd-fsalab/fp6_llm/issues/8 for a detailed discussion.
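
One common way to reason about this crossover (an assumption here, not a summary of the linked discussion) is arithmetic intensity: FLOPs performed per byte moved to or from memory. The hypothetical helper below computes it for `A @ W.T` under an idealized model where each operand is read or written exactly once; it ignores quantization scales and the cost of dequantizing FP6 weights.

```python
def linear_arithmetic_intensity(m: int, n: int, k: int, weight_bits: int) -> float:
    """Idealized FLOPs per byte for Y = A @ W.T (hypothetical helper).

    A: (m, k) FP16 activations, W: (n, k) weights at `weight_bits` bits each,
    Y: (m, n) FP16 output. Assumes perfect caching: every operand touches
    memory exactly once. Quantization scales are ignored.
    """
    flops = 2 * m * n * k                 # one multiply and one add per MAC
    bytes_a = m * k * 2                   # FP16 activations, 2 bytes each
    bytes_w = n * k * weight_bits / 8     # weights at the given bit width
    bytes_y = m * n * 2                   # FP16 output, 2 bytes each
    return flops / (bytes_a + bytes_w + bytes_y)


# Compare FP16 weights (16 bits) with FP6 weights (6 bits) for a 4096x4096 layer.
for batch in (1, 256):
    fp16 = linear_arithmetic_intensity(batch, 4096, 4096, 16)
    fp6 = linear_arithmetic_intensity(batch, 4096, 4096, 6)
    print(f"batch={batch}: FP16 {fp16:.1f} FLOP/B, FP6 {fp6:.1f} FLOP/B")
```

At batch size 1, weight traffic dominates, and FP6 weights raise the intensity by roughly 2.7x. As the batch grows, intensity climbs toward the point where the GPU becomes compute-bound rather than bandwidth-bound; past that point smaller weights help less, and the extra dequantization work in the FP6 kernel is no longer hidden. That is one plausible reading of the slowdown at larger batch sizes.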

See https://github.com/pytorch/ao/pull/223 for some benchmark results.
