FP6-LLM kernel

This kernel is adapted from https://github.com/usyd-fsalab/fp6_llm. It performs a linear operation (A @ W.T), where A is in FP16 and W is in FP6 (E3M2, without infinities or NaN).
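
To make the weight format concrete, here is a minimal pure-Python sketch of what "FP6 E3M2 without infinities and NaN" means, followed by a slow reference for A @ W.T. It assumes a bit layout of 1 sign bit, 3 exponent bits and 2 mantissa bits with an exponent bias of 3; neither the layout nor the bias is stated in this README. The helper names `fp6_e3m2_to_float` and `linear_reference` are hypothetical and not part of the kernel's API. The sketch computes in Python floats and does not model any per-channel scaling, packing, or FP16 rounding the real kernel may apply.

```python
# Sketch only: decode 6-bit FP6 E3M2 codes and compute A @ W.T naively.
# Assumed layout (not taken from the kernel source): bit 5 = sign,
# bits 4..2 = exponent, bits 1..0 = mantissa.
# Because the format has no infinities or NaN, every exponent code,
# including the all-ones code 0b111, encodes an ordinary finite value.

EXP_BIAS = 3  # assumption: 2**(3 - 1) - 1 for a 3-bit exponent


def fp6_e3m2_to_float(code: int) -> float:
    """Decode one FP6 E3M2 code (an int in [0, 63]) to a float."""
    if not 0 <= code < 64:
        raise ValueError("FP6 codes are 6-bit integers in [0, 63]")
    sign = -1.0 if code & 0b100000 else 1.0
    exponent = (code >> 2) & 0b111
    mantissa = code & 0b11
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * (mantissa / 4) * 2.0 ** (1 - EXP_BIAS)
    # Normal: implicit leading 1.
    return sign * (1 + mantissa / 4) * 2.0 ** (exponent - EXP_BIAS)


def linear_reference(a, w_codes):
    """Naive A @ W.T: a is M x K floats, w_codes is N x K FP6 codes; returns M x N."""
    w = [[fp6_e3m2_to_float(c) for c in row] for row in w_codes]
    return [[sum(x * y for x, y in zip(a_row, w_row)) for w_row in w]
            for a_row in a]


print(fp6_e3m2_to_float(0b011111))  # largest finite value under these assumptions
print(linear_reference([[1.0, 2.0]], [[0b001100, 0b101100], [0b011111, 0b000100]]))
```

Under the assumed bias, the largest magnitude is 1.75 * 2**4 = 28.0, and the smallest positive (subnormal) value is 0.0625. Using the all-ones exponent for finite values, rather than reserving it for infinities and NaN, is what lets the format reach 28.0.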

On most hardware, this kernel is faster than an FP16 linear for batch sizes from 1 to 128, and slower for batch sizes of 256 or larger. See usyd-fsalab/fp6_llm#8 for a detailed discussion.
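
A rough back-of-envelope sketch can help build intuition for why batch size matters, though it is an assumption of this sketch and is not taken from the linked discussion. At small batch sizes, a linear layer tends to be limited by how fast W can be read from memory, and FP6 weights take 6/16 the bytes of FP16 weights. As the batch grows, the FP16 activations and outputs make up a larger share of the traffic, and the computation (including on-the-fly dequantization) weighs more. The functions below (`linear_bytes` and `traffic_ratio`, both hypothetical) only count bytes. They ignore dequantization cost, tiling, caching, and hardware details, so they cannot predict where the measured crossover lies.

```python
# Back-of-envelope sketch (an assumption, not a model of the real kernel):
# count the bytes a linear layer Y = A @ W.T moves through global memory
# when W is stored in FP16 versus FP6. A and Y are FP16 in both cases.


def linear_bytes(batch, in_features, out_features, weight_bits):
    """Bytes read/written for Y = A @ W.T, with W stored at weight_bits per element."""
    weight_bytes = out_features * in_features * weight_bits / 8
    act_bytes = batch * in_features * 2    # FP16 input A: 2 bytes per element
    out_bytes = batch * out_features * 2   # FP16 output Y: 2 bytes per element
    return weight_bytes + act_bytes + out_bytes


def traffic_ratio(batch, in_features, out_features):
    """How many times more bytes the FP16-weight version moves than the FP6 one."""
    return (linear_bytes(batch, in_features, out_features, 16)
            / linear_bytes(batch, in_features, out_features, 6))


# 4096 x 4096 is a hypothetical layer shape chosen for illustration.
for batch in (1, 16, 128, 256, 1024):
    print(batch, round(traffic_ratio(batch, 4096, 4096), 3))
```

The ratio starts close to 16/6 (about 2.67) at batch size 1 and shrinks as the batch grows, so the memory savings from FP6 matter less at large batch sizes. Whether that outweighs the dequantization overhead on a given GPU is an empirical question, which is what the benchmarks referenced below address.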

See pytorch#223 for some benchmark results.
