This is the official repo for the paper "Foundations of LLM Compression—Part 1: Weight Quantization".
All prerequisite Python packages are listed in pytorch_2.2.1.yml. Run conda env create -f pytorch_2.2.1.yml to create the environment.
Run python setup_cvxq.py to compile the CVXQ matmul kernel.
Run scripts/opt_all.sh to quantize OPT models.
Run scripts/llama_all.sh to quantize Llama-2 models.
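
The steps above can be chained into one script. The following is a minimal sketch, not part of the repo. It assumes it is run from the repository root, that the two scripts under scripts/ can be invoked with bash, and that conda is on the PATH. The environment name cvxq, the ENV_NAME and DRY_RUN variables, and the run helper are hypothetical names introduced here for illustration. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the setup-and-quantize sequence described in this README.
# Assumption: executed from the repository root.
set -e

# ENV_NAME is a hypothetical name chosen for this sketch;
# the -n flag overrides whatever name the YAML file declares.
ENV_NAME="${ENV_NAME:-cvxq}"
# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, then execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# 1. Create the conda environment from the package list.
run conda env create -f pytorch_2.2.1.yml -n "$ENV_NAME"

# 2. Activate it (only when actually executing).
if [ "$DRY_RUN" != "1" ]; then
  # Makes `conda activate` usable in a non-interactive shell.
  source "$(conda info --base)/etc/profile.d/conda.sh"
  conda activate "$ENV_NAME"
fi

# 3. Compile the CVXQ matmul kernel.
run python setup_cvxq.py

# 4. Quantize the OPT and Llama-2 models.
run bash scripts/opt_all.sh
run bash scripts/llama_all.sh
```

Save it under any name and run it with DRY_RUN=0 to execute the commands instead of printing them. Printing by default makes it easy to check the sequence before starting the longer quantization runs.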