
LLM Compression with Convex Optimization—Part 1: Weight Quantization

[arXiv] [License: MIT]


This is the official repo for the paper "Foundations of LLM Compression—Part 1: Weight Quantization".



Prerequisites

All prerequisite Python packages are listed in `pytorch_2.2.1.yml`. Create the environment with `conda env create -f pytorch_2.2.1.yml`.

Quantizing Models

1. Run `python setup_cvxq.py` to compile the CVXQ matmul kernel.
2. Run `scripts/opt_all.sh` to quantize OPT models.
3. Run `scripts/llama_all.sh` to quantize Llama-2 models.
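For intuition about what weight quantization does to a layer, here is a minimal, generic sketch of round-to-nearest b-bit uniform quantization of one weight group. This is an illustration only; it is not the CVXQ kernel or the convex bit-allocation procedure from the paper, and the function name and group layout are our own assumptions.

```python
def quantize_uniform(weights, bits):
    """Round-to-nearest signed b-bit uniform quantization of one weight group.
    Generic illustration only -- not the CVXQ kernel from this repo."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax
    # quantize: scale, round, and clip to the signed integer range
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    # dequantize back to floats for use in the forward pass
    deq = [qi * scale for qi in q]
    return q, deq

q, deq = quantize_uniform([0.5, -1.0, 0.25, 0.8], bits=4)
```

In practice, methods like CVXQ choose the bit width per group to balance overall distortion against the model-size budget, rather than fixing `bits` globally as above.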