LLM Compression with Convex Optimization—Part 1: Weight Quantization

This is the official repo for the paper "Foundations of LLM Compression—Part 1: Weight Quantization".

Pre-requisites

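All required Python packages are listed in pytorch_2.2.1.yml. Run conda env create -f pytorch_2.2.1.yml to create the conda environment, then activate it (the environment name is the one set in the yml file) before running the steps below.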

Quantizing Models

Run python setup_cvxq.py to compile the CVXQ matmul kernel.
Run scripts/opt_all.sh to quantize OPT models.
Run scripts/llama_all.sh to quantize Llama-2 models.
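
The three steps above can be chained with a small driver. This is only a sketch and is not part of the repo: the helper name run_steps, the dry_run flag, and invoking the scripts through bash are assumptions made for illustration. It also assumes that the commands are run from the repository root, that the conda environment is already active, and that the scripts take no arguments.

```python
import shlex
import subprocess

# Commands taken from the steps listed above, in the same order.
# Assumption: the shell scripts are invoked through bash and need no arguments.
STEPS = [
    "python setup_cvxq.py",       # compile the CVXQ matmul kernel
    "bash scripts/opt_all.sh",    # quantize OPT models
    "bash scripts/llama_all.sh",  # quantize Llama-2 models
]


def run_steps(steps=STEPS, dry_run=True):
    """Print each command and run it only when dry_run is False.

    Returns the list of argument vectors that were planned.
    """
    planned = []
    for step in steps:
        argv = shlex.split(step)
        planned.append(argv)
        print("$", step)
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)
    return planned


if __name__ == "__main__":
    # Hypothetical usage: preview the commands without executing anything.
    run_steps(dry_run=True)
```

Calling run_steps(dry_run=False) executes the commands in order. To quantize only one model family, pass a shorter list as steps.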

