SKVQ

This is the official implementation of SKVQ.

SKVQ achieves extremely low-bit quantization of the KV cache with minimal accuracy drop by leveraging the locality of the attention module.
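
A minimal PyTorch sketch of one way to exploit this locality is shown below: a recent window of tokens stays in full precision while older KV entries are fake-quantized group-wise. It is not the implementation shipped in this repository; the tensor layout and the n_bits, group_size, and window defaults are illustrative assumptions.

    # Illustrative sketch only: keep the most recent `window` tokens in full
    # precision and fake-quantize (quantize + dequantize) the older KV entries.
    import torch

    def fake_quant_kv(kv: torch.Tensor, n_bits: int = 2, group_size: int = 128,
                      window: int = 128) -> torch.Tensor:
        """kv: [seq_len, hidden] for one layer; hidden must be divisible by group_size."""
        seq_len = kv.shape[0]
        if seq_len <= window:
            return kv                                  # everything is still "recent"
        old, recent = kv[:-window], kv[-window:]
        x = old.reshape(-1, group_size)                # group-wise asymmetric quantization
        x_min = x.min(dim=-1, keepdim=True).values
        x_max = x.max(dim=-1, keepdim=True).values
        scale = (x_max - x_min).clamp(min=1e-5) / (2 ** n_bits - 1)
        q = ((x - x_min) / scale).round().clamp(0, 2 ** n_bits - 1)
        dequant = (q * scale + x_min).reshape(old.shape)
        return torch.cat([dequant, recent], dim=0)     # old entries carry quantization error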

Usage

  1. Environment:

    conda create -n skvq python=3.10
    conda activate skvq
    pip install -r requirements.txt
    # install the CUDA extension
    cd kernels && pip install -e .
  2. Calibration

    python calibration.py --model [MODEL]
  3. Test

    • PPL:

      python eval_ppl.py --model [MODEL]
    • Needle-in-a-haystack test

      # SKVQ
      python eval_needle.py \
          --model_name llama2-7b-80k \
          --quant k2-v2-w128-g128-reorder-pre_rope-clip-sink5 \
          --ctx_len 32000
      
      # To reproduce KIVI
      python eval_needle.py \
          --model_name llama2-7b-80k \
          --quant k2-v2-w128-g128-KIVI \
          --ctx_len 32000
    • LongBench (the --quant string format is sketched after this list)

      # SKVQ
      python eval_longbench.py \
          --model_name llama3-70b-instruct \
          --quant k2-v2-g128-w128-reorder-pre_rope-clip-sink5-fp8
      
      # To reproduce KIVI
      python eval_longbench.py \
          --model_name llama3-70b-instruct \
          --quant k2-v2-g128-w128-KIVI
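
The --quant argument packs the quantization configuration into a dash-separated string. The hypothetical parser below documents one plausible reading of the fields (key/value bit widths, sliding-window size, group size, channel reordering, pre-RoPE quantization, clipping, attention-sink tokens, and the fp8/KIVI variants); it is not the parser used by the evaluation scripts, and the field meanings are illustrative assumptions.

    # Hypothetical parser for the --quant string; the field meanings are an
    # illustrative reading of the flags, not taken from the repository's code.
    def parse_quant(spec: str) -> dict:
        cfg = {"k_bits": None, "v_bits": None, "window": None, "group_size": None,
               "reorder": False, "pre_rope": False, "clip": False, "sink": 0,
               "fp8": False, "KIVI": False}
        for field in spec.split("-"):
            if field.startswith("k") and field[1:].isdigit():
                cfg["k_bits"] = int(field[1:])        # key-cache bit width
            elif field.startswith("v") and field[1:].isdigit():
                cfg["v_bits"] = int(field[1:])        # value-cache bit width
            elif field.startswith("w") and field[1:].isdigit():
                cfg["window"] = int(field[1:])        # full-precision sliding window
            elif field.startswith("g") and field[1:].isdigit():
                cfg["group_size"] = int(field[1:])    # quantization group size
            elif field.startswith("sink"):
                cfg["sink"] = int(field[4:])          # attention-sink tokens kept unquantized
            else:
                cfg[field] = True                     # flags: reorder, pre_rope, clip, fp8, KIVI
        return cfg

    print(parse_quant("k2-v2-w128-g128-reorder-pre_rope-clip-sink5"))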

⚠️ Note

  • The results reported in the paper were obtained using fake quantization. We provide a naive fake-quantization CUDA kernel to accelerate the experiments.
  • The current dequantization and GEMV kernels are naive and inefficient! We will release a much more efficient fused kernel soon.
  • The clip ratio is currently hard-coded in the evaluation scripts. To avoid overfitting, we use a single unified clip ratio instead of group-wise clipping. You can tune it (in our experience, values between 0.91 and 0.97 work well); it usually has a significant impact on the PPL results. A sketch of how the clip ratio enters fake quantization is shown below.
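
For reference, the sketch below shows one common way a unified clip ratio can enter group-wise fake quantization: each group's [min, max] range is shrunk around its midpoint by the clip ratio before the scale is computed, so outliers saturate while the bulk of the values get a finer quantization step. This is illustrative only and not the exact logic hard-coded in the evaluation scripts.

    # Illustrative sketch of a unified clip ratio in group-wise fake quantization;
    # not the logic hard-coded in the evaluation scripts.
    import torch

    def fake_quant_with_clip(x: torch.Tensor, n_bits: int = 2,
                             clip_ratio: float = 0.95) -> torch.Tensor:
        """x: [num_groups, group_size]; returns the quantize-dequantize result."""
        x_min = x.min(dim=-1, keepdim=True).values
        x_max = x.max(dim=-1, keepdim=True).values
        mid, half = (x_max + x_min) / 2, (x_max - x_min) / 2
        x_min, x_max = mid - clip_ratio * half, mid + clip_ratio * half  # shrink the range
        scale = (x_max - x_min).clamp(min=1e-5) / (2 ** n_bits - 1)
        q = ((x - x_min) / scale).round().clamp(0, 2 ** n_bits - 1)      # outliers saturate
        return q * scale + x_min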
