srikrish2812/quantize_open_source_llms


Quantization of Hugging Face models using llama.cpp

0. Install make, gcc, and git-lfs on your system.

1. Create a directory named os_model and move into it.

mkdir os_model
cd os_model

2. Create a virtual environment named osenv and activate it (the activation command depends on your OS; the lines below are for Linux/macOS).

python -m venv osenv
source osenv/bin/activate

3. Install Git Large File Storage (LFS) and clone the sqlcoder-34b repo from Hugging Face. This takes a long time because the remote repo is nearly 140GB, so be patient and do not cancel the download.

git lfs install
git clone https://huggingface.co/defog/sqlcoder-34b-alpha

4. Download only the tokenizer.model file from https://huggingface.co/defog/sqlcoder-7b/tree/main and place it in the sqlcoder-34b-alpha folder.

5. Clone llama.cpp and install its Python conversion dependencies.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

6. Create a directory called sqlcoder-34b-alpha inside the models directory of the llama.cpp folder.
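From inside the llama.cpp directory this is a single command:

```shell
# Create the output folder for the converted and quantized GGUF files
# (-p is a no-op if the folder already exists)
mkdir -p models/sqlcoder-34b-alpha
```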

7. Build llama.cpp and convert the downloaded model to a 16-bit GGUF file.

make
python3 convert.py ../os_model/sqlcoder-34b-alpha --outfile ./models/sqlcoder-34b-alpha/ggml-sqlcoder-34b-f16.gguf --outtype f16

8. Now let's quantize the f16 model to q4_k. You can instead quantize to q8_0 for higher output quality at the cost of a larger file (to do so, replace q4_k with q8_0 in the command below).

./quantize ./models/sqlcoder-34b-alpha/ggml-sqlcoder-34b-f16.gguf ./models/sqlcoder-34b-alpha/ggml-sqlcoder-34b-q4_k.gguf.bin q4_k

9. The quantized model can now be used for various text-generation tasks; sqlcoder-34b in particular performs well at translating natural-language questions into SQL queries.
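As a quick smoke test, the quantized file can be loaded with the main binary produced by the make step above (a sketch: the prompt is only an illustrative example, and newer llama.cpp versions rename the binary to llama-cli):

```shell
# Run a short generation with the quantized model
# -m: model path, -n: max new tokens to generate, -p: prompt
./main -m ./models/sqlcoder-34b-alpha/ggml-sqlcoder-34b-q4_k.gguf.bin \
       -n 256 \
       -p "Question: How many users signed up last week? SQL:"
```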
