The code in this repository fine-tunes a Llama 2 model on a 1,000-sample subset of the Databricks Dolly 15k instruction dataset using supervised fine-tuning (SFT) with 4-bit QLoRA quantization.
- Clone this repository:

  ```shell
  git clone https://github.com/golkir/llama2-7b-minidatabricks.git
  cd llama2-7b-minidatabricks
  ```
- Install dependencies:

  ```shell
  pip install .
  ```
- Run the dataset subset creation script, which fetches the Dolly 15k dataset and converts it to the Llama 2 instruction format:

  ```shell
  python load-databricks.py
  ```
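Llama 2's instruction format wraps each prompt in `[INST] … [/INST]` markers. A minimal sketch of the kind of conversion `load-databricks.py` performs (the field names `instruction`, `context`, and `response` follow the Dolly 15k schema; the helper function itself is hypothetical, not the repository's actual code):

```python
def format_dolly_record(record: dict) -> str:
    """Turn one Dolly 15k record into a Llama 2 instruction-tuning sample.

    Hypothetical sketch: wraps the prompt in [INST] ... [/INST] markers,
    prepending any supporting context to the instruction body.
    """
    instruction = record["instruction"]
    context = record.get("context", "").strip()
    response = record["response"]
    prompt = f"{instruction}\n\n{context}" if context else instruction
    return f"<s>[INST] {prompt} [/INST] {response} </s>"


sample = {
    "instruction": "What is QLoRA?",
    "context": "",
    "response": "QLoRA fine-tunes a 4-bit quantized model with LoRA adapters.",
}
print(format_dolly_record(sample))
```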
- Run the fine-tuning script:

  ```shell
  python finetuning.py
  ```
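QLoRA means loading the base model's weights in 4-bit precision and training only small LoRA adapters on top of them. A minimal configuration sketch of how a script like `finetuning.py` might set this up with `transformers` and `peft` (all hyperparameter values here are illustrative assumptions, not the repository's actual settings):

```python
# Illustrative QLoRA configuration sketch -- hyperparameter values are
# assumptions, not necessarily those used by finetuning.py.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Load the frozen base model in 4-bit NF4 precision (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Train only small low-rank adapter matrices on top of the quantized weights.
peft_config = LoraConfig(
    r=64,                  # adapter rank (assumed value)
    lora_alpha=16,         # scaling factor (assumed value)
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)
```

Both config objects would then be passed to the model loader and trainer (e.g. `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)`).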
- The Dolly 15k dataset is originally provided by Databricks: [Databricks Dolly 15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k).
- The Llama 2 model can be found on the Hugging Face Hub.
This code is licensed under the Apache 2.0 License.