
Llama 3.1 405B distillation using UC Berkeley's RAFT recipe on Azure AI Serverless

*Header image generated using DALL·E 3 on Azure AI*

This repository is a recipe that will walk you through doing LLM distillation on Azure AI Serverless.

Distillation is a process where a large pre-trained model (often referred to as the "teacher" model) is used to train a smaller, more efficient model (known as the "student" model). The goal is to transfer the knowledge from the teacher to the student, enabling the student to achieve comparable performance while being more resource-efficient.

This recipe uses Meta Llama 3.1 405B as a teacher model deployed on Azure AI to generate a synthetic dataset using UC Berkeley's Gorilla project RAFT method (see blog post). The synthetically generated dataset is then used to fine-tune a student model, such as Meta Llama 3.1 8B or another supported model. Finally, we deploy the fine-tuned model and evaluate its performance against a baseline model.
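To make the generation step concrete, here is a minimal, hypothetical sketch of the kind of call the generation step makes: a teacher model behind an OpenAI-compatible endpoint is prompted to produce a question/answer pair grounded in a document chunk. The environment variable names match the configuration tables below; the prompt and helper function are illustrative, not the actual RAFT implementation.

```python
# Illustrative sketch only -- not the actual RAFT implementation.
import os
from openai import OpenAI

# Teacher model endpoint, configured via the variables described below.
client = OpenAI(
    api_key=os.environ["COMPLETION_OPENAI_API_KEY"],
    base_url=os.environ["COMPLETION_OPENAI_BASE_URL"],
)

def generate_qa_pair(chunk: str, model: str) -> str:
    """Ask the teacher model for a question/answer pair grounded in `chunk`."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "Generate a question answerable from the context, "
                           "then answer it citing the context.",
            },
            {"role": "user", "content": f"Context:\n{chunk}"},
        ],
    )
    return response.choices[0].message.content

print(generate_qa_pair(
    "Azure AI supports serverless model deployment.",
    model=os.environ["COMPLETION_OPENAI_DEPLOYMENT"],
))
```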

Project Goal: The primary objective of this project is to simplify and automate the process of distilling large language models. The workflows and notebooks are meant to be as hands-free as possible, ensuring that even complex tasks like generating synthetic datasets, fine-tuning models, and deploying them can be accomplished with minimal manual intervention. Whether you’re a beginner or an expert, our focus is on providing a seamless experience that allows you to focus on the results rather than the process.

More about RAFT

Getting started / Provisioning Azure AI infrastructure

The infrastructure for this project is fully provisioned using the Azure Developer CLI (AZD). AZD simplifies the deployment process by automating the setup of all required Azure resources, ensuring that you can get started with minimal configuration. This approach allows you to focus on the core aspects of model distillation and fine-tuning, while AZD handles the complexities of cloud resource management behind the scenes. By leveraging AZD, the project maintains a consistent and reproducible environment, making it easier to collaborate and scale.

The easiest way to get started is to open the project in GitHub Codespaces (or locally in a VS Code Dev Container); both come with azd included.

Open in GitHub Codespaces

Open in Dev Containers

Log in using azd

```shell
azd auth login --use-device-code
```

Provision the infrastructure

```shell
azd up
```

When asked for the region, enter westus3; it is currently the only region supported for Models as a Service (MaaS) serverless deployment.

A post-provisioning script, tests.sh, runs infrastructure integration tests to make sure everything is deployed successfully.

Another post-provisioning script, export_env.sh, exports the environment variables for the provisioned infrastructure to the generated ./.env.state file.
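The notebooks and scripts can then pick up both the user-provided and the generated variables. A minimal sketch of how that loading might look, assuming python-dotenv is installed (the notebooks' actual loading logic may differ):

```python
# Sketch: load user-provided variables, then the state generated by provisioning.
import os
from dotenv import load_dotenv

load_dotenv(".env")        # manually maintained user configuration
load_dotenv(".env.state")  # written by export_env.sh after `azd up`

print(os.getenv("COMPLETION_OPENAI_BASE_URL"))
```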

Bring your own models

The easiest approach is to provision the infrastructure using azd, but you can of course also bring your own models. Just provide the endpoint environment variables for your models in the manually maintained ./.env file at the root of the project.

Environment variable configuration

These environment variables are expected by the RAFT CLI scripts. Each is prefixed with the purpose of the model (COMPLETION, EMBEDDING, or BASELINE), followed by either standard OpenAI or Azure OpenAI variable names.

For each model purpose, choose one of the following API styles:

OpenAI API
| Env var name | Explanation |
| --- | --- |
| COMPLETION_OPENAI_API_KEY | API key for the teacher model |
| COMPLETION_OPENAI_BASE_URL | Base URL for the teacher model |
| COMPLETION_OPENAI_DEPLOYMENT | Deployment name for the teacher model |
| EMBEDDING_OPENAI_API_KEY | API key for the embedding model |
| EMBEDDING_OPENAI_BASE_URL | Base URL for the embedding model |
| EMBEDDING_OPENAI_DEPLOYMENT | Deployment name for the embedding model |
| BASELINE_OPENAI_API_KEY | API key for the baseline model |
| BASELINE_OPENAI_BASE_URL | Base URL for the baseline model |
| BASELINE_OPENAI_DEPLOYMENT | Deployment name for the baseline model |
Azure OpenAI API
| Env var name | Explanation |
| --- | --- |
| COMPLETION_AZURE_OPENAI_API_KEY | API key for the teacher model |
| COMPLETION_AZURE_OPENAI_ENDPOINT | Endpoint for the teacher model |
| COMPLETION_AZURE_OPENAI_DEPLOYMENT | Deployment name for the teacher model |
| COMPLETION_OPENAI_API_VERSION | API version for the teacher model |
| EMBEDDING_AZURE_OPENAI_API_KEY | API key for the embedding model |
| EMBEDDING_AZURE_OPENAI_ENDPOINT | Endpoint for the embedding model |
| EMBEDDING_AZURE_OPENAI_DEPLOYMENT | Deployment name for the embedding model |
| EMBEDDING_OPENAI_API_VERSION | API version for the embedding model |
| BASELINE_AZURE_OPENAI_API_KEY | API key for the baseline model |
| BASELINE_AZURE_OPENAI_ENDPOINT | Endpoint for the baseline model |
| BASELINE_AZURE_OPENAI_DEPLOYMENT | Deployment name for the baseline model |
| BASELINE_OPENAI_API_VERSION | API version for the baseline model |
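To illustrate how these variables map onto client objects, here is a hedged sketch using the openai Python package (v1+); the actual RAFT scripts may wire this up differently:

```python
# Sketch: build a completion-model client for whichever API style is
# configured, based on which variables are present. Illustrative only.
import os
from openai import AzureOpenAI, OpenAI

def completion_client():
    if "COMPLETION_AZURE_OPENAI_ENDPOINT" in os.environ:
        # Azure OpenAI style configuration
        return AzureOpenAI(
            api_key=os.environ["COMPLETION_AZURE_OPENAI_API_KEY"],
            azure_endpoint=os.environ["COMPLETION_AZURE_OPENAI_ENDPOINT"],
            api_version=os.environ["COMPLETION_OPENAI_API_VERSION"],
        )
    # Standard OpenAI style configuration
    return OpenAI(
        api_key=os.environ["COMPLETION_OPENAI_API_KEY"],
        base_url=os.environ["COMPLETION_OPENAI_BASE_URL"],
    )
```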

Notebooks

This repository is organized into four notebooks, one for each step of the process:

| Notebook | Explanation |
| --- | --- |
| 1_gen.ipynb | Generate a fine-tuning dataset using RAFT |
| 2_finetune.ipynb | Fine-tune a base model using the generated dataset |
| 3_deploy.ipynb | Deploy the fine-tuned model |
| 4_eval.ipynb | Evaluate the fine-tuned model |

Run time and costs

Warning: The times and costs mentioned below are indications to give you a sense of what to expect, but they can vary dramatically; please monitor your usage to avoid surprises.

| Notebook | Run time | Cost |
| --- | --- | --- |
| 1_gen.ipynb | From 5 minutes for the sample to multiple days for bigger domains | From $1 for the sample to $50 or more for bigger domains |
| 2_finetune.ipynb | Roughly 1.5 hours | Roughly $50 |
| 3_deploy.ipynb | < 10 minutes | < $1 |
| 4_eval.ipynb | From 5 minutes for the sample to multiple days for bigger domains | From $1 for the sample to $50 or more for bigger domains |

Dormant infrastructure costs

While idle, the infrastructure for this project does not cost much, but it still incurs some cost.

TODO: provide costs estimations for dormant infra

Configuration files

| File | Explanation |
| --- | --- |
| .env | User-provided environment variables read by notebooks and scripts |
| .env.state | Environment variables for resources created during notebook execution, shared by all notebooks |
| config.json | Configuration needed to connect to the Azure AI Studio Hub (same as an Azure ML workspace) |
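Assuming config.json follows the standard Azure ML workspace configuration format (subscription_id, resource_group, workspace_name), it can be consumed with the azure-ai-ml SDK. A minimal sketch; the notebooks may connect differently:

```python
# Sketch: connect to the Azure AI Studio Hub / Azure ML workspace
# described by config.json. Assumes azure-ai-ml and azure-identity
# are installed.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Reads subscription_id, resource_group and workspace_name from ./config.json
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
print(ml_client.workspace_name)
```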

Parameterized execution

In addition to interactive execution, the notebooks support parameterized command-line execution using papermill.

Parameter files

The parameter files are contained in the parameters folder and support the following configurations:

| Parameter file | Model | Format |
| --- | --- | --- |
| Llama-2-7b.yaml | Llama-2-7b | Completion |
| Meta-Llama-3-8B-Instruct.yaml | Meta-Llama-3-8B-Instruct | Chat |
| Meta-Llama-3.1-8B-Instruct.yaml | Meta-Llama-3.1-8B-Instruct | Chat |

Running notebooks from the command line with a parameter file

Notebooks can be run all at once with a given parameter file using the following command:

```shell
./run_all.sh -p ./parameters/Meta-Llama-3.1-8B-Instruct.yaml
```
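Under the hood, a run like this amounts to papermill executing each notebook with the parameters from the YAML file. A rough Python equivalent for a single notebook, assuming the parameter file is a flat mapping of notebook parameters (run_all.sh's actual logic may differ):

```python
# Sketch: execute one notebook with parameters from a YAML file.
# Assumes papermill and PyYAML are installed.
import papermill as pm
import yaml

with open("parameters/Meta-Llama-3.1-8B-Instruct.yaml") as f:
    params = yaml.safe_load(f)

pm.execute_notebook(
    "1_gen.ipynb",         # input notebook
    "1_gen.output.ipynb",  # executed copy, with outputs
    parameters=params,
)
```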

Taking down the infrastructure

After you are done working with the project, you can take down the infrastructure with the following command.

IMPORTANT: Be aware that this will DELETE everything related to this project, including generated datasets and fine-tuned models. Save anything important to you before running this command.

```shell
azd down --purge
```

Note: The --purge parameter is important to reclaim quotas, for example for Azure OpenAI embedding models.
