If an enterprise wants to have its own industry vertical model, it needs to start with data, fine-tuning, and deployment. In the previous content we introduced Microsoft Olive; we now complete a more detailed introduction based on the E2E (end-to-end) work.
We can refer to the projects generated by AI Toolkit for VS Code to structure our own project, including data, models, fine-tuning formats, and inference, such as:
|-- Your Phi-3-mini E2E Proj
|-- datasets
|-- fine-tuning
|-- inferences
|-- model-cache
|-- gen-model
|-- setup
- datasets
Data can be stored in formats such as CSV or JSON; in this example we use the exported JSON data.
Note: We can ignore the relevant settings here because the data has already been uploaded to Azure ML (if the data is local, we can upload it here). A sketch of the expected data format is shown below.
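For reference, here is a minimal sketch of what the exported JSON training data could look like. The field names (Question, Best Answer) follow the text_cols used in the Olive configurations below; the records themselves are purely illustrative, and your own export may use a different shape.
[
    {
        "Question": "What is Phi-3-mini?",
        "Best Answer": "Phi-3-mini is a small language model from Microsoft that can be fine-tuned for industry vertical scenarios."
    },
    {
        "Question": "How is the fine-tuned model deployed?",
        "Best Answer": "The fine-tuned Adapter can be merged into the base model and converted to a quantized ONNX Runtime model for inference."
    }
]
During pre-processing, each record is rendered with the chat template <|user|>\n{Question}<|end|>\n<|assistant|>\n{Best Answer}\n<|end|> before training.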
- fine-tuning
Specify the fine-tuning algorithm (LoRA or QLoRA) and the related parameters.
- inferences
Inference runs the model after fine-tuning. It can be a reference to the fine-tuned Adapter layer, a reference to the model with the Adapter merged in after fine-tuning, or a quantized ONNX Runtime model (a minimal inference sketch is given at the end of this section).
- model-cache
Models downloaded via the Hugging Face CLI, in this case the Phi-3-mini model. (When using Azure ML we can skip this step; if you want to run locally, execute the following script to obtain the Phi-3 model.)
huggingface-cli login
# input your key from Hugging Face Portal
huggingface-cli download microsoft/Phi-3-mini-4k-instruct --local-dir Your Phi-3-mini location
- gen-model
The models saved after the run, including the fine-tuned Adapter model, the model with the Adapter merged in, and the quantized model for ONNX Runtime.
- setup
The required installation environment. Please run the following to set up your Olive environment:
pip install -r requirements.txt
If you want to learn more about configuring Microsoft Olive, please visit Fine Tuning with Microsoft Olive.
Note: To stay up to date, install Microsoft Olive using
pip install git+https://github.com/microsoft/Olive
LoRA
This sample uses cloud compute and cloud datasets. Add olive-config.json to the fine-tuning folder:
{
"azureml_client": {
"subscription_id": "Your Azure Subscription ID",
"resource_group": "Your Azure Resource Group",
"workspace_name": "Your Azure ML Worksapce",
"keyvault_name": "Your Azure Key Valuts"
},
"input_model":{
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "microsoft/Phi-3-mini-4k-instruct",
"task": "text-generation",
"from_pretrained_args": {
"trust_remote_code": true
}
}
}
},
"systems": {
"aml": {
"type": "AzureML",
"config": {
"accelerators": [
{
"device": "gpu",
"execution_providers": [
"CUDAExecutionProvider"
]
}
],
"hf_token": true,
"aml_compute": "Your Azure ML Compute Cluster",
"aml_docker_config": {
"base_image": "mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04",
"conda_file_path": "conda.yaml"
}
}
},
"azure_arc": {
"type": "AzureML",
"config": {
"accelerators": [
{
"device": "gpu",
"execution_providers": [
"CUDAExecutionProvider"
]
}
],
"aml_compute": "Your Azure ML Compute",
"aml_docker_config": {
"base_image": "mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04",
"conda_file_path": "conda.yaml"
}
}
}
},
"data_configs": [
{
"name": "dataset_default_train",
"type": "HuggingfaceContainer",
"load_dataset_config": {
"params": {
"data_name": "json",
"data_files": {
"type": "azureml_datastore",
"config": {
"azureml_client": {
"subscription_id": "Your Azure Subscrition ID",
"resource_group": "Your Azure Resource Group",
"workspace_name": "Your Azure ML Workspaces name"
},
"datastore_name": "workspaceblobstore",
"relative_path": "Your train_data.json Azure ML Location"
}
},
"split": "train"
}
},
"pre_process_data_config": {
"params": {
"dataset_type": "corpus",
"text_cols": [
"Question",
"Best Answer"
],
"text_template": "<|user|>\n{Question}<|end|>\n<|assistant|>\n{Best Answer}\n<|end|>",
"corpus_strategy": "join",
"source_max_len": 2048,
"pad_to_max_len": false,
"use_attention_mask": false
}
}
}
],
"passes": {
"lora": {
"type": "LoRA",
"config": {
"target_modules": [
"o_proj",
"qkv_proj"
],
"double_quant": true,
"lora_r": 64,
"lora_alpha": 64,
"lora_dropout": 0.1,
"train_data_config": "dataset_default_train",
"eval_dataset_size": 0.1,
"training_args": {
"seed": 0,
"data_seed": 42,
"per_device_train_batch_size": 1,
"per_device_eval_batch_size": 1,
"gradient_accumulation_steps": 4,
"gradient_checkpointing": false,
"learning_rate": 0.0001,
"num_train_epochs": 1000,
"max_steps": 100,
"logging_steps": 100,
"evaluation_strategy": "steps",
"eval_steps": 187,
"group_by_length": true,
"adam_beta2": 0.999,
"max_grad_norm": 0.3
}
}
},
"merge_adapter_weights": {
"type": "MergeAdapterWeights"
},
"builder": {
"type": "ModelBuilder",
"config": {
"precision": "int4"
}
}
},
"engine": {
"log_severity_level": 0,
"host": "aml",
"target": "aml",
"search_strategy": false,
"cache_dir": "cache",
"output_dir" : "../model-cache/models/phi3-finetuned"
}
}
QLoRA
{
"azureml_client": {
"subscription_id": "Your Azure Subscription ID",
"resource_group": "Your Azure Resource Group",
"workspace_name": "Your Azure ML Worksapce",
"keyvault_name": "Your Azure Key Valuts"
},
"input_model":{
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "microsoft/Phi-3-mini-4k-instruct",
"task": "text-generation",
"from_pretrained_args": {
"trust_remote_code": true
}
}
}
},
"systems": {
"aml": {
"type": "AzureML",
"config": {
"accelerators": [
{
"device": "gpu",
"execution_providers": [
"CUDAExecutionProvider"
]
}
],
"hf_token": true,
"aml_compute": "Your Azure ML Compute Cluster",
"aml_docker_config": {
"base_image": "mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04",
"conda_file_path": "conda.yaml"
}
}
},
"azure_arc": {
"type": "AzureML",
"config": {
"accelerators": [
{
"device": "gpu",
"execution_providers": [
"CUDAExecutionProvider"
]
}
],
"aml_compute": "Your Azure ML Compute",
"aml_docker_config": {
"base_image": "mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04",
"conda_file_path": "conda.yaml"
}
}
}
},
"data_configs": [
{
"name": "dataset_default_train",
"type": "HuggingfaceContainer",
"load_dataset_config": {
"params": {
"data_name": "json",
"data_files": {
"type": "azureml_datastore",
"config": {
"azureml_client": {
"subscription_id": "Your Azure Subscrition ID",
"resource_group": "Your Azure Resource Group",
"workspace_name": "Your Azure ML Workspaces name"
},
"datastore_name": "workspaceblobstore",
"relative_path": "Your train_data.json Azure ML Location"
}
},
"split": "train"
}
},
"pre_process_data_config": {
"params": {
"dataset_type": "corpus",
"text_cols": [
"Question",
"Best Answer"
],
"text_template": "<|user|>\n{Question}<|end|>\n<|assistant|>\n{Best Answer}\n<|end|>",
"corpus_strategy": "join",
"source_max_len": 2048,
"pad_to_max_len": false,
"use_attention_mask": false
}
}
}
],
"passes": {
"qlora": {
"type": "QLoRA",
"config": {
"compute_dtype": "bfloat16",
"quant_type": "nf4",
"double_quant": true,
"lora_r": 64,
"lora_alpha": 64,
"lora_dropout": 0.1,
"train_data_config": "dataset_default_train",
"eval_dataset_size": 0.3,
"training_args": {
"seed": 0,
"data_seed": 42,
"per_device_train_batch_size": 1,
"per_device_eval_batch_size": 1,
"gradient_accumulation_steps": 4,
"gradient_checkpointing": false,
"learning_rate": 0.0001,
"num_train_epochs": 3,
"max_steps": 10,
"logging_steps": 10,
"evaluation_strategy": "steps",
"eval_steps": 187,
"group_by_length": true,
"adam_beta2": 0.999,
"max_grad_norm": 0.3
}
}
},
"merge_adapter_weights": {
"type": "MergeAdapterWeights"
}
},
"engine": {
"log_severity_level": 0,
"host": "aml",
"target": "aml",
"search_strategy": false,
"cache_dir": "cache",
"output_dir" : "../model-cache/models/phi3-finetuned"
}
}
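Both the LoRA and QLoRA configurations reference a conda.yaml file in aml_docker_config, which describes the Python environment used on the Azure ML compute. That file ships with the project; the sketch below only illustrates the expected shape of such an environment file, and the package and version choices are assumptions rather than the authoritative list.
name: phi3-finetune-env        # hypothetical environment name
channels:
  - conda-forge
dependencies:
  - python=3.10                # assumed Python version
  - pip
  - pip:
      - olive-ai               # Microsoft Olive
      - torch
      - transformers
      - peft
      - bitsandbytes
      - accelerate
      - datasets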
Notice
- If you use QLoRA, the quantization conversion of ONNX Runtime GenAI is not supported for the time being.
- It should be pointed out that you can configure the above steps according to your own needs; it is not necessary to configure all of them. Depending on your requirements, you can use the algorithm steps directly without fine-tuning. Finally, you need to configure the relevant engine.
After you have finished configuring Microsoft Olive, run this command in the terminal:
olive run --config olive-config.json
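If you run this from a local terminal against the Azure ML workspace defined in azureml_client, you will typically need to sign in to Azure first so your credentials can be resolved, for example:
az login
# optionally select the subscription used in the config
az account set --subscription "Your Azure Subscription ID"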
Notice
- When Microsoft Olive is executed, each step can be placed in the cache. We can view the results of the relevant steps in the fine-tuning directory.
- We provide both LoRA and QLoRA here, and you can set them according to your needs.
- The recommended running environment is WSL / Ubuntu 22.04+.
- Why choose ONNX Runtime (ORT)? Because ORT can be deployed on edge devices, and inference is implemented in the ORT environment (see the sketch below).
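As a rough illustration of that last point, here is a minimal sketch of loading the quantized model from gen-model with the onnxruntime-genai Python package and running a prompt. The model path is a placeholder, and the exact API surface can vary between onnxruntime-genai versions.
import onnxruntime_genai as og

# Path to the quantized ONNX model produced by the ModelBuilder pass (placeholder)
model = og.Model("../gen-model/phi3-finetuned-int4")
tokenizer = og.Tokenizer(model)

# Build the prompt with the same chat template used during fine-tuning
prompt = "<|user|>\nYour question here<|end|>\n<|assistant|>\n"
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = input_tokens

# Generate and decode the answer
output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))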