A platform that simplifies fine-tuning and inference for 60+ open-source LLMs through a single API. FlexAI enables serverless deployment, reducing setup time by up to 70%. You no longer have to deal with installations, OOMs, GPU setup, prompt templates, integrating new models, or waiting for huge model downloads.
- Serverless fine-tuning and inference
- Live time and cost estimations
- Checkpoint management
- LoRA and multi-LoRA support
- Target inference validations
- OpenAI-compatible Endpoints API
- Interactive Playground
- Sign up at app.getflex.ai. New accounts come with $5 in free credits to get you started :)
- Get your API key from Settings -> API Keys (one way to load it is shown in the snippet after this list)
- Start with our documentation
- Everything can also be done without writing any code from our dashboard - FlexAI Dashboard
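If you prefer not to hard-code the key, a common pattern is to read it from an environment variable; the variable name FLEX_AI_API_KEY below is just an example, not something the SDK requires:

import os
from flex_ai import FlexAI

# Read the API key from an environment variable (example name, not required by the SDK)
client = FlexAI(api_key=os.environ["FLEX_AI_API_KEY"])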
One Notebook to fine-tune all LLMs
Nothing to install locally: no CUDA, no NVIDIA drivers, no setup. Our lightweight library is just an API wrapper around FlexAI serverless GPUs, so you can work from any operating system, including Windows, macOS, and Linux.
pip install flex_ai openai
from flex_ai import FlexAI
# Initialize client with your API key
client = FlexAI(api_key="your-api-key")
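# Before creating a dataset, prepare train.jsonl / eval.jsonl files.
# The exact JSONL schema depends on the dataset type you pick - see the
# dataset docs linked below. The "instruction"/"output" fields here are
# only an illustration, not the required field names.
import json

rows = [
    {"instruction": "Translate to French: Hello", "output": "Bonjour"},
    {"instruction": "Translate to French: Thank you", "output": "Merci"},
]
with open("train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
# eval.jsonl follows the same structure, typically with held-out examples.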
# Create a dataset - see supported dataset formats [here](https://docs.getflex.ai/quickstart#upload-your-first-dataset)
dataset = client.create_dataset("Dataset Name", "train.jsonl", "eval.jsonl")
# Start fine-tuning
task = client.create_finetune(
    name="My Task",
    dataset_id=dataset["id"],
    # You can choose from 60+ models; full list [here](https://docs.getflex.ai/core-concepts/models)
    model="meta-llama/Llama-3.2-3B-Instruct",
    n_epochs=10,
    train_with_lora=True,
    lora_config={
        "lora_r": 64,
        "lora_alpha": 8,
        "lora_dropout": 0.1
    }
)
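# The endpoint step below needs a checkpoint_id from a finished task.
# One way to get it, using the same client methods shown in the full example
# further down, is to wait for the task and take its last checkpoint:
client.wait_for_task_completion(task_id=task["id"])
checkpoints = client.get_task_checkpoints(task_id=task["id"])
checkpoint_id = checkpoints[-1]["id"]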
# Create an endpoint that serves the LoRA checkpoint
endpoint_id = client.create_multi_lora_endpoint(
    name="My Endpoint",
    lora_checkpoints=[{"id": checkpoint_id, "name": "step_1"}],
    compute="A100-40GB"
)
endpoint = client.wait_for_endpoint_ready(endpoint_id=endpoint_id)
from openai import OpenAI
client = OpenAI(
    api_key="your-api-key",
    base_url=f"{endpoint['url']}/v1"
)
completion = client.completions.create(
    model="your-model",
    prompt="Your prompt",
    max_tokens=60
)
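print(completion.choices[0].text)

# The endpoint also speaks the OpenAI chat API. Assuming it exposes
# /v1/chat/completions (as OpenAI-compatible servers typically do),
# an instruct model can be queried chat-style:
chat = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your prompt"}
    ],
    max_tokens=60
)
print(chat.choices[0].message.content)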
Type | Links |
---|---|
Documentation & Wiki | Read Our Docs |
Twitter (aka X) | Follow us on X |
Installation | getflex/README.md |
Supported Models | FlexAI Models |
from flex_ai import FlexAI
from openai import OpenAI
import time
# Initialize the Flex AI client
client = FlexAI(api_key="your_api_key_here")
# Create a dataset - see supported dataset formats [here](https://docs.getflex.ai/quickstart#upload-your-first-dataset)
dataset = client.create_dataset(
    "API Dataset New",
    "instruction/train.jsonl",
    "instruction/eval.jsonl"
)
# Start a fine-tuning task
task = client.create_finetune(
    name="My Task New",
    dataset_id=dataset["id"],
    model="meta-llama/Llama-3.2-1B-Instruct",
    n_epochs=5,
    train_with_lora=True,
    lora_config={
        "lora_r": 64,
        "lora_alpha": 8,
        "lora_dropout": 0.1
    },
    n_checkpoints_and_evaluations_per_epoch=1,
    batch_size=4,
    learning_rate=0.0001,
    save_only_best_checkpoint=True
)
# Wait for training completion
client.wait_for_task_completion(task_id=task["id"])
# Wait for last checkpoint to be uploaded
while True:
    checkpoints = client.get_task_checkpoints(task_id=task["id"])
    if checkpoints and checkpoints[-1]["stage"] == "FINISHED":
        last_checkpoint = checkpoints[-1]
        checkpoint_list = [{
            "id": last_checkpoint["id"],
            "name": "step_" + str(last_checkpoint["step"])
        }]
        break
    time.sleep(10)  # Wait 10 seconds before checking again
# Create endpoint
endpoint_id = client.create_multi_lora_endpoint(
    name="My Endpoint New",
    lora_checkpoints=checkpoint_list,
    compute="A100-40GB"
)
endpoint = client.wait_for_endpoint_ready(endpoint_id=endpoint_id)
# Use the model
openai_client = OpenAI(
    api_key="your_api_key_here",
    base_url=f"{endpoint['url']}/v1"
)
completion = openai_client.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    prompt="Translate the following English text to French",
    max_tokens=60
)
print(completion.choices[0].text)
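# Streaming should work against the same endpoint as well, assuming the
# server supports streamed completions (OpenAI-compatible servers usually do);
# tokens are printed as they arrive instead of waiting for the full reply.
stream = openai_client.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    prompt="Translate the following English text to French",
    max_tokens=60,
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)
print()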
This table provides an overview of the Large Language Models (LLMs) available for fine-tuning, ordered roughly from most to least well known. It lists key details for each model, including its name, family, parameter count, context length, and additional features.
Model Name | Family | Parameters (B) | Context Length | vLLM Support | LoRA Support |
---|---|---|---|---|---|
Nvidia-Llama-3.1-Nemotron-70B-Instruct-HF | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.2-3B-Instruct | llama3.2 | 3 | 131,072 | Yes | Yes |
Meta-Llama-3.2-1B-Instruct | llama3.2 | 1 | 131,072 | Yes | Yes |
Mistral-Small-Instruct-2409 | mistral | 7.2 | 128,000 | Yes | Yes |
Ministral-8B-Instruct-2410 | mistral | 8 | 128,000 | Yes | Yes |
Mathstral-7B-v0.1 | mistral | 7 | 32,000 | Yes | Yes |
Qwen2.5-Coder-7B-Instruct | qwen2.5 | 7 | 32,768 | Yes | Yes |
Aya-Expanse-32b | aya | 32 | 128,000 | Yes | No |
Aya-Expanse-8b | aya | 8 | 8,000 | Yes | No |
Nemotron-Mini-4B-Instruct | nemotron | 4 | 4,096 | Yes | No |
Gemma-2-2b-it | gemma2 | 2 | 8,192 | Yes | Yes |
Meta-Llama-3.1-70B-Instruct | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.1-70B | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.1-8B-Instruct | llama3.1 | 8 | 131,072 | Yes | Yes |
Meta-Llama-3.1-8B | llama3.1 | 8 | 131,072 | Yes | Yes |
Meta-Llama-3-70B-Instruct | llama3 | 70 | 8,192 | Yes | Yes |
Meta-Llama-3-70B | llama3 | 70 | 8,192 | Yes | Yes |
Meta-Llama-3-8B-Instruct | llama3 | 8 | 8,192 | Yes | Yes |
Meta-Llama-3-8B | llama3 | 8 | 8,192 | Yes | Yes |
Mixtral-8x7B-Instruct-v0.1 | mixtral | 46.7 | 32,768 | Yes | Yes |
Mistral-7B-Instruct-v0.3 | mistral | 7.2 | 32,768 | Yes | Yes |
Mistral-Nemo-Instruct-2407 | mistral | 12.2 | 128,000 | No | No |
Mistral-Nemo-Base-2407 | mistral | 12.2 | 128,000 | No | No |
Gemma-2-27b-it | gemma2 | 27 | 8,192 | Yes | Yes |
Gemma-2-27b | gemma2 | 27 | 8,192 | Yes | Yes |
Gemma-2-9b-it | gemma2 | 9 | 8,192 | Yes | Yes |
Gemma-2-9b | gemma2 | 9 | 8,192 | Yes | Yes |
Phi-3-medium-128k-instruct | phi3 | 14 | 128,000 | Yes | No |
Phi-3-medium-4k-instruct | phi3 | 14 | 4,000 | Yes | No |
Phi-3-small-128k-instruct | phi3 | 7.4 | 128,000 | Yes | No |
Phi-3-small-8k-instruct | phi3 | 7.4 | 8,000 | Yes | No |
Phi-3-mini-128k-instruct | phi3 | 3.8 | 128,000 | Yes | No |
Phi-3-mini-4k-instruct | phi3 | 3.8 | 4,096 | Yes | No |
Qwen2-72B-Instruct | qwen2 | 72 | 32,768 | Yes | Yes |
Qwen2-72B | qwen2 | 72 | 32,768 | Yes | Yes |
Qwen2-57B-A14B-Instruct | qwen2 | 57 | 32,768 | Yes | Yes |
Qwen2-57B-A14B | qwen2 | 57 | 32,768 | Yes | Yes |
Qwen2-7B-Instruct | qwen2 | 7 | 32,768 | Yes | Yes |
Qwen2-7B | qwen2 | 7 | 32,768 | Yes | Yes |
Qwen2-1.5B-Instruct | qwen2 | 1.5 | 32,768 | Yes | Yes |
Qwen2-1.5B | qwen2 | 1.5 | 32,768 | Yes | Yes |
Qwen2-0.5B-Instruct | qwen2 | 0.5 | 32,768 | Yes | Yes |
Qwen2-0.5B | qwen2 | 0.5 | 32,768 | Yes | Yes |
TinyLlama_v1.1 | tinyllama | 1.1 | 2,048 | No | No |
DeepSeek-Coder-V2-Lite-Base | deepseek-coder-v2 | 16 | 163,840 | No | No |
InternLM2_5-7B-Chat | internlm2.5 | 7.74 | 1,000,000 | Yes | No |
InternLM2_5-7B | internlm2.5 | 7.74 | 1,000,000 | Yes | No |
Jamba-v0.1 | jamba | 51.6 | 256,000 | Yes | Yes |
Yi-1.5-34B-Chat | yi-1.5 | 34.4 | 4,000 | Yes | Yes |
Yi-1.5-34B | yi-1.5 | 34.4 | 4,000 | Yes | Yes |
Yi-1.5-34B-32K | yi-1.5 | 34.4 | 32,000 | Yes | Yes |
Yi-1.5-34B-Chat-16K | yi-1.5 | 34.4 | 16,000 | Yes | Yes |
Yi-1.5-9B-Chat | yi-1.5 | 8.83 | 4,000 | Yes | Yes |
Yi-1.5-9B | yi-1.5 | 8.83 | 4,000 | Yes | Yes |
Yi-1.5-9B-32K | yi-1.5 | 8.83 | 32,000 | Yes | Yes |
Yi-1.5-9B-Chat-16K | yi-1.5 | 8.83 | 16,000 | Yes | Yes |
Yi-1.5-6B-Chat | yi-1.5 | 6 | 4,000 | Yes | Yes |
Yi-1.5-6B | yi-1.5 | 6 | 4,000 | Yes | Yes |
c4ai-command-r-v01 | command-r | 35 | 131,072 | Yes | No |
- "vLLM Support" indicates whether the model is compatible with the vLLM (very Large Language Model) inference framework.
- "LoRA Support" indicates if the vLLM support inference the model with multiple LorA Adapters. Read more
- Context length is measured in tokens. (The model context can change by the target inference library)
- Parameter count is shown in billions (B).
- Links lead to the model's page on Hugging Face or the official website when available.
This table provides a comprehensive overview of the available models, their sizes, capabilities, and supported fine-tuning options. When choosing a model for fine-tuning, consider factors such as model size, context length, and support for vLLM inference and LoRA adapters.