A platform that simplifies fine-tuning and inference for 60+ open-source LLMs through a single API. FlexAI enables serverless deployment, reducing setup time by up to 70%. You no longer have to deal with installations, OOMs, GPU setup, prompt templates, integrating new models, or waiting for huge model downloads.
- Serverless fine-tuning and inference
- Live time and cost estimations
- Checkpoint management
- LoRA and multi-LoRA support
- Target inference validations
- OpenAI-compatible Endpoints API
- Interactive Playground
- Sign up at app.getflex.ai. New accounts come with $5 in free credits to get you started :)
- Get your API key from Settings -> API Keys (one way to load it is shown in the snippet after this list)
- Start with our documentation
- Everything can also be done without writing any code from our dashboard - FlexAI Dashboard
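If you prefer not to hard-code the key, a common pattern is to read it from an environment variable; the variable name FLEX_AI_API_KEY below is just an example, not something the SDK requires:

import os
from flex_ai import FlexAI

# Read the API key from an environment variable (example name, not required by the SDK)
client = FlexAI(api_key=os.environ["FLEX_AI_API_KEY"])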
One Notebook to fine-tune all LLMs
Nothing to install locally: no CUDA, no NVIDIA drivers, no setup. Our lightweight library is just an API wrapper around FlexAI serverless GPUs, so you can work from any operating system, including Windows, macOS, and Linux.
pip install flex_ai openai
from flex_ai import FlexAI
# Initialize client with your API key
client = FlexAI(api_key="your-api-key")
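# Before creating a dataset, prepare train.jsonl / eval.jsonl files.
# The exact JSONL schema depends on the dataset type you pick - see the
# dataset docs linked below. The "instruction"/"output" fields here are
# only an illustration, not the required field names.
import json

rows = [
    {"instruction": "Translate to French: Hello", "output": "Bonjour"},
    {"instruction": "Translate to French: Thank you", "output": "Merci"},
]
with open("train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
# eval.jsonl follows the same structure, typically with held-out examples.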
# Create a dataset - see supported dataset formats [here](https://docs.getflex.ai/quickstart#upload-your-first-dataset)
dataset = client.create_dataset("Dataset Name", "train.jsonl", "eval.jsonl")
# Start fine-tuning
task = client.create_finetune(
    name="My Task",
    dataset_id=dataset["id"],
    # You can choose from 60+ models; full list [here](https://docs.getflex.ai/core-concepts/models)
    model="meta-llama/Llama-3.2-3B-Instruct",
    n_epochs=10,
    train_with_lora=True,
    lora_config={
        "lora_r": 64,
        "lora_alpha": 8,
        "lora_dropout": 0.1
    }
)
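# The endpoint step below needs a checkpoint_id from a finished task.
# One way to get it, using the same client methods shown in the full example
# further down, is to wait for the task and take its last checkpoint:
client.wait_for_task_completion(task_id=task["id"])
checkpoints = client.get_task_checkpoints(task_id=task["id"])
checkpoint_id = checkpoints[-1]["id"]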
# Create an endpoint that serves the LoRA checkpoint
endpoint_id = client.create_multi_lora_endpoint(
    name="My Endpoint",
    lora_checkpoints=[{"id": checkpoint_id, "name": "step_1"}],
    compute="A100-40GB"
)
endpoint = client.wait_for_endpoint_ready(endpoint_id=endpoint_id)
from openai import OpenAI
client = OpenAI(
    api_key="your-api-key",
    base_url=f"{endpoint['url']}/v1"
)
completion = client.completions.create(
    model="your-model",
    prompt="Your prompt",
    max_tokens=60
)
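print(completion.choices[0].text)

# The endpoint also speaks the OpenAI chat API. Assuming it exposes
# /v1/chat/completions (as OpenAI-compatible servers typically do),
# an instruct model can be queried chat-style:
chat = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your prompt"}
    ],
    max_tokens=60
)
print(chat.choices[0].message.content)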
Type | Links |
---|---|
Documentation & Wiki | Read Our Docs |
Twitter (aka X) | Follow us on X |
Installation | getflex/README.md |
Supported Models | FlexAI Models |
from flex_ai import FlexAI
from openai import OpenAI
import time
# Initialize the Flex AI client
client = FlexAI(api_key="your_api_key_here")
# Create a dataset - see supported dataset formats [here](https://docs.getflex.ai/quickstart#upload-your-first-dataset)
dataset = client.create_dataset(
    "API Dataset New",
    "instruction/train.jsonl",
    "instruction/eval.jsonl"
)
# Start a fine-tuning task
task = client.create_finetune(
    name="My Task New",
    dataset_id=dataset["id"],
    model="meta-llama/Llama-3.2-1B-Instruct",
    n_epochs=5,
    train_with_lora=True,
    lora_config={
        "lora_r": 64,
        "lora_alpha": 8,
        "lora_dropout": 0.1
    },
    n_checkpoints_and_evaluations_per_epoch=1,
    batch_size=4,
    learning_rate=0.0001,
    save_only_best_checkpoint=True
)
# Wait for training completion
client.wait_for_task_completion(task_id=task["id"])
# Wait for last checkpoint to be uploaded
while True:
    checkpoints = client.get_task_checkpoints(task_id=task["id"])
    if checkpoints and checkpoints[-1]["stage"] == "FINISHED":
        last_checkpoint = checkpoints[-1]
        checkpoint_list = [{
            "id": last_checkpoint["id"],
            "name": "step_" + str(last_checkpoint["step"])
        }]
        break
    time.sleep(10)  # Wait 10 seconds before checking again
# Create endpoint
endpoint_id = client.create_multi_lora_endpoint(
    name="My Endpoint New",
    lora_checkpoints=checkpoint_list,
    compute="A100-40GB"
)
endpoint = client.wait_for_endpoint_ready(endpoint_id=endpoint_id)
# Use the model
openai_client = OpenAI(
    api_key="your_api_key_here",
    base_url=f"{endpoint['url']}/v1"
)
completion = openai_client.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    prompt="Translate the following English text to French",
    max_tokens=60
)
print(completion.choices[0].text)
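# Streaming should work against the same endpoint as well, assuming the
# server supports streamed completions (OpenAI-compatible servers usually do);
# tokens are printed as they arrive instead of waiting for the full reply.
stream = openai_client.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    prompt="Translate the following English text to French",
    max_tokens=60,
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)
print()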
This table provides an overview of the Large Language Models (LLMs) available for fine-tuning, ordered roughly from most to least well known. It lists key details for each model, including its name, family, parameter count, context length, and additional features.
Model Name | Family | Parameters (B) | Context Length | vLLM Support | LoRA Support |
---|---|---|---|---|---|
Nvidia-Llama-3.1-Nemotron-70B-Instruct-HF | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.2-3B-Instruct | llama3.2 | 3 | 131,072 | Yes | Yes |
Meta-Llama-3.2-1B-Instruct | llama3.2 | 1 | 131,072 | Yes | Yes |
Mistral-Small-Instruct-2409 | mistral | 7.2 | 128,000 | Yes | Yes |
Ministral-8B-Instruct-2410 | mistral | 8 | 128,000 | Yes | Yes |
Mathstral-7B-v0.1 | mistral | 7 | 32,000 | Yes | Yes |
Qwen2.5-Coder-7B-Instruct | qwen2.5 | 7 | 32,768 | Yes | Yes |
Aya-Expanse-32b | aya | 32 | 128,000 | Yes | No |
Aya-Expanse-8b | aya | 8 | 8,000 | Yes | No |
Nemotron-Mini-4B-Instruct | nemotron | 4 | 4,096 | Yes | No |
Gemma-2-2b-it | gemma2 | 2 | 8,192 | Yes | Yes |
Meta-Llama-3.1-70B-Instruct | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.1-70B | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.1-8B-Instruct | llama3.1 | 8 | 131,072 | Yes | Yes |
Meta-Llama-3.1-8B | llama3.1 | 8 | 131,072 | Yes | Yes |
Meta-Llama-3-70B-Instruct | llama3 | 70 | 8,192 | Yes | Yes |
Meta-Llama-3-70B | llama3 | 70 | 8,192 | Yes | Yes |
Meta-Llama-3-8B-Instruct | llama3 | 8 | 8,192 | Yes | Yes |
Meta-Llama-3-8B | llama3 | 8 | 8,192 | Yes | Yes |
Mixtral-8x7B-Instruct-v0.1 | mixtral | 46.7 | 32,768 | Yes | Yes |
Mistral-7B-Instruct-v0.3 | mistral | 7.2 | 32,768 | Yes | Yes |
Mistral-Nemo-Instruct-2407 | mistral | 12.2 | 128,000 | No | No |
Mistral-Nemo-Base-2407 | mistral | 12.2 | 128,000 | No | No |
Gemma-2-27b-it | gemma2 | 27 | 8,192 | Yes | Yes |
Gemma-2-27b | gemma2 | 27 | 8,192 | Yes | Yes |
Gemma-2-9b-it | gemma2 | 9 | 8,192 | Yes | Yes |
Gemma-2-9b | gemma2 | 9 | 8,192 | Yes | Yes |
Phi-3-medium-128k-instruct | phi3 | 14 | 128,000 | Yes | No |
Phi-3-medium-4k-instruct | phi3 | 14 | 4,000 | Yes | No |
Phi-3-small-128k-instruct | phi3 | 7.4 | 128,000 | Yes | No |
Phi-3-small-8k-instruct | phi3 | 7.4 | 8,000 | Yes | No |
Phi-3-mini-128k-instruct | phi3 | 3.8 | 128,000 | Yes | No |
Phi-3-mini-4k-instruct | phi3 | 3.8 | 4,096 | Yes | No |
Qwen2-72B-Instruct | qwen2 | 72 | 32,768 | Yes | Yes |
Qwen2-72B | qwen2 | 72 | 32,768 | Yes | Yes |
Qwen2-57B-A14B-Instruct | qwen2 | 57 | 32,768 | Yes | Yes |
Qwen2-57B-A14B | qwen2 | 57 | 32,768 | Yes | Yes |
Qwen2-7B-Instruct | qwen2 | 7 | 32,768 | Yes | Yes |
Qwen2-7B | qwen2 | 7 | 32,768 | Yes | Yes |
Qwen2-1.5B-Instruct | qwen2 | 1.5 | 32,768 | Yes | Yes |
Qwen2-1.5B | qwen2 | 1.5 | 32,768 | Yes | Yes |
Qwen2-0.5B-Instruct | qwen2 | 0.5 | 32,768 | Yes | Yes |
Qwen2-0.5B | qwen2 | 0.5 | 32,768 | Yes | Yes |
TinyLlama_v1.1 | tinyllama | 1.1 | 2,048 | No | No |
DeepSeek-Coder-V2-Lite-Base | deepseek-coder-v2 | 16 | 163,840 | No | No |
InternLM2_5-7B-Chat | internlm2.5 | 7.74 | 1,000,000 | Yes | No |
InternLM2_5-7B | internlm2.5 | 7.74 | 1,000,000 | Yes | No |
Jamba-v0.1 | jamba | 51.6 | 256,000 | Yes | Yes |
Yi-1.5-34B-Chat | yi-1.5 | 34.4 | 4,000 | Yes | Yes |
Yi-1.5-34B | yi-1.5 | 34.4 | 4,000 | Yes | Yes |
Yi-1.5-34B-32K | yi-1.5 | 34.4 | 32,000 | Yes | Yes |
Yi-1.5-34B-Chat-16K | yi-1.5 | 34.4 | 16,000 | Yes | Yes |
Yi-1.5-9B-Chat | yi-1.5 | 8.83 | 4,000 | Yes | Yes |
Yi-1.5-9B | yi-1.5 | 8.83 | 4,000 | Yes | Yes |
Yi-1.5-9B-32K | yi-1.5 | 8.83 | 32,000 | Yes | Yes |
Yi-1.5-9B-Chat-16K | yi-1.5 | 8.83 | 16,000 | Yes | Yes |
Yi-1.5-6B-Chat | yi-1.5 | 6 | 4,000 | Yes | Yes |
Yi-1.5-6B | yi-1.5 | 6 | 4,000 | Yes | Yes |
c4ai-command-r-v01 | command-r | 35 | 131,072 | Yes | No |
- "vLLM Support" indicates whether the model is compatible with the vLLM (very Large Language Model) inference framework.
- "LoRA Support" indicates if the vLLM support inference the model with multiple LorA Adapters. Read more
- Context length is measured in tokens. (The model context can change by the target inference library)
- Parameter count is shown in billions (B).
- Links lead to the model's page on Hugging Face or the official website when available.
This table provides a comprehensive overview of the available models, their sizes, capabilities, and supported fine-tuning options. When choosing a model for fine-tuning, consider factors such as model size, context length, and support for vLLM inference and LoRA adapters.