LLM Trainer


A production-ready framework for training Large Language Models from scratch with modern PyTorch

What's New in v0.2.6

  • Memory Optimizations: Memory-efficient training backed by kernel optimizations
  • SafeTensors Support: Secure model serialization with automatic sharding for large models
  • HuggingFace Integration: Use any pretrained tokenizer via HFTokenizerWrapper
  • Accelerate Support: Distributed training with use_accelerate=true
  • LoRA/PEFT: Parameter-efficient fine-tuning with use_peft=true (see the configuration sketch after this list)
  • Backward Compatible: Existing PyTorch models continue to work
  • Patching System: Patch Transformers/TRL trainers with kernel optimizations and memory-efficient training methods
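
The Accelerate, PEFT, and SafeTensors flags above map onto the training configuration. A minimal sketch, assuming the Python field names mirror the YAML keys shown later under Advanced Configuration (the exact TrainingConfig signature is an assumption):

from llm_trainer import TrainingConfig

# Hypothetical field names mirroring the YAML keys under "Advanced Configuration";
# verify against your installed version before relying on them.
config = TrainingConfig(
    use_accelerate=True,                # distributed training via Accelerate
    accelerate_mixed_precision="fp16",  # FP16/BF16 mixed precision
    use_peft=True,                      # LoRA/PEFT adapters
    peft_type="lora",
    peft_r=8,
    peft_alpha=16,
)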

Features

Core Architecture

  • Custom Transformer Implementation: Multi-head attention, feed-forward networks, positional encodings
  • SafeTensors Integration: Secure model serialization with automatic sharding
  • Modular Design: Easy to extend and customize for research and production

Tokenization

  • BPE Tokenizer: From-scratch BPE with Unicode and emoji support
  • HuggingFace Integration: Use any pretrained tokenizer (Mistral, Llama, GPT-2, etc.); see the sketch after this list
  • WordPiece Support: Alternative tokenization strategies
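
For example, the HuggingFace wrapper can reuse any pretrained tokenizer. A minimal sketch, assuming HFTokenizerWrapper is importable from llm_trainer.tokenizer and accepts a transformers tokenizer instance (the constructor and the encode/decode methods shown are assumptions):

from transformers import AutoTokenizer
from llm_trainer.tokenizer import HFTokenizerWrapper

# Wrap a pretrained HuggingFace tokenizer (the constructor shown here is an assumption).
hf_tok = AutoTokenizer.from_pretrained("gpt2")
tokenizer = HFTokenizerWrapper(hf_tok)

ids = tokenizer.encode("The quick brown fox")
print(tokenizer.decode(ids))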

Data Pipeline

  • HuggingFace Datasets: Efficient loading with preprocessing and batching
  • Memory Optimization: Smart sequence packing and data streaming (packing sketch after this list)
  • Multi-Processing: Parallel data preprocessing for faster training
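
Sequence packing, mentioned above, concatenates tokenized documents and slices the stream into fixed-length blocks so that little compute is wasted on padding. An illustrative sketch, not the framework's internal implementation:

from typing import Iterable, List

def pack_sequences(token_streams: Iterable[List[int]],
                   block_size: int,
                   eos_id: int) -> List[List[int]]:
    # Concatenate tokenized documents, separated by EOS, and cut the running
    # buffer into fixed-length training blocks.
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for tokens in token_streams:
        buffer.extend(tokens + [eos_id])
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    return blocks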

Training & Inference

  • CPU/GPU Support: Optimized configurations for both CPU and GPU training
  • Distributed Training: Multi-GPU support via Accelerate and DeepSpeed
  • Parameter-Efficient: LoRA/PEFT adapters for memory-efficient fine-tuning
  • Mixed Precision: FP16/BF16 automatic mixed precision
  • Multiple Decoding Strategies: Greedy, beam search, nucleus (top-p), top-k sampling (see the sampling sketch after this list)
  • Enhanced Trainer: TRL-style training methods with familiar APIs
  • Memory-Efficient Optimizers: Optimizer implementations with reduced memory overhead
  • Kernel Optimizations: Fused operations for faster training steps
  • Low VRAM Training: Gradient checkpointing and memory-efficient techniques
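
Of the decoding strategies listed above, nucleus (top-p) sampling keeps the smallest set of tokens whose cumulative probability exceeds p and samples from that renormalized set. An illustrative PyTorch sketch, independent of the framework's own generation utilities:

import torch

def sample_top_p(logits: torch.Tensor, top_p: float = 0.9) -> torch.Tensor:
    # Sample the next token id from the nucleus of the distribution.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens outside the nucleus, always keeping at least the top token.
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, next_sorted)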

Monitoring & Evaluation

  • TensorBoard Integration: Real-time training metrics and visualizations
  • Weights & Biases: Experiment tracking and hyperparameter optimization
  • Comprehensive Metrics: Perplexity, cross-entropy loss, generation quality (see the perplexity sketch after this list)
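
Perplexity, listed above, is the exponential of the mean token-level cross-entropy. A minimal sketch:

import math

import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    # logits: (batch, seq_len, vocab_size); targets: (batch, seq_len) of token ids.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    return math.exp(loss.item())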

Requirements

  • Python 3.8 or higher
  • PyTorch 2.0 or higher
  • GPU: CUDA-compatible GPU (recommended) or CPU-only mode
  • Memory: 8GB RAM minimum (16GB+ recommended)

Installation

Basic Installation

git clone https://github.com/HelpingAI/llm-trainer.git
cd llm-trainer
pip install -e .

Optional Dependencies

# Development tools
pip install -e ".[dev]"

# SafeTensors support (recommended)
pip install -e ".[safetensors]"

# Distributed training
pip install -e ".[distributed]"

# All features
pip install -e ".[full]"

Quick Start

Python API - Enhanced Training

from llm_trainer import Trainer, TrainingConfig
from llm_trainer.models import TransformerLM
from llm_trainer.config import ModelConfig
from llm_trainer.tokenizer import BPETokenizer

# Create model and tokenizer
model_config = ModelConfig(
    vocab_size=32000,
    d_model=512,
    n_heads=8,
    n_layers=6,
    max_seq_len=1024
)
model = TransformerLM(model_config)
tokenizer = BPETokenizer()

# Configure training with TRL-style parameters
training_config = TrainingConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=100,
    optim="adamw"  # TRL-style parameter
)

# Create trainer and train
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    config=training_config
)

# TRL-style training methods
trainer.train()  # Standard training
trainer.sft_train()  # Supervised fine-tuning
trainer.dpo_train()  # Direct preference optimization

HuggingFace Integration with PEFT

from llm_trainer import Trainer, TrainingConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType

# Load pretrained model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Configure LoRA (PEFT)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM
)

# Create trainer with PEFT
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    config=TrainingConfig(),
    peft_config=lora_config  # Pass PEFT config directly
)

# Show parameter efficiency
trainer.print_trainable_parameters()

# Train with familiar API
trainer.train()

Memory-Efficient Optimizers

from llm_trainer.training import create_optimizer

# Create memory-efficient optimizer
optimizer = create_optimizer(
    model,
    optimizer_name="adamw",
    learning_rate=5e-5,
    weight_decay=0.01
)

Patching for Transformers/TRL

from llm_trainer import patch_transformers, patch_trl

# Patch Hugging Face Transformers with memory-efficient optimizations
patch_transformers()

# Patch TRL with memory-efficient optimizations
patch_trl()

# Now you can use enhanced Transformers/TRL classes with memory-efficient methods
from transformers import Trainer, TrainingArguments
from trl import SFTTrainer

# These trainers now have enhanced methods
trainer = SFTTrainer(...)
trainer.print_trainable_parameters()  # Added by patching
trainer.prepare_model_for_kbit_training()  # Added by patching

Kernel Optimizations for Fast Training

from llm_trainer.kernels import (
    FusedLinear, FusedRMSNorm, fused_cross_entropy,
    gradient_checkpointing, LowVRAMLinear, empty_cache
)

# Use fused operations for better performance
fused_linear = FusedLinear(in_features=512, out_features=512)
fused_norm = FusedRMSNorm(dim=512)

# Use gradient checkpointing to reduce memory usage
def forward_pass_with_checkpointing(model, inputs):
    return gradient_checkpointing(model, inputs)

# Use low VRAM linear layers for memory-efficient training
low_vram_linear = LowVRAMLinear(in_features=512, out_features=512)

# Clear cache to free up memory
empty_cache()

Command Line

# GPU Training
python scripts/train.py --config configs/small_model.yaml --output_dir ./output

# CPU Training (no GPU required)
python scripts/train.py --config configs/cpu_small_model.yaml --output_dir ./output

# Text Generation
python scripts/generate.py --model_path ./output --prompts "The quick brown fox" --interactive

# Model Evaluation
python scripts/evaluate.py --model_path ./output --dataset_config configs/eval_config.json

Configuration

The framework uses YAML/JSON configuration files for reproducible experiments:

Small Model (Quick Start)

model:
  d_model: 512
  n_heads: 8
  n_layers: 6
  vocab_size: 32000
  max_seq_len: 1024

training:
  batch_size: 16
  learning_rate: 1e-4
  num_epochs: 3
  use_amp: true
  gradient_accumulation_steps: 4

CPU-Optimized Training

device: "cpu"
model:
  d_model: 256
  n_heads: 4
  n_layers: 4
  max_seq_len: 512

training:
  batch_size: 2
  use_amp: false
  gradient_accumulation_steps: 8
  dataloader_num_workers: 2

Advanced Configuration

model:
  d_model: 768
  n_heads: 12
  n_layers: 12

training:
  use_accelerate: true
  accelerate_mixed_precision: "fp16"
  use_peft: true
  peft_type: "lora"
  peft_r: 8
  peft_alpha: 16

# SafeTensors settings
save_format: "safetensors"
max_shard_size: "2GB"
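
The SafeTensors settings above choose the serialization format and shard size used when checkpoints are written. For reference, this is what direct SafeTensors serialization of a state dict looks like with the safetensors library (the framework handles sharding and metadata itself):

import torch
from safetensors.torch import save_file, load_file

# Save a state dict of tensors to a .safetensors file ...
state_dict = {"weight": torch.randn(512, 512), "bias": torch.zeros(512)}
save_file(state_dict, "model.safetensors")

# ... and load it back onto the desired device.
restored = load_file("model.safetensors", device="cpu")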

Project Structure

llm-trainer/
├── src/llm_trainer/              # Main package
│   ├── models/                   # Model architectures
│   │   ├── base_model.py         # Base model interface
│   │   ├── transformer.py        # Custom Transformer implementation
│   │   ├── safetensors_utils.py  # SafeTensors utilities
│   │   └── attention.py          # Attention mechanisms
│   ├── tokenizer/                # Tokenization
│   │   ├── bpe_tokenizer.py      # BPE implementation
│   │   ├── hf_tokenizer.py       # HuggingFace wrapper
│   │   └── wordpiece_tokenizer.py # WordPiece implementation
│   ├── data/                     # Data pipeline
│   │   ├── dataset.py            # Dataset classes
│   │   ├── dataloader.py         # Data loading
│   │   └── preprocessing.py      # Data preprocessing
│   ├── training/                 # Training infrastructure
│   │   ├── trainer.py            # Enhanced trainer with TRL-style APIs
│   │   ├── optimizer.py          # Standard optimizers
│   │   └── scheduler.py          # Learning rate schedulers
│   ├── kernels/                  # Kernel optimizations
│   │   ├── fused_ops.py          # Fused operations
│   │   └── memory_efficient.py   # Memory-efficient operations
│   ├── patching/                 # Patching system
│   │   ├── patch_transformers.py # Transformers patching
│   │   └── patch_trl.py          # TRL patching
│   ├── utils/                    # Utilities
│   │   ├── generation.py         # Text generation
│   │   ├── inference.py          # Inference utilities
│   │   └── metrics.py            # Evaluation metrics
│   └── config/                   # Configuration
│       ├── model_config.py       # Model configuration
│       └── training_config.py    # Training configuration
├── scripts/                      # CLI tools
│   ├── train.py                  # Training script
│   ├── generate.py               # Text generation
│   └── evaluate.py               # Model evaluation
├── configs/                      # Pre-configured setups
│   ├── small_model.yaml          # Small GPU model
│   ├── medium_model.yaml         # Medium GPU model
│   ├── cpu_small_model.yaml      # CPU-optimized small
│   └── cpu_medium_model.yaml     # CPU-optimized medium
├── examples/                     # Usage examples
│   ├── complete_pipeline.py      # End-to-end example
│   ├── safetensors_example.py    # SafeTensors demo
│   └── train_small_model.py      # Quick start example
└── docs/                         # Documentation

Documentation

Detailed guides and usage examples live in the docs/ and examples/ directories.

Development

Running Tests

pip install -e ".[dev]"
pytest tests/

Code Quality

black src/ scripts/ examples/
flake8 src/ scripts/ examples/
mypy src/

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Support

For questions and bug reports, please open an issue on the GitHub repository.
