RAG Box - Local Enterprise AI Agent


πŸ”’ Fully Offline | 🏒 Enterprise Ready | 🧠 Local AI | πŸ“š Document Q&A

A secure, offline AI assistant that enables enterprises to query their internal documents using natural language. Built with clean architecture principles, RAG Box processes documents locally and provides intelligent answers without sending data to external services.

✨ Features

  • πŸ”’ 100% Offline - No internet connection required after setup
  • πŸ“„ Multi-format Support - PDF, DOCX, XLSX, CSV, TXT files
  • 🧠 Local LLM Integration - Powered by Ollama for secure inference
  • πŸ” Advanced RAG - Retrieval-Augmented Generation with vector search
  • 🎯 RAG Fusion - Multiple query variations for better accuracy
  • πŸ”„ Auto-reindexing - File watcher detects changes automatically
  • 🎨 Multiple Interfaces - CLI, Streamlit UI, file watcher
  • πŸ—οΈ Clean Architecture - Modular, testable, maintainable code

πŸš€ Quick Start

Prerequisites

  • Python 3.12+
  • Poetry for dependency management
  • Ollama for local LLM inference
  • Pre-commit (optional, for development)

Installation

  1. Clone the repository:

    git clone https://github.com/galezra/ragbox.git
    cd ragbox
  2. Install dependencies:

    poetry install
  3. Set up Ollama (if not already installed):

    # Install Ollama (macOS/Linux)
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Pull a model (example: llama3)
    ollama pull llama3
  4. Configure the system:

    # Copy and edit configuration
    cp config/default.yaml config/local.yaml
    # Edit config/local.yaml with your preferred settings

Usage

πŸ“ Add Documents

Place your documents in the data/ directory:

mkdir -p data
cp /path/to/your/documents/* data/

πŸ–₯️ Command Line Interface

# Start CLI chat
poetry run python -m src.interfaces.cli.run

# Or use the entry point
poetry run ragbox

🌐 Web Interface

# Launch Streamlit UI
poetry run python -m src.interfaces.streamlit_ui.run_streamlit

πŸ‘οΈ File Watcher (Auto-reindexing)

# Start file watcher in background
poetry run python -m src.interfaces.watcher.run_watcher start

πŸ—οΈ Architecture

RAG Box follows clean architecture principles with clear separation of concerns:

src/
β”œβ”€β”€ domain/              # Core business logic
β”œβ”€β”€ application/         # Use cases (ingest, answer, reindex)
β”œβ”€β”€ infrastructure/      # External systems (LLM, vector store, loaders)
└── interfaces/          # Delivery mechanisms (CLI, UI, watcher)
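
In clean-architecture terms, the inner layers (domain, application) define interfaces that the outer layers (infrastructure, interfaces) implement, so core logic never imports an external system directly. A toy sketch of that dependency rule (class names are illustrative, not the project's actual code):

```python
from typing import Protocol

# application layer: depends only on an abstraction (a "port")
class VectorStore(Protocol):
    def search(self, query: str, top_k: int) -> list[str]: ...

class AnswerQuestion:
    """Use case: retrieve relevant chunks for a question."""
    def __init__(self, store: VectorStore):
        self.store = store

    def run(self, question: str) -> list[str]:
        return self.store.search(question, top_k=3)

# infrastructure layer: a concrete adapter (trivial in-memory store here)
class InMemoryStore:
    def __init__(self, chunks: list[str]):
        self.chunks = chunks

    def search(self, query: str, top_k: int) -> list[str]:
        return [c for c in self.chunks if query.lower() in c.lower()][:top_k]

use_case = AnswerQuestion(InMemoryStore(["Refunds take 14 days.", "Office hours: 9-5."]))
print(use_case.run("refunds"))  # ['Refunds take 14 days.']
```

Because AnswerQuestion only sees the VectorStore protocol, the FAISS-backed store can be swapped for a fake in tests without touching the use case.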

Core Components

  • Document Processing - Multi-format loaders with metadata extraction
  • Embedding System - Local sentence transformers for vector representations
  • Vector Store - FAISS-based similarity search with persistence
  • LLM Integration - Ollama client with streaming and error handling
  • RAG Pipeline - Advanced retrieval with fusion and re-ranking

πŸ“– Documentation

πŸ”§ Configuration

The system uses YAML configuration files in the config/ directory:

# Example configuration
data_dir: "./data"
vector_store_dir: "./vector_store"
chunk_size: 1000
chunk_overlap: 200

# Embedding settings
embedding_model: "BAAI/bge-small-en-v1.5"
embedding_device: "cpu"

# LLM settings
llm_model: "llama3"
llm_temperature: 0.1
llm_max_tokens: 2048
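
chunk_size and chunk_overlap control how documents are split before embedding: each chunk is a fixed-size window that shares its tail with the head of the next chunk, so sentences near a boundary are not lost. A simplified character-based splitter showing what the two numbers mean (the project's real splitter may be token-aware):

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size characters, each
    overlapping the previous window by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("a" * 2500, chunk_size=1000, chunk_overlap=200)
print(len(chunks))  # 4 windows, starting at 0, 800, 1600, 2400
```

Larger chunks give the LLM more context per retrieval hit; larger overlap reduces boundary loss at the cost of a bigger index.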

πŸ§ͺ Development

Setup Development Environment

# Install with dev dependencies
poetry install --with dev

# Install pre-commit hooks
pre-commit install

# Run tests
poetry run pytest

# Run linting
poetry run ruff check
poetry run mypy src/

Code Quality

This project uses:

  • Ruff - Fast Python linter and formatter
  • MyPy - Static type checking
  • Pytest - Testing framework
  • Pre-commit - Git hooks for quality checks

🐳 Docker

# Build image
docker build -t ragbox:latest .

# Run container
docker run -p 8501:8501 -v $(pwd)/data:/app/data ragbox:latest

πŸ”’ Security & Privacy

  • No External APIs - All processing happens locally
  • No Data Leakage - Documents never leave your infrastructure
  • Audit Trail - All interactions logged locally
  • Configurable Logging - Control what gets logged

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments


Made with ❀️ for enterprises that value data privacy and security.
