Fully Offline | Enterprise Ready | Local AI | Document Q&A
A secure, offline AI assistant that enables enterprises to query their internal documents using natural language. Built with clean architecture principles, RAG Box processes documents locally and provides intelligent answers without sending data to external services.
- 100% Offline - No internet connection required after setup
- Multi-format Support - PDF, DOCX, XLSX, CSV, TXT files
- Local LLM Integration - Powered by Ollama for secure inference
- Advanced RAG - Retrieval-Augmented Generation with vector search
- RAG Fusion - Multiple query variations for better accuracy (see the sketch after this list)
- Auto-reindexing - File watcher detects changes automatically
- Multiple Interfaces - CLI, Streamlit UI, file watcher
- Clean Architecture - Modular, testable, maintainable code
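To give a feel for what the RAG Fusion feature refers to, the sketch below shows the general technique: rewrite the question into several variations, retrieve for each, and merge the ranked lists with reciprocal rank fusion. The function names and the retriever are placeholders, not RAG Box's actual API.

```python
# Sketch of the general RAG-Fusion idea: several query variations, one fused ranking.
# `generate_variations` and `retrieve` stand in for the LLM rewrite step and the
# vector-store search; only the reciprocal-rank-fusion math is meant literally.
from collections import defaultdict


def fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def answer_with_fusion(question: str, generate_variations, retrieve, top_k: int = 5):
    queries = [question, *generate_variations(question)]  # e.g. a few LLM rewrites
    rankings = [retrieve(q, top_k) for q in queries]       # one ranked list per query
    return fuse(rankings)[:top_k]                          # best chunks across all variations
```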
- Python 3.12+
- Poetry for dependency management
- Ollama for local LLM inference
- Pre-commit (optional, for development)
- Clone the repository:
  git clone https://github.com/galezra/ragbox.git
  cd ragbox
- Install dependencies:
  poetry install
- Set up Ollama (if not already installed); a quick connectivity check is sketched after this list:
  # Install Ollama (macOS/Linux)
  curl -fsSL https://ollama.ai/install.sh | sh
  # Pull a model (example: llama3)
  ollama pull llama3
- Configure the system:
  # Copy and edit configuration
  cp config/default.yaml config/local.yaml
  # Edit config/local.yaml with your preferred settings
- Place your documents in the data/ directory:
  mkdir -p data
  cp /path/to/your/documents/* data/
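If you want to confirm the local Ollama server is reachable before indexing anything, a small check against Ollama's default endpoint (port 11434) works; this snippet is only a convenience and is not part of RAG Box itself.

```python
# Quick sanity check: list the models the local Ollama server exposes.
# Assumes Ollama is running on its default address, http://localhost:11434.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp)

# Each entry should include the model name, e.g. "llama3:latest".
for model in models.get("models", []):
    print(model["name"])
```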
# Start CLI chat
poetry run python -m src.interfaces.cli.run
# Or use the entry point
poetry run ragbox
# Launch Streamlit UI
poetry run python -m src.interfaces.streamlit_ui.run_streamlit
# Start file watcher in background
poetry run python -m src.interfaces.watcher.run_watcher start
RAG Box follows clean architecture principles with clear separation of concerns:
src/
├── domain/          # Core business logic
├── application/     # Use cases (ingest, answer, reindex)
├── infrastructure/  # External systems (LLM, vector store, loaders)
└── interfaces/      # Delivery mechanisms (CLI, UI, watcher)
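To make the dependency direction concrete, here is a minimal sketch of how the layers might fit together; the names below (Chunk, Retriever, AnswerQuestion) are illustrative, not the identifiers actually used in the codebase.

```python
# Illustrative only: hypothetical types showing the dependency direction
# (interfaces -> application -> domain, with infrastructure plugged in at the edge).
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Chunk:                  # domain: a plain business object
    text: str
    source: str


class Retriever(Protocol):    # domain: port implemented by infrastructure
    def search(self, query: str, k: int) -> list[Chunk]: ...


class AnswerQuestion:         # application: use case depends only on the port
    def __init__(self, retriever: Retriever) -> None:
        self.retriever = retriever

    def run(self, question: str) -> list[Chunk]:
        return self.retriever.search(question, k=5)
```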
- Document Processing - Multi-format loaders with metadata extraction
- Embedding System - Local sentence transformers for vector representations
- Vector Store - FAISS-based similarity search with persistence
- LLM Integration - Ollama client with streaming and error handling
- RAG Pipeline - Advanced retrieval with fusion and re-ranking
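As a rough illustration of how the embedding and vector-store pieces typically fit together (not the project's actual classes), a sentence-transformers model can embed chunks locally and FAISS can serve cosine-similarity search:

```python
# Sketch of local embedding + FAISS similarity search.
# Assumes `sentence-transformers` and `faiss-cpu` are installed; the variable names
# are illustrative, not RAG Box's own modules.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
chunks = ["Invoices are due within 30 days.", "Refunds require a signed form."]

vectors = model.encode(chunks, normalize_embeddings=True)  # unit vectors -> cosine via inner product
index = faiss.IndexFlatIP(int(vectors.shape[1]))
index.add(vectors)

query = model.encode(["When are invoices due?"], normalize_embeddings=True)
scores, ids = index.search(query, 2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {chunks[i]}")
```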
- Architecture Overview - Detailed system design
- Development Tasks - Implementation progress
- CLI Interface - Command line usage
- Streamlit UI - Web interface guide
- File Watcher - Auto-reindexing setup
The system uses YAML configuration files in the config/ directory:
# Example configuration
data_dir: "./data"
vector_store_dir: "./vector_store"
chunk_size: 1000
chunk_overlap: 200
# Embedding settings
embedding_model: "BAAI/bge-small-en-v1.5"
embedding_device: "cpu"
# LLM settings
llm_model: "llama3"
llm_temperature: 0.1
llm_max_tokens: 2048
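If you need to inspect the effective settings from Python, something along these lines reads them with PyYAML; RAG Box's own config loader may differ.

```python
# Minimal sketch: read the local YAML config with PyYAML.
# Path and key names mirror the example above; the project's loader may differ.
from pathlib import Path
import yaml

config = yaml.safe_load(Path("config/local.yaml").read_text())
print(config["embedding_model"], config["chunk_size"])
```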
# Install with dev dependencies
poetry install --with dev
# Install pre-commit hooks
pre-commit install
# Run tests
poetry run pytest
# Run linting
poetry run ruff check
poetry run mypy src/
This project uses:
- Ruff - Fast Python linter and formatter
- MyPy - Static type checking
- Pytest - Testing framework
- Pre-commit - Git hooks for quality checks
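Tests under this setup are ordinary Pytest functions; for example, a unit test for a hypothetical chunking helper might look like the following (the helper name is illustrative, not necessarily what lives in src/).

```python
# tests/test_chunking.py -- illustrative only; `split_into_chunks` is a hypothetical helper.
def split_into_chunks(text: str, size: int, overlap: int) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def test_chunks_respect_size_and_overlap():
    text = "a" * 2500
    chunks = split_into_chunks(text, size=1000, overlap=200)
    assert all(len(c) <= 1000 for c in chunks)
    assert chunks[0][-200:] == chunks[1][:200]  # consecutive chunks share the overlap
```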
# Build image
docker build -t ragbox:latest .
# Run container
docker run -p 8501:8501 -v $(pwd)/data:/app/data ragbox:latest
- No External APIs - All processing happens locally
- No Data Leakage - Documents never leave your infrastructure
- Audit Trail - All interactions logged locally
- Configurable Logging - Control what gets logged
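Purely as an illustration of what local, configurable logging can look like (this is not RAG Box's actual logging setup), the standard library can write an audit trail to a file that never leaves your machine:

```python
# Illustrative only: route application logs to a local file with a configurable level.
import logging

logging.basicConfig(
    filename="ragbox.log",   # stays on your infrastructure
    level=logging.INFO,      # raise to DEBUG or lower to WARNING as needed
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger("ragbox").info("answered question from local index")
```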
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Ollama for local LLM inference
- Powered by FAISS for vector search
- Uses Sentence Transformers for embeddings
- UI built with Streamlit
Made with ❤️ for enterprises that value data privacy and security.