Skip to content

LLM-Implementation/private-rag-embeddinggemma

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Private RAG System with EmbeddingGemma

A 100% private, local Retrieval-Augmented Generation (RAG) stack using:

  • EmbeddingGemma-300m for embeddings
  • SQLite-vec for vector storage
  • Qwen3:4b for language generation
  • 100% Private & Offline Capable

🎯 What This Project Does

Build a completely private, offline RAG application right on your laptop. This system combines Google's new EmbeddingGemma model for best-in-class local embeddings, SQLite-vec for a dead-simple vector database, and Ollama for a powerful, local LLM. No API keys, no costs, no data sent to the cloud.

πŸ“‹ Prerequisites

  • Python 3.9+
  • Modern laptop with at least 8GB RAM
  • Internet connection for initial model downloads

πŸš€ Quick Start

1. Clone and Setup

git clone <your-repo>
cd embeddinggemma

2. Install UV (if not already installed)

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
pip install uv

3. Install Dependencies

# Install all project dependencies
uv sync

4. Setup Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
ollama serve &

# Pull the Qwen3 model (2.5GB download)
ollama pull qwen3:4b

5. Hugging Face Authentication

EmbeddingGemma requires Hugging Face access:

  1. Request access at: https://huggingface.co/google/embeddinggemma-300m
  2. Wait for approval (usually within 24 hours)
  3. Login via CLI:
# Login to Hugging Face
uv run huggingface-cli login

6. Run the Demo

# Run the RAG system
uv run python rag_demo.py

πŸ“” Jupyter Notebook Setup

To use this project with Jupyter notebooks in a standalone virtual environment:

Step 1: Add Jupyter Dependencies

# Add Jupyter packages to your project
uv add jupyter notebook ipykernel

Step 2: Register Jupyter Kernel

# Register your virtual environment as a Jupyter kernel
uv run python -m ipykernel install --user --name embeddinggemma --display-name "EmbeddingGemma RAG"

Step 3: Launch Jupyter

# Start Jupyter
uv run jupyter notebook

# Or use Jupyter Lab
uv run jupyter lab

Step 4: Use the Correct Kernel

  1. Open your notebook
  2. Go to Kernel β†’ Change kernel β†’ EmbeddingGemma RAG
  3. Now all your project dependencies are available!

πŸ—οΈ Project Structure

embeddinggemma/
β”œβ”€β”€ .venv/                  # Virtual environment
β”œβ”€β”€ docs/                   # Scraped documentation
β”œβ”€β”€ rag_demo.py            # Main RAG demonstration script
β”œβ”€β”€ rag_demo.ipynb         # Complete tutorial notebook  
β”œβ”€β”€ pyproject.toml         # Project dependencies (uv format)
β”œβ”€β”€ requirements.txt       # Alternative pip format
└── vectors_docs.db        # SQLite vector database

πŸ”§ Configuration

Key parameters you can modify:

EMBEDDING_MODEL = "google/embeddinggemma-300m"
EMBEDDING_DIMS = 256  # 256 for 3x speed, 768 for max quality
LLM_MODEL = "qwen3:4b"  # Try: qwen3:7b, llama3:8b, mistral:7b
DRY_RUN = False  # Set True to test without LLM

πŸ§ͺ Usage Examples

Command Line

uv run python rag_demo.py

In Python/Jupyter

from rag_docs import *

# Query the system
response = semantic_search_and_query("How do I use SQLite-vec with Python?")

πŸ” Troubleshooting

Common Issues

"pip not found" in Jupyter

Solution: Make sure you're using the correct kernel

  1. Register kernel: uv run python -m ipykernel install --user --name embeddinggemma --display-name "EmbeddingGemma RAG"
  2. Switch kernel in Jupyter to "EmbeddingGemma RAG"

"Command not found: jupyter"

Solution: Install Jupyter in your environment

uv add jupyter notebook ipykernel
uv sync

EmbeddingGemma Access Denied

Solution: Request access and wait for approval

  1. Visit: https://huggingface.co/google/embeddinggemma-300m
  2. Click "Request access to this repo"
  3. Wait 24 hours for approval
  4. Run uv run huggingface-cli login

Ollama Connection Error

Solution: Ensure Ollama is running

# Check if running
ps aux | grep ollama

# Start if not running
ollama serve &

# Pull model if needed
ollama pull qwen3:4b

Out of Memory Errors

Solutions:

  • Reduce EMBEDDING_DIMS to 256
  • Use smaller batch sizes
  • Try qwen3:1.5b instead of qwen3:4b
  • Close other applications

Verification Commands

Check your setup:

# Verify environment is activated
which python  # Should show .venv path

# Test imports
uv run python -c "import sqlite_vec, ollama, sentence_transformers; print('All imports working!')"

# Check Ollama
ollama list  # Should show qwen3:4b

# Test Jupyter kernel
jupyter kernelspec list  # Should show embeddinggemma kernel

πŸ“Š System Requirements

  • RAM: 8GB minimum, 16GB recommended
  • Storage: ~3GB for models + data
  • Models Downloaded:
    • EmbeddingGemma-300m: ~600MB
    • Qwen3:4b: ~2.5GB

πŸ› οΈ Advanced Customization

Add Custom Documentation

Edit DOCUMENTATION_URLS in the script to scrape your own docs.

Different Models

  • Embeddings: Try google/embeddinggemma-768 for higher quality
  • LLM: Try qwen3:7b, llama3:8b, or mistral:7b

Chunking Strategy

Modify token-based chunking parameters:

max_tokens = 2048      # Chunk size
overlap_tokens = 100   # Overlap between chunks

🎯 Benefits

βœ… 100% Private: All processing happens locally
βœ… Zero Cost: No API fees after initial setup
βœ… Mobile-Optimized: EmbeddingGemma designed for mobile deployment
βœ… Fast: SQLite-vec provides sub-millisecond vector search
βœ… Smart: Qwen3 rivals much larger models with 256K context
βœ… Standalone: Complete isolation in virtual environment

πŸ“œ License

This project is open source. See individual model licenses:

  • EmbeddingGemma: Gemma License
  • Qwen3: Apache 2.0
  • SQLite-vec: Apache 2.0

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

πŸ”— Links

About

πŸ”’ 100% Private RAG Stack with EmbeddingGemma, SQLite-vec & Ollama - Zero Cost, Offline Capable

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published