A 100% private, local Retrieval-Augmented Generation (RAG) stack using:
- EmbeddingGemma-300m for embeddings
- SQLite-vec for vector storage
- Qwen3:4b for language generation
Build a completely private, offline-capable RAG application right on your laptop. The system combines Google's new EmbeddingGemma model for best-in-class local embeddings, SQLite-vec for a dead-simple vector database, and Ollama for a powerful local LLM. No API keys, no costs, no data sent to the cloud.
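At its core, the pipeline is only a few dozen lines. Here is a minimal sketch of the indexing half, assuming the `sqlite-vec` and `sentence-transformers` packages; the table layout and names are illustrative, not necessarily what `rag_demo.py` uses (the query half is sketched further below):

```python
import sqlite3

import sqlite_vec
from sentence_transformers import SentenceTransformer

# Load EmbeddingGemma, truncated to 256 dims (Matryoshka truncation).
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)

# Open SQLite and load the sqlite-vec extension.
db = sqlite3.connect("vectors_docs.db")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# One vec0 table for vectors, one ordinary table for the chunk text.
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS doc_chunks USING vec0(embedding float[256])")
db.execute("CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, text TEXT)")

docs = [
    "sqlite-vec stores vectors in vec0 virtual tables.",
    "EmbeddingGemma produces 768-dim embeddings, truncatable to 256.",
]
for i, text in enumerate(docs, start=1):
    vec = model.encode(text).tolist()
    db.execute("INSERT INTO chunks(id, text) VALUES (?, ?)", (i, text))
    db.execute(
        "INSERT INTO doc_chunks(rowid, embedding) VALUES (?, ?)",
        (i, sqlite_vec.serialize_float32(vec)),
    )
db.commit()
```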
- Python 3.9+
- Modern laptop with at least 8GB RAM
- Internet connection for initial model downloads
```bash
git clone <your-repo>
cd embeddinggemma
```
```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
pip install uv

# Install all project dependencies
uv sync
```
```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
ollama serve &

# Pull the Qwen3 model (2.5GB download)
ollama pull qwen3:4b
```
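To sanity-check the server before running the demo, a quick call through the `ollama` Python client (already among the project dependencies) should succeed:

```python
# Minimal health check for the local Ollama server.
import ollama

print(ollama.list())  # qwen3:4b should appear once the pull finishes

reply = ollama.chat(
    model="qwen3:4b",
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(reply["message"]["content"])
```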
EmbeddingGemma requires Hugging Face access:
- Request access at: https://huggingface.co/google/embeddinggemma-300m
- Wait for approval (usually within 24 hours)
- Login via CLI:
```bash
# Login to Hugging Face
uv run huggingface-cli login
```
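Once approved and logged in, a one-liner smoke test with the already-installed sentence-transformers downloads and caches the model:

```python
from sentence_transformers import SentenceTransformer

# Fails with a gated-repo error until your access request is approved.
model = SentenceTransformer("google/embeddinggemma-300m")
print(model.encode("hello world").shape)  # (768,) at full dimensionality
```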
```bash
# Run the RAG system
uv run python rag_demo.py
```
To use this project with Jupyter notebooks in a standalone virtual environment:
```bash
# Add Jupyter packages to your project
uv add jupyter notebook ipykernel

# Register your virtual environment as a Jupyter kernel
uv run python -m ipykernel install --user --name embeddinggemma --display-name "EmbeddingGemma RAG"

# Start Jupyter
uv run jupyter notebook

# Or use Jupyter Lab
uv run jupyter lab
```
- Open your notebook
- Go to Kernel → Change kernel → EmbeddingGemma RAG
- Now all your project dependencies are available!
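A quick first cell confirms the notebook really runs on the project environment:

```python
import sys
print(sys.executable)  # should point inside the project's .venv/

# These resolve only when the EmbeddingGemma RAG kernel is active.
import sqlite_vec, ollama, sentence_transformers
print("All project imports resolve")
```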
```
embeddinggemma/
├── .venv/              # Virtual environment
├── docs/               # Scraped documentation
├── rag_demo.py         # Main RAG demonstration script
├── rag_demo.ipynb      # Complete tutorial notebook
├── pyproject.toml      # Project dependencies (uv format)
├── requirements.txt    # Alternative pip format
└── vectors_docs.db     # SQLite vector database
```
Key parameters you can modify:
```python
EMBEDDING_MODEL = "google/embeddinggemma-300m"
EMBEDDING_DIMS = 256      # 256 for ~3x speed, 768 for max quality
LLM_MODEL = "qwen3:4b"    # Try: qwen3:8b, llama3:8b, mistral:7b
DRY_RUN = False           # Set True to test without LLM
```
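EmbeddingGemma is Matryoshka-trained, so truncating its 768-dim output to 256 dims keeps most of the retrieval quality; `truncate_dim` is the standard sentence-transformers knob for this:

```python
from sentence_transformers import SentenceTransformer

full = SentenceTransformer("google/embeddinggemma-300m")                    # 768 dims
fast = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)  # 256 dims

print(full.encode("test").shape)  # (768,)
print(fast.encode("test").shape)  # (256,) with a smaller index and faster search
```

Whatever value you choose must match the `float[N]` column declared in the sqlite-vec table, so re-index after changing `EMBEDDING_DIMS`.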
```bash
uv run python rag_demo.py
```

Or drive it from your own Python session:

```python
from rag_demo import *

# Query the system
response = semantic_search_and_query("How do I use SQLite-vec with Python?")
```
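Under the hood, a call like this reduces to a KNN query against sqlite-vec followed by a grounded prompt to Qwen3. A rough sketch, reusing the illustrative table layout from the indexing sketch above (the real `semantic_search_and_query` in `rag_demo.py` may differ):

```python
import sqlite3

import ollama
import sqlite_vec
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)
db = sqlite3.connect("vectors_docs.db")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

def semantic_search_and_query(question: str, k: int = 5) -> str:
    # 1. Embed the question with the same model and dimensionality as the index.
    qvec = model.encode(question).tolist()
    # 2. KNN search: sqlite-vec's `k = ?` constraint limits the neighbors returned.
    hits = db.execute(
        "SELECT rowid FROM doc_chunks WHERE embedding MATCH ? AND k = ? ORDER BY distance",
        (sqlite_vec.serialize_float32(qvec), k),
    ).fetchall()
    texts = [
        db.execute("SELECT text FROM chunks WHERE id = ?", (rowid,)).fetchone()[0]
        for (rowid,) in hits
    ]
    # 3. Ground the LLM on the retrieved chunks.
    context = "\n\n".join(texts)
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + context + "\n\nQuestion: " + question
    )
    reply = ollama.chat(model="qwen3:4b", messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]
```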
If imports fail inside Jupyter, make sure the notebook is using the project kernel. Register it:

```bash
uv run python -m ipykernel install --user --name embeddinggemma --display-name "EmbeddingGemma RAG"
```

then switch the kernel in Jupyter to "EmbeddingGemma RAG".
If Jupyter itself is missing, install it in your environment:

```bash
uv add jupyter notebook ipykernel
uv sync
```
If downloading EmbeddingGemma fails with a gated-repo error, request access and wait for approval:
- Visit: https://huggingface.co/google/embeddinggemma-300m
- Click "Request access to this repo"
- Wait for approval (usually within 24 hours)
- Run `uv run huggingface-cli login`
If the demo can't reach the LLM, ensure Ollama is running:

```bash
# Check if running
ps aux | grep ollama

# Start if not running
ollama serve &

# Pull the model if needed
ollama pull qwen3:4b
```
If you run low on memory or generation is slow:
- Reduce `EMBEDDING_DIMS` to 256
- Use smaller batch sizes
- Try `qwen3:1.7b` instead of `qwen3:4b`
- Close other applications
Check your setup:
```bash
# Verify environment is activated
which python  # Should show .venv path

# Test imports
uv run python -c "import sqlite_vec, ollama, sentence_transformers; print('All imports working!')"

# Check Ollama
ollama list  # Should show qwen3:4b

# Test Jupyter kernel
jupyter kernelspec list  # Should show embeddinggemma kernel
```
- RAM: 8GB minimum, 16GB recommended
- Storage: ~3GB for models + data
- Models Downloaded:
  - EmbeddingGemma-300m: ~600MB
  - Qwen3:4b: ~2.5GB
Edit `DOCUMENTATION_URLS` in the script to scrape your own docs.
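To see roughly what that scraping step involves, here is a minimal fetch-and-save loop. It assumes the `requests` and `beautifulsoup4` packages, the URL entry is only an example, and the script's actual scraper may work differently:

```python
import pathlib

import requests
from bs4 import BeautifulSoup

# Hypothetical entries; replace with the docs you want to index.
DOCUMENTATION_URLS = [
    "https://alexgarcia.xyz/sqlite-vec/python.html",
]

out = pathlib.Path("docs")
out.mkdir(exist_ok=True)
for url in DOCUMENTATION_URLS:
    html = requests.get(url, timeout=30).text
    # Strip markup and keep the readable text, one block per line.
    text = BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)
    name = url.rstrip("/").split("/")[-1] or "index"
    (out / f"{name}.txt").write_text(text)
```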
- Embeddings: Set `EMBEDDING_DIMS = 768` for higher quality
- LLM: Try `qwen3:8b`, `llama3:8b`, or `mistral:7b`
Modify token-based chunking parameters:
```python
max_tokens = 2048      # Chunk size
overlap_tokens = 100   # Overlap between chunks
```
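For reference, a sketch of token-based chunking with overlap, using the embedding model's own tokenizer via the transformers library (the helper name and the exact logic in `rag_demo.py` may differ):

```python
from transformers import AutoTokenizer

# EmbeddingGemma's context window is 2,048 tokens, so 2048-token chunks fit exactly.
tokenizer = AutoTokenizer.from_pretrained("google/embeddinggemma-300m")

def chunk_text(text: str, max_tokens: int = 2048, overlap_tokens: int = 100) -> list[str]:
    ids = tokenizer.encode(text, add_special_tokens=False)
    step = max_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(ids), step):
        chunks.append(tokenizer.decode(ids[start:start + max_tokens]))
        if start + max_tokens >= len(ids):
            break
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.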
- ✅ 100% Private: All processing happens locally
- ✅ Zero Cost: No API fees after initial setup
- ✅ Mobile-Optimized: EmbeddingGemma designed for mobile deployment
- ✅ Fast: SQLite-vec provides sub-millisecond vector search
- ✅ Smart: Qwen3 rivals much larger models with 256K context
- ✅ Standalone: Complete isolation in virtual environment
This project is open source. See individual model licenses:
- EmbeddingGemma: Gemma License
- Qwen3: Apache 2.0
- SQLite-vec: Apache 2.0
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request