
This Graph RAG application is a web-based tool that allows users to ask questions about the Mission: Impossible film franchise and receive detailed, contextually relevant answers. By combining retrieval-based methods with generative AI, the application ensures that responses are both accurate and engaging.


Mission Cipher 🕵️‍♂️

Welcome! This project is a labor of love, inspired by my deep admiration for the Mission: Impossible film franchise. Whether you're a die-hard fan or just curious about the intricate world of espionage, this application is designed to provide you with insightful and engaging information about the characters and plots from the Mission: Impossible universe.

Live Demo

Check out the live demo of Mission Cipher at cipher.neuralnets.dev to experience it in action!

Note 📝

I have only used Mission: Impossible film data for embedding and retrieving content, so this terminal only contains data specifically from Mission: Impossible films. Inquiries outside this scope cannot be answered.

Overview

Mission Cipher is a Graph Retrieval-Augmented Generation (GraphRAG) application designed to navigate complex information within the Mission: Impossible universe. Unlike conventional RAG systems that rely solely on text similarity, GraphRAG builds a structured knowledge graph to enable multi-step reasoning and relationship-aware retrieval.

Core Technologies

  • Graph Processing (NetworkX): Entities and events are modeled as nodes, relationships as typed edges, allowing traversal and multi-hop lookups.
  • Embeddings & Vector Matching (NumPy, Scikit‑learn): Semantic vectors are generated for entities and enriched with local context to enable graph-aware retrieval.
  • Large Language Model (Google Gemini API): Used for entity recognition, relationship extraction, and natural language response generation.
  • Frontend Interface (React, Tailwind CSS): Offers a styled terminal-like chat interface with live streaming responses.
  • Backend Framework (Flask): Handles query intake, subgraph assembly, and API endpoints for both interactive and programmatic access.

Why GraphRAG?

GraphRAG excels at:

  • Disambiguating entities by leveraging their relational context.
  • Performing multi-hop reasoning (e.g., character ➝ event ➝ organization).
  • Supporting thematic queries that go beyond keyword matching.
  • Assembling structured context for more complete, accurate responses.
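The multi-hop idea above (character ➝ event ➝ organization) can be sketched with NetworkX. The entities and edge types below are illustrative stand-ins, not the project's actual graph data:

```python
import networkx as nx

# Toy knowledge graph with hypothetical entities and relations;
# names and edge types are invented for illustration.
G = nx.MultiDiGraph()
G.add_edge("Ethan Hunt", "Ghost Protocol", key="participated_in")
G.add_edge("Ghost Protocol", "IMF", key="initiated_by")
G.add_edge("Jim Phelps", "IMF", key="member_of")

def two_hop(graph, start):
    """Enumerate (start, rel1, mid, rel2, end) paths two hops out from `start`."""
    paths = []
    for _, mid, r1 in graph.out_edges(start, keys=True):
        for _, end, r2 in graph.out_edges(mid, keys=True):
            paths.append((start, r1, mid, r2, end))
    return paths

print(two_hop(G, "Ethan Hunt"))
# character -> event -> organization in a single traversal
```

A plain text-similarity retriever would have to surface both hops in one passage to answer the same question; the graph makes the chain explicit.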

System Comparison

| Feature | Standard RAG | GraphRAG |
| --- | --- | --- |
| Entity/Relationship Awareness | Relies purely on text similarity | Embeds relationships in graph structure |
| Multi-Hop Reasoning | Not supported | Designed for recursive traversal and multi-step retrieval |
| Contextual Disambiguation | Often ambiguous with similarly named entities | Considers connected entities for clarity |
| Thematic Exploration | Limited to surface content retrieval | Traverses narrative relationships for deeper insight |

Architecture Overview

Frontend Layer

Built with React and styled via Tailwind CSS:

  • Simulates a command-line chat terminal
  • Streams responses as they are generated
  • Sends user queries to backend and renders JSON results

Backend Layer

Implemented using Flask:

  • Exposes the /query, /graph-stats, and /health endpoints
  • Converts queries into embeddings
  • Matches top entities via cosine similarity
  • Expands to context-aware subgraph via NetworkX
  • Compiles context and invokes Google Gemini for response generation
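The entity-matching step can be sketched with scikit-learn's `cosine_similarity`. The entity names and 4-dimensional vectors below are made up for illustration; the real application uses model-generated embeddings loaded at startup:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical pre-computed entity embeddings (tiny vectors for clarity).
entity_names = ["Ethan Hunt", "Jim Phelps", "IMF"]
entity_vecs = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.3, 0.0],
    [0.0, 0.2, 0.9, 0.1],
])

def top_k_entities(query_vec, k=2):
    """Rank entities by cosine similarity to the query embedding."""
    sims = cosine_similarity(query_vec.reshape(1, -1), entity_vecs)[0]
    order = np.argsort(sims)[::-1][:k]
    return [(entity_names[i], float(sims[i])) for i in order]

query = np.array([0.85, 0.15, 0.05, 0.2])  # stand-in for the embedded user query
print(top_k_entities(query))
```

The top-ranked entities then seed the subgraph expansion described above.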

Data Layer

  • Knowledge graph built offline via build_graph.py
  • Embeddings stored and loaded during runtime
  • Graph entities and edges serialized for rapid access

Processing Pipeline

Offline Graph Construction

Executed via build_graph.py, step-by-step:

  1. Load documents from structured JSON files
  2. Extract entities using Gemini LLM
  3. Identify entity relationships (e.g., "betrayed", "member of", "led by")
  4. Create a MultiDiGraph of entities and their typed connections
  5. Generate entity embeddings augmented with local subgraph context
  6. Serialize graph structure and embeddings for runtime usage
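A minimal sketch of steps 3–6, assuming the LLM's output has already been reduced to (subject, relation, object) triples; the triples and the `graph.pkl` filename below are invented examples:

```python
import pickle
import networkx as nx

# Stand-ins for Gemini's extracted entity/relationship triples.
triples = [
    ("Jim Phelps", "betrayed", "IMF"),
    ("Jim Phelps", "member_of", "IMF"),
    ("Ethan Hunt", "led_by", "Jim Phelps"),
]

G = nx.MultiDiGraph()
for subj, relation, obj in triples:
    # A MultiDiGraph allows parallel edges, so the same pair of
    # entities can carry several typed relationships.
    G.add_edge(subj, obj, relation=relation)

# Serialize the graph for fast loading at runtime.
with open("graph.pkl", "wb") as f:
    pickle.dump(G, f)
```

Embeddings would be serialized alongside the graph (e.g., as a NumPy array plus an entity index) so the runtime never repeats the expensive extraction step.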

Runtime Query Flow (app.py)

  1. Convert incoming user query into semantic embedding
  2. Retrieve top-k matching entities
  3. Expand to immediate relational neighborhood in the graph
  4. Build context from entity nodes and relationship edges
  5. Assemble and prime prompt for LLM response
  6. Return real-time, streamed responses to the frontend
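Steps 3–4, the neighborhood expansion and context assembly, can be sketched as follows; the graph contents are hypothetical:

```python
import networkx as nx

# Tiny stand-in graph; entities and relations are illustrative only.
G = nx.MultiDiGraph()
G.add_edge("Jim Phelps", "IMF", relation="betrayed")
G.add_edge("Ethan Hunt", "Jim Phelps", relation="led_by")

def build_context(graph, entity):
    """Collect one-hop relationship facts around `entity` for the LLM prompt."""
    # undirected=True so incoming edges count as neighbors too.
    sub = nx.ego_graph(graph, entity, radius=1, undirected=True)
    facts = [f"{u} --{d['relation']}--> {v}" for u, v, d in sub.edges(data=True)]
    return "\n".join(sorted(facts))

print(build_context(G, "Jim Phelps"))
```

The resulting fact lines are what get compiled into the prompt before Gemini generates the streamed answer.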

API Specification

| Method | Endpoint | Description | Response Format |
| --- | --- | --- | --- |
| GET | / | Returns the chat application interface | HTML |
| POST | /query | Accepts user prompt and returns generated answer | JSON |
| GET | /graph-stats | Provides metadata such as node and edge counts | JSON |
| GET | /health | Returns simple health check (status OK) | JSON |

Example usage:

curl -X POST https://cipher.neuralnets.dev/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain the betrayal of Jim Phelps"}'

Installation & Local Setup

Prerequisites

  • Python ≥ 3.8
  • Google Gemini API key
  • Node.js (for frontend)

Setup Steps

git clone https://github.com/iamrahulreddy/cipher.git
cd cipher
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env

Add your Gemini key to .env:

GEMINI_API_KEY=YOUR_API_KEY_HERE

Then run:

python build_graph.py    # Build graph and embeddings
python app.py            # Start API server

Visit: http://localhost:5000

Production Deployment (High-Level Guidance)

⚠️ These steps are generalized—adapt to your specific infrastructure and security requirements.

Requirements

  • Python ≥ 3.8
  • A production WSGI server such as Gunicorn
  • A reverse proxy server (e.g., Nginx, Apache)
  • SSL certificate (Let's Encrypt recommended)
  • At least 2 GB RAM dedicated for graph loading

Example Deployment Workflow

  1. Install dependencies and Gunicorn:
    pip install -r requirements.txt
  2. Configure your Nginx to proxy incoming traffic to Gunicorn:
    server {
      listen 80;
      server_name your_domain.com;
      location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
      }
    }
  3. (Optional) Set up HTTPS via Certbot.
  4. Create a systemd unit file to run Gunicorn as a service and enable it on startup.
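A minimal systemd unit for step 4 might look like the sketch below; the install path, user, and the `app:app` module path are placeholders to adapt to your setup:

```ini
# /etc/systemd/system/cipher.service
[Unit]
Description=Mission Cipher Gunicorn service
After=network.target

[Service]
User=www-data
WorkingDirectory=/opt/cipher
Environment="PATH=/opt/cipher/venv/bin"
ExecStart=/opt/cipher/venv/bin/gunicorn --workers 2 --bind 127.0.0.1:8000 app:app
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now cipher.service`; the bind address matches the `proxy_pass` target in the Nginx example above.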

Performance Evaluation

I compared GraphRAG responses against Standard RAG responses using real-world franchise queries. In 4 out of 5 cases, GraphRAG produced superior answers, especially on multi-hop, thematic queries. See the evaluation files in the repository for more details.

Technology Stack

| Component | Technology |
| --- | --- |
| Backend Framework | Flask |
| Frontend | React, Tailwind CSS |
| Graph Engine | NetworkX |
| Semantic Embeddings | NumPy, Scikit-learn |
| Language Model | Google Gemini API |
| Data / Entity Tasks | Pandas |

Contributing

Contributions are welcome! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See LICENSE.

Thank you for checking out Mission Cipher! I hope you enjoy exploring the world of Ethan Hunt and his team as much as I enjoyed creating this project. 🕵️‍♂️🎬
