Skip to content

πŸš€ Advanced Text-to-SQL RAG System with LangChain, LangGraph & React - Convert natural language to SQL with AI-powered intelligence

Notifications You must be signed in to change notification settings

Aaryan04/advanced-text-to-sql-rag

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🧠 Advanced Text-to-SQL RAG System

A sophisticated Text-to-SQL system built with LangChain, LangGraph, and modern web technologies. This system converts natural language questions into SQL queries using Retrieval-Augmented Generation (RAG) with advanced features like self-correction, query optimization, and real-time execution.

image

✨ Features

πŸ” Advanced RAG System

  • Vector embeddings for schema understanding and query examples
  • Semantic search for relevant context retrieval
  • Example-based learning with curated query patterns
  • Schema-aware query generation

πŸš€ LangGraph Workflow

  • Multi-step workflow with validation and self-correction
  • Automatic retry logic for failed queries
  • Query optimization and performance tuning
  • Real-time progress tracking via WebSocket

πŸ›‘οΈ Security & Validation

  • SQL injection prevention with comprehensive validation
  • Query complexity analysis and safety checks
  • Execution sandboxing with result limits
  • Error handling and user feedback

🎨 Modern UI/UX

  • Dark theme with Material-UI components
  • Real-time query execution with progress indicators
  • Interactive data visualization with charts
  • Monaco Editor for SQL syntax highlighting
  • Responsive design for all devices

πŸ“Š Analytics & Monitoring

  • Query history tracking and analysis
  • Performance metrics and success rates
  • Error analysis and debugging tools
  • Database schema explorer

πŸ—οΈ Architecture

β”œβ”€β”€ backend/                 # FastAPI Backend
β”‚   β”œβ”€β”€ main.py             # Application entry point
β”‚   β”œβ”€β”€ database/           # Database management
β”‚   β”œβ”€β”€ rag/               # RAG system implementation
β”‚   β”œβ”€β”€ graph/             # LangGraph workflow
β”‚   └── utils/             # Utilities and validators
β”œβ”€β”€ frontend/              # React Frontend
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/    # Reusable components
β”‚   β”‚   β”œβ”€β”€ pages/         # Page components
β”‚   β”‚   β”œβ”€β”€ hooks/         # Custom hooks
β”‚   β”‚   └── utils/         # API and utilities
└── requirements.txt       # Python dependencies

πŸš€ Quick Start

Prerequisites

  • Python 3.9+
  • Node.js 16+
  • OpenAI API key
  • PostgreSQL (optional, SQLite by default)

Backend Setup

  1. Install dependencies:
pip install -r requirements.txt
  1. Set environment variables:
cp .env.example .env
# Edit .env with your OpenAI API key and database settings
  1. Run the backend:
cd backend
python main.py

The backend will start at http://localhost:8001

Frontend Setup

  1. Install dependencies:
cd frontend
npm install
  1. Start the development server:
npm start

The frontend will start at http://localhost:3000

🎯 Usage Examples

Basic Queries

  • "Show all employees in the engineering department"
  • "What is the average salary by department?"
  • "List all active projects"

Complex Analytics

  • "Show top 5 highest paid employees with their department info"
  • "Find employees working on multiple active projects"
  • "Compare sales performance by region for this year"

Advanced Patterns

  • "Which departments have budget exceeding the average?"
  • "Show project timeline with employee assignments"
  • "Analyze salary distribution across departments"

πŸ”§ Configuration

Environment Variables

OPENAI_API_KEY=your_openai_api_key_here
DATABASE_URL=postgresql://user:password@localhost:5432/texttosql_db
REDIS_URL=redis://localhost:6379
CHROMA_PERSIST_DIRECTORY=./chroma_db
LOG_LEVEL=INFO
MAX_QUERY_COMPLEXITY=10
QUERY_TIMEOUT_SECONDS=30
ENABLE_QUERY_CACHING=true
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

Database Configuration

The system supports both SQLite (default) and PostgreSQL:

  • SQLite: Zero configuration, perfect for development
  • PostgreSQL: Production-ready with advanced features

πŸ“Š Sample Data

The system includes a comprehensive sample database with:

  • Employees table with HR data
  • Departments with budget information
  • Projects with timeline and status
  • Sales data with regional breakdown
  • Relationships between entities

πŸ› οΈ Advanced Features

RAG System

  • Embedding-based retrieval using Sentence Transformers
  • Context-aware query generation
  • Few-shot learning with example queries
  • Schema documentation integration

LangGraph Workflow

retrieve_context β†’ generate_sql β†’ validate_sql β†’ optimize_query β†’ execute_query
                    ↓              ↓             ↓
                  retry_logic β†’ error_handling β†’ result_processing

Query Optimization

  • Automatic LIMIT addition for performance
  • Index suggestion based on query patterns
  • Redundant condition removal
  • Join optimization hints

Security Measures

  • Whitelist-based SQL validation
  • Injection attack prevention
  • Query complexity limits
  • Execution timeout controls

πŸ” API Documentation

Core Endpoints

  • GET /health - System health check
  • POST /query - Execute natural language query
  • GET /schema - Database schema information
  • GET /tables - List all tables
  • GET /query-history - Query execution history
  • WS /ws/query - Real-time query execution

Example Request

POST /query
{
  "question": "Show top 5 employees by salary",
  "include_explanation": true,
  "max_results": 100
}

Example Response

{
  "sql_query": "SELECT * FROM employees ORDER BY salary DESC LIMIT 5",
  "results": [...],
  "explanation": "This query selects all columns from the employees table...",
  "confidence_score": 0.95,
  "execution_time": 0.123,
  "metadata": {
    "complexity": "simple",
    "validation_passed": true,
    "optimization_applied": true
  }
}

🎨 UI Components

Query Interface

  • Natural language input with autocomplete
  • Real-time execution with progress tracking
  • Result visualization with charts and tables
  • SQL query display with syntax highlighting

Schema Explorer

  • Interactive table browser
  • Column details with types and constraints
  • Sample data preview
  • Relationship visualization

Analytics Dashboard

  • Query performance metrics
  • Success rate tracking
  • Error analysis and debugging
  • Usage patterns and trends

πŸš€ Advanced Suggestions

1. Custom Schema Integration

# Add your own database schema
await db_manager.add_custom_schema({
    "your_table": {
        "columns": [...],
        "relationships": [...],
        "sample_queries": [...]
    }
})

2. Extend RAG Context

# Add domain-specific examples
rag_system.add_examples([
    {
        "question": "Your domain question",
        "sql": "SELECT ...",
        "explanation": "Domain-specific explanation"
    }
])

3. Custom Validators

# Add business logic validation
class CustomValidator(SQLValidator):
    def validate_business_rules(self, query):
        # Your custom validation logic
        pass

4. Performance Monitoring

# Add custom metrics
@app.middleware("http")
async def add_metrics(request, call_next):
    # Custom monitoring logic
    pass

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

πŸ“œ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • LangChain for the RAG framework
  • LangGraph for workflow orchestration
  • OpenAI for language model capabilities
  • Material-UI for the beautiful interface
  • FastAPI for the high-performance backend

Built with ❀️ for the AI and Data Science community

About

πŸš€ Advanced Text-to-SQL RAG System with LangChain, LangGraph & React - Convert natural language to SQL with AI-powered intelligence

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •