A sophisticated Text-to-SQL system built with LangChain, LangGraph, and modern web technologies. This system converts natural language questions into SQL queries using Retrieval-Augmented Generation (RAG) with advanced features like self-correction, query optimization, and real-time execution.
- Vector embeddings for schema understanding and query examples
- Semantic search for relevant context retrieval
- Example-based learning with curated query patterns
- Schema-aware query generation
- Multi-step workflow with validation and self-correction
- Automatic retry logic for failed queries
- Query optimization and performance tuning
- Real-time progress tracking via WebSocket
- SQL injection prevention with comprehensive validation
- Query complexity analysis and safety checks
- Execution sandboxing with result limits
- Error handling and user feedback
- Dark theme with Material-UI components
- Real-time query execution with progress indicators
- Interactive data visualization with charts
- Monaco Editor for SQL syntax highlighting
- Responsive design for all devices
- Query history tracking and analysis
- Performance metrics and success rates
- Error analysis and debugging tools
- Database schema explorer
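The "execution sandboxing with result limits" feature above can be sketched in a few lines. This is an illustrative stand-in, not the project's actual implementation; the cap of 100 rows and the helper name are assumptions.

```python
# Hypothetical sketch of sandboxed execution with a hard row limit.
import sqlite3

MAX_ROWS = 100  # illustrative cap; the real limit is configurable

def execute_sandboxed(conn, sql, max_rows=MAX_ROWS):
    """Run a query and truncate the result set, reporting truncation."""
    cur = conn.execute(sql)
    rows = cur.fetchmany(max_rows + 1)   # fetch one extra to detect overflow
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated

# Demo against an in-memory database with 250 rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(i, f"emp{i}") for i in range(250)])
rows, truncated = execute_sandboxed(conn, "SELECT * FROM employees")
```

The extra-row fetch lets the caller tell the user that results were cut off, rather than silently returning a partial set.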
├── backend/                  # FastAPI Backend
│   ├── main.py               # Application entry point
│   ├── database/             # Database management
│   ├── rag/                  # RAG system implementation
│   ├── graph/                # LangGraph workflow
│   └── utils/                # Utilities and validators
├── frontend/                 # React Frontend
│   └── src/
│       ├── components/       # Reusable components
│       ├── pages/            # Page components
│       ├── hooks/            # Custom hooks
│       └── utils/            # API and utilities
└── requirements.txt          # Python dependencies
- Python 3.9+
- Node.js 16+
- OpenAI API key
- PostgreSQL (optional, SQLite by default)
- Install dependencies:
pip install -r requirements.txt
- Set environment variables:
cp .env.example .env
# Edit .env with your OpenAI API key and database settings
- Run the backend:
cd backend
python main.py
The backend will start at http://localhost:8001
- Install dependencies:
cd frontend
npm install
- Start the development server:
npm start
The frontend will start at http://localhost:3000
- "Show all employees in the engineering department"
- "What is the average salary by department?"
- "List all active projects"
- "Show top 5 highest paid employees with their department info"
- "Find employees working on multiple active projects"
- "Compare sales performance by region for this year"
- "Which departments have budget exceeding the average?"
- "Show project timeline with employee assignments"
- "Analyze salary distribution across departments"
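Any of the questions above can be sent to the backend's POST /query endpoint. A hypothetical client sketch (the payload shape follows the API example later in this README; the network call itself is left commented out so this stays self-contained):

```python
# Hypothetical client sketch for POST /query using only the stdlib.
import json
import urllib.request

API_URL = "http://localhost:8001/query"

def build_query_request(question, max_results=100):
    """Build a ready-to-send request for the /query endpoint."""
    payload = json.dumps({
        "question": question,
        "include_explanation": True,
        "max_results": max_results,
    }).encode()
    return urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_query_request("What is the average salary by department?")
# urllib.request.urlopen(req) would execute it against a running backend
```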
OPENAI_API_KEY=your_openai_api_key_here
DATABASE_URL=postgresql://user:password@localhost:5432/texttosql_db
REDIS_URL=redis://localhost:6379
CHROMA_PERSIST_DIRECTORY=./chroma_db
LOG_LEVEL=INFO
MAX_QUERY_COMPLEXITY=10
QUERY_TIMEOUT_SECONDS=30
ENABLE_QUERY_CACHING=true
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
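At startup these variables might be read along these lines (a minimal sketch; the key names match the .env entries above, but the defaults and function name are illustrative assumptions):

```python
# Sketch of reading the .env configuration keys listed above.
import os

def load_settings():
    return {
        "openai_api_key": os.getenv("OPENAI_API_KEY", ""),
        "database_url": os.getenv("DATABASE_URL", "sqlite:///./texttosql.db"),
        "max_query_complexity": int(os.getenv("MAX_QUERY_COMPLEXITY", "10")),
        "query_timeout_seconds": int(os.getenv("QUERY_TIMEOUT_SECONDS", "30")),
        "enable_query_caching": os.getenv("ENABLE_QUERY_CACHING", "true").lower() == "true",
    }

settings = load_settings()
```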
The system supports both SQLite (default) and PostgreSQL:
- SQLite: Zero configuration, perfect for development
- PostgreSQL: Production-ready with advanced features
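Switching between the two backends typically comes down to the DATABASE_URL scheme. An illustrative sketch (the dispatch function is an assumption, not the project's code):

```python
# Pick the database backend from the DATABASE_URL scheme, so SQLite works
# with zero configuration and PostgreSQL can be swapped in for production.
from urllib.parse import urlparse

def detect_backend(database_url):
    scheme = urlparse(database_url).scheme
    if scheme.startswith("sqlite"):
        return "sqlite"
    if scheme.startswith("postgres"):
        return "postgresql"
    raise ValueError(f"Unsupported database URL: {database_url}")

detect_backend("sqlite:///./texttosql.db")
detect_backend("postgresql://user:password@localhost:5432/texttosql_db")
```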
The system includes a comprehensive sample database with:
- Employees table with HR data
- Departments with budget information
- Projects with timeline and status
- Sales data with regional breakdown
- Relationships between entities
- Embedding-based retrieval using Sentence Transformers
- Context-aware query generation
- Few-shot learning with example queries
- Schema documentation integration
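The embedding-based retrieval step can be pictured with a toy example. Real vectors come from sentence-transformers/all-MiniLM-L6-v2; here tiny hand-made 2-D vectors stand in so the idea stays self-contained:

```python
# Minimal sketch of embedding-based context retrieval via cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# (vector, example) pairs playing the role of the vector store
EXAMPLES = [
    ((1.0, 0.1), "SELECT AVG(salary) FROM employees GROUP BY department_id"),
    ((0.1, 1.0), "SELECT * FROM projects WHERE status = 'active'"),
]

def retrieve_context(query_vec, top_k=1):
    """Return the top_k most similar stored examples for a query embedding."""
    ranked = sorted(EXAMPLES, key=lambda e: cosine(query_vec, e[0]), reverse=True)
    return [sql for _, sql in ranked[:top_k]]

retrieve_context((0.9, 0.2))  # the salary example ranks first
```

The retrieved examples are then injected into the LLM prompt as few-shot context for query generation.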
retrieve_context → generate_sql → validate_sql → optimize_query → execute_query
                        ↑               ↓                ↓                ↓
                        └── retry_logic ← error_handling ← result_processing
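The retry loop in that graph can be sketched in plain Python (the real implementation uses LangGraph nodes; the function bodies and MAX_RETRIES value here are stand-ins):

```python
# Plain-Python sketch of the workflow's validate-and-retry logic.
MAX_RETRIES = 2

def run_workflow(question, generate, validate, execute):
    context = f"context for: {question}"   # retrieve_context stand-in
    last_error = None
    for attempt in range(MAX_RETRIES + 1):
        sql = generate(question, context, last_error)
        if not validate(sql):
            last_error = "validation failed"   # feeds the next generation
            continue
        try:
            return execute(sql)                # optimize/execute collapsed here
        except Exception as err:               # error_handling -> retry_logic
            last_error = str(err)
    raise RuntimeError(f"Query failed after retries: {last_error}")

# Demo: the first generation is invalid, the retry succeeds
calls = {"n": 0}
def fake_generate(q, ctx, err):
    calls["n"] += 1
    return "SELECT 1" if err else "DROP TABLE x"
def fake_validate(sql):
    return sql.startswith("SELECT")

result = run_workflow("demo", fake_generate, fake_validate, lambda sql: [[1]])
```

The key idea is that the last error is passed back into generation, so the model can correct itself rather than repeat the same mistake.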
- Automatic LIMIT addition for performance
- Index suggestion based on query patterns
- Redundant condition removal
- Join optimization hints
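The "automatic LIMIT addition" item can be sketched as follows. This is a hedged illustration using string checks; a production rewriter would use a real SQL parser:

```python
# Sketch: append a default LIMIT to SELECTs that do not already have one.
def add_default_limit(sql, limit=100):
    stripped = sql.rstrip().rstrip(";")
    upper = stripped.upper()
    if upper.startswith("SELECT") and " LIMIT " not in upper:
        return f"{stripped} LIMIT {limit}"
    return stripped

add_default_limit("SELECT * FROM employees")           # gains LIMIT 100
add_default_limit("SELECT * FROM employees LIMIT 5")   # left unchanged
```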
- Whitelist-based SQL validation
- Injection attack prevention
- Query complexity limits
- Execution timeout controls
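A minimal sketch of the whitelist-based validation and complexity limit (thresholds and the complexity heuristic are illustrative; the real validator is more thorough):

```python
# Sketch: allow only read-only statements, reject stacked statements,
# and cap a crude complexity score.
import re

ALLOWED_STATEMENTS = ("SELECT", "WITH")
MAX_COMPLEXITY = 10

def validate_sql(sql):
    normalized = sql.strip().upper()
    if not normalized.startswith(ALLOWED_STATEMENTS):
        return False
    if ";" in sql.strip().rstrip(";"):      # "SELECT 1; DELETE ..." style stacking
        return False
    complexity = len(re.findall(r"\b(JOIN|SELECT|UNION)\b", normalized))
    return complexity <= MAX_COMPLEXITY

validate_sql("SELECT name FROM employees")        # passes
validate_sql("DROP TABLE employees")              # rejected: not whitelisted
validate_sql("SELECT 1; DELETE FROM employees")   # rejected: stacked statements
```

Whitelisting read-only statements is what makes injection attempts inert: even a successfully smuggled mutation never reaches the database.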
- GET /health - System health check
- POST /query - Execute a natural language query
- GET /schema - Database schema information
- GET /tables - List all tables
- GET /query-history - Query execution history
- WS /ws/query - Real-time query execution
POST /query

Request:

    {
      "question": "Show top 5 employees by salary",
      "include_explanation": true,
      "max_results": 100
    }

Response:

    {
      "sql_query": "SELECT * FROM employees ORDER BY salary DESC LIMIT 5",
      "results": [...],
      "explanation": "This query selects all columns from the employees table...",
      "confidence_score": 0.95,
      "execution_time": 0.123,
      "metadata": {
        "complexity": "simple",
        "validation_passed": true,
        "optimization_applied": true
      }
    }
- Natural language input with autocomplete
- Real-time execution with progress tracking
- Result visualization with charts and tables
- SQL query display with syntax highlighting
- Interactive table browser
- Column details with types and constraints
- Sample data preview
- Relationship visualization
- Query performance metrics
- Success rate tracking
- Error analysis and debugging
- Usage patterns and trends
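The success-rate tracking behind these dashboards can be sketched with a tiny recorder (field names here are assumptions, not the project's actual schema):

```python
# Sketch of per-query history and success-rate aggregation.
from dataclasses import dataclass, field

@dataclass
class QueryStats:
    history: list = field(default_factory=list)

    def record(self, question, succeeded, seconds):
        self.history.append({"q": question, "ok": succeeded, "t": seconds})

    def success_rate(self):
        if not self.history:
            return 0.0
        return sum(1 for h in self.history if h["ok"]) / len(self.history)

stats = QueryStats()
stats.record("avg salary by dept", True, 0.12)
stats.record("malformed question", False, 0.05)
stats.success_rate()  # one success out of two queries
```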
# Add your own database schema
await db_manager.add_custom_schema({
    "your_table": {
        "columns": [...],
        "relationships": [...],
        "sample_queries": [...]
    }
})
# Add domain-specific examples
rag_system.add_examples([
    {
        "question": "Your domain question",
        "sql": "SELECT ...",
        "explanation": "Domain-specific explanation"
    }
])
# Add business logic validation
class CustomValidator(SQLValidator):
    def validate_business_rules(self, query):
        # Your custom validation logic
        pass
# Add custom metrics
@app.middleware("http")
async def add_metrics(request, call_next):
    # Custom monitoring logic before the request
    response = await call_next(request)
    # Record metrics, then pass the response through
    return response
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the RAG framework
- LangGraph for workflow orchestration
- OpenAI for language model capabilities
- Material-UI for the beautiful interface
- FastAPI for the high-performance backend
Built with ❤️ for the AI and Data Science community