alicoding/semantic-search-service


 ███████ ███████ ███    ███  █████  ███    ██ ████████ ██  ██████
 ██      ██      ████  ████ ██   ██ ████   ██    ██    ██ ██
 ███████ █████   ██ ████ ██ ███████ ██ ██  ██    ██    ██ ██
      ██ ██      ██  ██  ██ ██   ██ ██  ██ ██    ██    ██ ██
 ███████ ███████ ██      ██ ██   ██ ██   ████    ██    ██  ██████

 ███████ ███████  █████  ██████   ██████ ██   ██
 ██      ██      ██   ██ ██   ██ ██      ██   ██
 ███████ █████   ███████ ██████  ██      ███████
      ██ ██      ██   ██ ██   ██ ██      ██   ██
 ███████ ███████ ██   ██ ██   ██  ██████ ██   ██

Semantic Search Service

Enterprise-grade semantic search powered by LlamaIndex PropertyGraphIndex with TRUE 95/5 architecture (95% LlamaIndex native, 5% glue code).

837 lines of code. 25 Python modules. One unified intelligence layer.


🎯 What This Actually Does

Semantic Search Service is a complete intelligence layer for your development workflow:

  • πŸ” Semantic Code Search - Search your codebase semantically, not just text matching
  • 🧠 Conversation Memory - Index and search your Claude/AI conversations
  • πŸ“Š Knowledge Graphs - PropertyGraphIndex creates entity relationships from your code
  • πŸ”„ Business Logic Extraction - Automatically extract business rules and workflows
  • ⚑ Real-time Integrations - Sub-100ms responses for tools like temporal-hooks and task-enforcer
  • 🎨 Auto-documentation - Generate API docs and diagrams automatically
  • 🌐 Multiple Interfaces - FastAPI REST, CLI, and MCP (Model Context Protocol) for Claude

🚀 Quick Start

git clone https://github.com/alicoding/semantic-search-service.git
cd semantic-search-service

# Copy and configure
cp .env.example .env
# Add your OPENAI_API_KEY or ELECTRONHUB_API_KEY

# Start everything
./setup.sh

# Index your project
./semantic-search index . my-project

# Search semantically
./semantic-search search "authentication logic" my-project

✨ Core Features

🔍 Semantic Search

# Index any codebase
./semantic-search index /path/to/project project-name

# Semantic search (not just text matching)
./semantic-search search "error handling patterns" project-name

# Check if components exist (for task-enforcer integration)
./semantic-search exists "AuthService" project-name

# Find SOLID/DRY violations (for temporal-hooks integration)
./semantic-search violations project-name

🧠 Conversation Memory

# Index your Claude conversations
curl -X POST "http://localhost:8000/index/conversations" \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/conversations", "collection": "my-conversations"}'

# Search your conversation history
curl "http://localhost:8000/search/memory?query=authentication&limit=5"
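The two calls above can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_index_request` and `build_memory_search_url` are hypothetical and not part of this repository, and the base URL is assumed to match the curl examples. The helpers only build the request, so nothing is sent until you pass the result to `urllib.request.urlopen`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumed local address, as in the curl examples


def build_index_request(path: str, collection: str) -> Request:
    """Build the POST /index/conversations request (hypothetical helper, not sent here)."""
    body = json.dumps({"path": path, "collection": collection}).encode("utf-8")
    return Request(
        f"{BASE_URL}/index/conversations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_memory_search_url(query: str, limit: int = 5) -> str:
    """Build the GET /search/memory URL with a properly encoded query string."""
    return f"{BASE_URL}/search/memory?{urlencode({'query': query, 'limit': limit})}"
```

Building the query string with `urlencode` keeps multi-word queries valid, which a hand-written URL with raw spaces would not be.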

📊 Knowledge Graphs

# Generate knowledge graph from codebase
curl "http://localhost:8000/graph/my-project"

# Export to NetworkX format
curl "http://localhost:8000/graph/my-project/export"

# Visualize relationships
curl "http://localhost:8000/graph/my-project/visualize"

🎨 Auto-documentation

# Generate API documentation
python src/core/auto_docs.py generate

# Generate sequence diagrams
curl -X POST "http://localhost:8000/diagram/sequence?project=my-project"

# Extract business logic
curl -X POST "http://localhost:8000/extract/business-logic?project=my-project"

🛠️ Configuration

Everything is configured via config.yaml:

# LLM Configuration - Works with OpenAI, ElectronHub, or Ollama
llm_provider: openai  # or ollama for offline
embed_provider: openai
openai_model: claude-opus-4-1-20250805  # ElectronHub models supported!

# Ollama for offline/enterprise
ollama_model: llama3.1:latest
ollama_base_url: http://localhost:11434

# Performance
num_workers: 4
cache_ttl: 3600

# Vector Store
qdrant_url: http://localhost:6333
redis_host: localhost
redis_enabled: true
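For readers who want to validate these settings before starting the service, here is a rough sketch of a typed view over the keys shown above. `ServiceConfig` is a hypothetical helper, not the project's actual loader. The defaults simply mirror the sample values, and the allowed `llm_provider` values are an assumption taken from the inline comment.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class ServiceConfig:
    """Typed view of the config.yaml keys shown above (hypothetical helper)."""
    llm_provider: str = "openai"
    embed_provider: str = "openai"
    openai_model: str = "claude-opus-4-1-20250805"
    ollama_model: str = "llama3.1:latest"
    ollama_base_url: str = "http://localhost:11434"
    num_workers: int = 4
    cache_ttl: int = 3600
    qdrant_url: str = "http://localhost:6333"
    redis_host: str = "localhost"
    redis_enabled: bool = True

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "ServiceConfig":
        # Reject misspelled keys early instead of silently ignoring them.
        unknown = set(raw) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        cfg = cls(**raw)
        # Assumption: only these two provider values are valid.
        if cfg.llm_provider not in {"openai", "ollama"}:
            raise ValueError(f"unsupported llm_provider: {cfg.llm_provider!r}")
        if cfg.num_workers < 1 or cfg.cache_ttl < 0:
            raise ValueError("num_workers must be >= 1 and cache_ttl >= 0")
        return cfg
```

With PyYAML installed, the mapping could come from `yaml.safe_load` applied to the contents of config.yaml.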

📡 API Endpoints

Core Search:

  • POST /search/{project} - Semantic search in project
  • POST /index - Index new project
  • GET /exists - Check component existence
  • GET /violations/{project} - Find code violations

Conversation Memory:

  • POST /index/conversations - Index Claude/AI conversations
  • GET /search/memory - Search conversation history

Knowledge Graphs:

  • GET /graph/{project} - Get project knowledge graph
  • GET /graph/{project}/visualize - Generate visualizations

Business Intelligence:

  • POST /extract/business-logic - Extract business rules
  • POST /diagram/sequence - Generate sequence diagrams

Integrations:

  • GET /check/violation - Real-time violation check (temporal-hooks)
  • GET /context/project - Project context (AI agents)

🔌 Integrations

temporal-hooks Integration

Real-time violation detection during development:

GET /check/violation?action=create-new-service&context=my-project
# Returns violations instantly (<100ms cached)
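A hook that calls this endpoint probably should not block development when the service is slow, rate limited, or down. The sketch below shows one way to do that: it fails open. The function names, the 0.5 second timeout, and the `violations` key in the fallback result are all assumptions, since the response schema is not documented here.

```python
import json
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed, as in the examples above


def http_get_json(url: str, timeout: float = 0.5) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_violation(action: str, context: str,
                    fetch: Callable[[str], dict] = http_get_json) -> dict:
    """Call GET /check/violation; on any failure, report no violations (fail open)."""
    query = urlencode({"action": action, "context": context})
    url = f"{BASE_URL}/check/violation?{query}"
    try:
        return fetch(url)
    except Exception as exc:  # network error, timeout, HTTP error, bad JSON
        # Hypothetical fallback shape: empty result plus the reason for the failure.
        return {"violations": [], "error": str(exc)}
```

Passing `fetch` as a parameter keeps the function easy to exercise without a running server. Whether failing open is acceptable depends on how strict the hook needs to be.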

task-enforcer Integration

Check if components exist before creating tasks:

GET /exists?component=UserService&project=my-app
# Returns: {"exists": true, "confidence": 0.92, "file": "user_service.py"}
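A caller will usually want to combine the `exists` flag with the `confidence` score before skipping a task. The following is a small sketch of that decision, based on the response shape shown above. `component_exists` is a hypothetical helper and the 0.8 threshold is an arbitrary assumption, not a value recommended by the service.

```python
import json


def component_exists(response_body: str, min_confidence: float = 0.8) -> bool:
    """Treat a component as existing only if the service is confident enough."""
    data = json.loads(response_body)
    # Missing fields are treated as "not found" rather than raising.
    return bool(data.get("exists")) and float(data.get("confidence", 0.0)) >= min_confidence
```

With the sample response above (`confidence` of 0.92) and the default threshold, the component counts as existing. A low-confidence match would not.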

Claude MCP Integration

Works directly in Claude Code sessions as MCP tools:

  • search_code - Semantic search in projects
  • check_exists - Component existence checking
  • find_violations - SOLID/DRY violation detection

🏗️ Architecture

TRUE 95/5 Pattern:

  • 95% LlamaIndex Native - PropertyGraphIndex, StorageContext, Settings
  • 5% Glue Code - Thin wrappers and configuration

Tech Stack:

  • LlamaIndex - PropertyGraphIndex, VectorStoreIndex, query engines
  • Qdrant - Vector database (enterprise-grade)
  • Redis - Sub-100ms caching
  • FastAPI - REST API with auto-generated docs
  • Typer - Rich CLI interface

📊 Performance

Feature             | Target | Status
Search Response     | <500ms | ✅ <200ms
Violation Check     | <100ms | ✅ Cached
Component Exists    | <200ms | ✅ <100ms
Conversation Search | <500ms | ✅
Knowledge Graph     | <3s    | ✅

🔧 Installation

Prerequisites

  • Python 3.8+
  • Docker (for Qdrant)
  • OpenAI API key OR Ollama (offline mode)

Full Setup

# Clone repository
git clone https://github.com/alicoding/semantic-search-service.git
cd semantic-search-service

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Automated setup
./setup.sh

# Test installation
curl http://localhost:8000/docs

Docker Setup

# Start services
docker-compose up -d

# Index sample project
./semantic-search index . sample-project

# Test search
./semantic-search search "FastAPI endpoints" sample-project

🧪 Testing Your Conversations

Got Claude conversations? Index and search them:

# If you have claude-parser installed
curl -X POST "http://localhost:8000/index/conversations" \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/conversations", "collection": "my-chats"}'

# Search your conversation history (spaces in the query must be URL-encoded)
curl "http://localhost:8000/search/memory?query=how%20to%20implement%20caching&limit=3"

🐛 Known Issues

We track all issues publicly, each with a Definition of Ready (DoR) and Definition of Done (DoD):

  • #23 🚨 CRITICAL: Missing health endpoint
  • #24 🚨 CRITICAL: API doesn't initialize LlamaIndex Settings
  • #25 ⚠️ HIGH: Rate limiting crashes git hooks

See PRODUCTION_READINESS_AUDIT.md for complete analysis.

🀝 Contributing

Found a bug? Create an issue with:

  • Clear reproduction steps
  • Expected vs actual behavior
  • Your environment (Ollama/OpenAI, OS, etc.)

We fix real bugs that real users encounter.

📚 Documentation

🎯 Real-World Example

Before: Grep for "authentication" returns 500 text matches
After: Semantic search finds actual auth patterns, business logic, and related components with confidence scores

./semantic-search search "user authentication flow" my-project
# Returns: AuthService.authenticate() method, login flow, JWT handling, etc.
# With semantic understanding, not just text matching

Built with LlamaIndex native patterns. Stop searching. Start finding. 🎯
