05 Performance Optimization
Guide to optimizing MCP Memory Service for maximum performance and scalability.
- Quick Wins
- Database Optimization
- Query Performance
- Memory Management
- Monitoring & Metrics
- Troubleshooting Performance Issues
Backend | Read Time | Use Case | Pros | Cons |
---|---|---|---|---|
Hybrid ⚡ | ~5ms | Production (Recommended) | Best of both worlds, Fast reads, Cloud sync | Requires Cloudflare config |
SQLite-vec | ~5ms | Development, Single-user | Lightning fast, No network, No limits | Local only, No sharing |
ChromaDB | ~15ms | Multi-client local | Fast, Multi-client support | More memory usage |
Cloudflare | 50-500ms+ | Legacy cloud-only | Global sync, High availability | Network latency |
Hybrid (SQLite-vec + Cloudflare) ⚡🌟 RECOMMENDED
export MCP_MEMORY_STORAGE_BACKEND=hybrid
export MCP_HYBRID_SYNC_INTERVAL=300 # 5 minutes
export MCP_HYBRID_BATCH_SIZE=50
- Performance: ~5ms read time (SQLite-vec speed)
- Write Speed: ~5ms (immediate to SQLite-vec, async to Cloudflare)
- Architecture: Write-through cache with background sync (see the sketch after the configuration options below)
- Best for: Production environments, multi-device workflows, best user experience
Benefits:
- ✅ Lightning-fast operations - All reads/writes use SQLite-vec
- ✅ Cloud persistence - Automatic background sync to Cloudflare
- ✅ Multi-device sync - Access memories across all devices
- ✅ Graceful degradation - Works offline, syncs when online
- ✅ Zero user-facing latency - Cloud operations happen in background
- Requirements: Cloudflare credentials (falls back to SQLite-only if missing)
Configuration Options:
# Sync timing
MCP_HYBRID_SYNC_INTERVAL=300                 # Background sync every 5 minutes
MCP_HYBRID_BATCH_SIZE=50                     # Sync 50 operations at a time
MCP_HYBRID_MAX_QUEUE_SIZE=1000               # Maximum pending operations

# Health monitoring
MCP_HYBRID_ENABLE_HEALTH_CHECKS=true
MCP_HYBRID_HEALTH_CHECK_INTERVAL=60
MCP_HYBRID_SYNC_ON_STARTUP=true

# Fallback behavior
MCP_HYBRID_FALLBACK_TO_PRIMARY=true
MCP_HYBRID_WARN_ON_SECONDARY_FAILURE=true
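The write-through architecture can be sketched roughly as follows. This is an illustrative outline under assumed names only; `local`, `cloud`, and their `store`/`store_batch` methods stand in for the service's actual storage classes:

import asyncio

class HybridStorage:
    """Sketch of a write-through cache: writes hit local SQLite-vec immediately
    and are queued for background sync to Cloudflare."""

    def __init__(self, local, cloud, batch_size=50, sync_interval=300, max_queue=1000):
        self.local = local                          # fast primary (SQLite-vec)
        self.cloud = cloud                          # durable secondary (Cloudflare)
        self.queue = asyncio.Queue(maxsize=max_queue)
        self.batch_size = batch_size
        self.sync_interval = sync_interval

    async def store(self, memory):
        await self.local.store(memory)              # user-facing latency: local disk only
        try:
            self.queue.put_nowait(memory)           # cloud write deferred to the sync loop
        except asyncio.QueueFull:
            pass                                    # a real implementation would flag a full resync

    async def search(self, query, limit=10):
        return await self.local.search(query, limit=limit)   # reads never touch the network

    async def sync_loop(self):
        """Background task: push queued writes to Cloudflare in batches."""
        while True:
            await asyncio.sleep(self.sync_interval)
            batch = []
            while not self.queue.empty() and len(batch) < self.batch_size:
                batch.append(self.queue.get_nowait())
            if not batch:
                continue
            try:
                await self.cloud.store_batch(batch)
            except Exception:
                for memory in batch:                # graceful degradation: retry next cycle
                    if not self.queue.full():
                        self.queue.put_nowait(memory)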
SQLite-vec (Local Storage) 🏃‍♂️💨
export MCP_MEMORY_STORAGE_BACKEND=sqlite_vec
- Performance: ~5ms average read time
- Latency: Zero network latency - direct disk I/O
- Throughput: Unlimited operations (no rate limits)
- Best for: Development, testing, speed-critical applications, offline usage
- Limitations: Single machine only, manual backup required
ChromaDB (Local Multi-client) 🔄
export MCP_MEMORY_STORAGE_BACKEND=chroma
- Performance: ~15ms average read time
- Latency: Low - local HTTP API calls
- Throughput: High with connection pooling
- Best for: Team development, local multi-client scenarios
- Limitations: Higher memory usage, single machine
Cloudflare (Cloud Storage) 🌐
export MCP_MEMORY_STORAGE_BACKEND=cloudflare
- Performance: 50-500ms+ (network dependent)
- Latency: Variable based on geographic distance to edge
- Throughput: Limited by API rate limits (generous but present)
- Architecture: Multiple API calls required (D1 + Vectorize)
- Best for: Production, multi-device sync, team sharing, automatic backups
- Limitations: Network dependency, higher latency, API costs, service limits (see below)
Network Latency Components:
- Geographic distance to nearest Cloudflare edge
- API request/response overhead
- Multiple service coordination (D1 database + Vectorize embeddings)
- Internet connection quality and stability
Optimization Tips:
- Use regions closest to your location
- Implement client-side caching for frequently accessed memories
- Batch operations when possible
- Consider hybrid approach (SQLite-vec for speed + Cloudflare for sync)
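One way to apply the client-side caching tip is a small TTL cache in front of reads, so frequently accessed memories skip the network entirely. This is a sketch; `fetch_memory_from_cloudflare` is a hypothetical placeholder for whatever retrieval call you use:

import time

class TTLCache:
    """Cache recently fetched memories to avoid repeated Cloudflare round trips."""

    def __init__(self, ttl_seconds=300, max_entries=1000):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._store = {}                            # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:           # stale entry, drop it
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        if len(self._store) >= self.max_entries:
            self._store.pop(next(iter(self._store)))   # evict earliest-inserted entry (simple FIFO)
        self._store[key] = (time.monotonic() + self.ttl, value)

# Usage: consult the cache before going to the network
cache = TTLCache(ttl_seconds=300)

async def get_memory(memory_id):
    cached = cache.get(memory_id)
    if cached is not None:
        return cached
    memory = await fetch_memory_from_cloudflare(memory_id)   # placeholder retrieval call
    cache.put(memory_id, memory)
    return memory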
Unlike SQLite-vec, which has no such caps, Cloudflare enforces strict service limits that can cause sync failures if they are not handled properly.
Service | Limit | SQLite-vec Equivalent | Impact |
---|---|---|---|
D1 Database | 10 GB per database | Unlimited | ❌ Hard stop - No more memories can be stored |
Vectorize Index | 5 million vectors | Unlimited | ❌ Hard stop - No more embeddings can be created |
Metadata per vector | 10 KB per entry | Unlimited | ❌ Memories with oversized metadata are skipped |
Filter query size | 2 KB per query | Unlimited | ❌ Query failures for complex filters |
String index size | 64 bytes (truncated) | Unlimited | ⚠️ Longer string values are truncated for filtering |
Batch operations | 200,000 vectors max | Unlimited | ⚠️ Large syncs must be split into smaller batches |
Memory Count Estimation:
# Typical memory sizes
average_content_size = 500        # bytes (typical note)
average_metadata_size = 200       # bytes (tags + timestamps)
embedding_size = 384 * 4          # bytes (384-dim float32; a 768-dim model such as bge-base-en-v1.5 doubles this)

# D1 storage per memory ≈ 700 bytes
# Vectorize storage per memory ≈ 1.5 KB

# Estimated limits:
max_memories_d1 = (10 * 1024**3) // 700      # ≈ 15 million memories (D1 10 GB limit)
max_memories_vectorize = 5_000_000           # Hard limit on Vectorize vectors

# Practical limit: 5 million memories (Vectorize constraint)
Warning Thresholds (implemented in hybrid backend):
- 80% capacity (4M memories): Warning alerts start
- 95% capacity (4.75M memories): Critical alerts, consider action
- 100% capacity (5M memories): New memories rejected
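A small helper that applies those thresholds to a current vector count. The limits are the documented Cloudflare values; the function itself is illustrative rather than the backend's actual code:

VECTORIZE_LIMIT = 5_000_000

def capacity_status(vector_count, warning_pct=80, critical_pct=95):
    """Classify Vectorize usage against the warning thresholds above."""
    used_pct = vector_count / VECTORIZE_LIMIT * 100
    if used_pct >= 100:
        return "LIMIT REACHED", used_pct     # new memories rejected
    if used_pct >= critical_pct:
        return "CRITICAL", used_pct          # immediate action required
    if used_pct >= warning_pct:
        return "WARNING", used_pct           # start planning
    return "OK", used_pct

# Example: 3.2M vectors -> ('OK', 64.0)
print(capacity_status(3_200_000))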
Cloudflare-only Backend (cloudflare):
export MCP_MEMORY_STORAGE_BACKEND=cloudflare
❌ High Risk: Hits limits directly
- No local fallback when limits reached
- All operations become slower near limits
- Risk of data loss if limits exceeded unexpectedly
- Manual intervention required to continue service
Hybrid Backend (hybrid) ⚡ RECOMMENDED:
export MCP_MEMORY_STORAGE_BACKEND=hybrid
✅ Protected: Intelligent limit handling
- Pre-sync validation: Rejects oversized memories before sync
- Capacity monitoring: Real-time tracking with warnings
- Graceful degradation: Continues working locally when cloud limits hit
- Smart error handling: Distinguishes temporary vs permanent failures
- Automatic fallback: Falls back to SQLite-only mode if needed
The hybrid backend includes comprehensive protection against Cloudflare limits:
1. Pre-sync Validation
# Automatically validates before syncing to Cloudflare
if metadata_size > 10 * 1024:   # Cloudflare's 10 KB metadata limit
    logger.warning("Memory metadata too large, skipping Cloudflare sync")
    # Memory stays in local SQLite-vec only
2. Capacity Monitoring
# Check current capacity usage
claude /memory-health
# Expected output:
# Cloudflare Capacity:
# Vectors: 3.2M / 5M (64% - OK)
# Warnings: None
# Status: Healthy
3. Intelligent Error Handling
# Permanent limit errors → No retry (saves resources)
# Temporary network errors → Exponential backoff retry
# Quota exceeded → Skip and log, continue with local storage
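A sketch of the retry strategy those comments describe: permanent limit errors are not retried, while transient failures get exponential backoff with jitter. The error classification heuristic and function names are assumptions, not the service's actual exception types:

import asyncio
import random

PERMANENT_MARKERS = ("limit exceeded", "quota", "payload too large")

def is_permanent(error: Exception) -> bool:
    """Heuristic: treat capacity/limit errors as permanent, everything else as transient."""
    return any(marker in str(error).lower() for marker in PERMANENT_MARKERS)

async def sync_with_retry(sync_once, max_attempts=5, base_delay=1.0):
    """Run sync_once(); retry transient failures with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return await sync_once()
        except Exception as error:
            if is_permanent(error):
                # Quota exceeded: skip and log, memory stays in local SQLite-vec
                print(f"Permanent error, not retrying: {error}")
                return None
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print(f"Transient error (attempt {attempt + 1}), retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
    return None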
4. Configuration Options
# Monitoring thresholds (hybrid backend only)
export MCP_CLOUDFLARE_WARNING_THRESHOLD=80 # Warn at 80%
export MCP_CLOUDFLARE_CRITICAL_THRESHOLD=95 # Critical at 95%
# Batch size limits
export MCP_HYBRID_BATCH_SIZE=50 # Conservative batch size
export MCP_HYBRID_MAX_QUEUE_SIZE=1000 # Limit memory usage
Small Scale (< 100K memories):
# Any backend works fine
export MCP_MEMORY_STORAGE_BACKEND=hybrid # Best performance + safety
Medium Scale (100K - 1M memories):
# Hybrid recommended for performance + limit protection
export MCP_MEMORY_STORAGE_BACKEND=hybrid
export MCP_HYBRID_ENABLE_HEALTH_CHECKS=true
Large Scale (1M - 4M memories):
# Hybrid with monitoring essential
export MCP_MEMORY_STORAGE_BACKEND=hybrid
export MCP_HYBRID_SYNC_INTERVAL=600 # Longer intervals
export MCP_HYBRID_BATCH_SIZE=25 # Smaller batches
export MCP_CLOUDFLARE_WARNING_THRESHOLD=70 # Earlier warnings
Enterprise Scale (4M+ memories):
# Approaching Cloudflare limits - monitor closely
export MCP_MEMORY_STORAGE_BACKEND=hybrid
export MCP_HYBRID_SYNC_INTERVAL=900 # Conservative sync
export MCP_HYBRID_BATCH_SIZE=10 # Small batches
export MCP_CLOUDFLARE_WARNING_THRESHOLD=60 # Early warnings
export MCP_CLOUDFLARE_CRITICAL_THRESHOLD=80 # Early critical alerts
# Consider database partitioning strategies
Built-in Monitoring (hybrid backend):
# Real-time capacity check
curl https://localhost:8443/api/health
# Detailed capacity information
curl https://localhost:8443/api/capacity
# Sync service status
curl https://localhost:8443/api/sync/status
Manual Capacity Checks:
# Cloudflare Dashboard
# → D1 Database → View size
# → Vectorize Index → View vector count
# Or via API
curl -X GET "https://api.cloudflare.com/client/v4/accounts/{account}/d1/database/{db}/stats" \
-H "Authorization: Bearer {token}"
At 80% Capacity (Warning):
- Monitor closely: Check capacity daily
- Optimize data: Remove old/duplicate memories
- Plan migration: Consider multiple Cloudflare accounts or alternative storage
At 95% Capacity (Critical):
- Immediate action required
- Stop non-essential sync: Pause bulk imports
- Archive old data: Move historical memories to separate storage
- Prepare fallback: Ensure hybrid backend can operate in SQLite-only mode
At 100% Capacity (Limit Reached):
- Cloudflare-only: Service becomes read-only
- Hybrid: Continues working locally, stops syncing to cloud
Multiple Cloudflare Accounts:
# Partition by user, team, or time period
export CLOUDFLARE_ACCOUNT_ID_PRIMARY=account1
export CLOUDFLARE_ACCOUNT_ID_ARCHIVE=account2
Tiered Storage Architecture:
# Hot data: Recent memories (< 30 days) → Cloudflare
# Warm data: Older memories (30-365 days) → Archive Cloudflare account
# Cold data: Historical memories (> 1 year) → S3 or similar
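A sketch of how memories could be routed to those tiers by age. The tier names mirror the comments above; the routing function and the backends behind each tier are hypothetical:

from datetime import datetime, timedelta, timezone

def select_tier(created_at: datetime) -> str:
    """Route a memory to hot/warm/cold storage based on its age."""
    age = datetime.now(timezone.utc) - created_at
    if age < timedelta(days=30):
        return "hot"        # primary Cloudflare account
    if age < timedelta(days=365):
        return "warm"       # archive Cloudflare account
    return "cold"           # S3 or similar object storage

# Example: a 90-day-old memory lands in the warm tier
print(select_tier(datetime.now(timezone.utc) - timedelta(days=90)))   # -> "warm"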
Data Lifecycle Management:
# Automatic archiving
export MCP_AUTO_ARCHIVE_DAYS=365
export MCP_ARCHIVE_BACKEND=s3
export MCP_DELETE_ARCHIVED_LOCAL=false # Keep local copies
💡 Key Takeaway: Always use the hybrid backend for production deployments. It provides the performance of SQLite-vec with the persistence of Cloudflare, plus intelligent protection against service limits that can cause data loss or service interruption.
Enable the HTTP/HTTPS server interface (used by the health, stats, and metrics endpoints later in this guide):
export MCP_HTTP_ENABLED=true
export MCP_HTTPS_ENABLED=true
export MCP_HTTP_PORT=8443
# ❌ Slow: Individual operations
for memory in memories:
    await store_memory(memory)
# ✅ Fast: Batch operation
await store_memories_batch(memories)
# Optimize SQLite settings
export SQLITE_PRAGMA_CACHE_SIZE=10000
export SQLITE_PRAGMA_SYNCHRONOUS=NORMAL
export SQLITE_PRAGMA_WAL_AUTOCHECKPOINT=1000
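The environment variables above correspond to standard SQLite PRAGMAs. For reference, the same settings applied directly on a connection (the PRAGMAs are standard SQLite; the database file name is assumed to be memory.db as elsewhere in this guide):

import sqlite3

conn = sqlite3.connect("memory.db")
conn.execute("PRAGMA cache_size=10000;")          # keep ~10,000 pages in memory
conn.execute("PRAGMA synchronous=NORMAL;")        # fewer fsyncs (safe when WAL is enabled)
conn.execute("PRAGMA wal_autocheckpoint=1000;")   # checkpoint every 1,000 WAL pages
conn.close()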
# Optimize ChromaDB settings
chroma_settings = {
    "anonymized_telemetry": False,
    "allow_reset": False,
    "is_persistent": True,
    "persist_directory": "/path/to/chroma_db"
}
# SQLite maintenance (weekly)
sqlite3 memory.db "VACUUM;"
sqlite3 memory.db "REINDEX;"
sqlite3 memory.db "ANALYZE;"
# Check database size
sqlite3 memory.db "SELECT page_count * page_size as size FROM pragma_page_count(), pragma_page_size();"
# ❌ Slow: Vague search
results = await search("thing")
# ✅ Fast: Specific search
results = await search("authentication JWT token")
# For quick browsing
results = await search(query, limit=10)
# For existence check
exists = len(await search(query, limit=1)) > 0
# For comprehensive analysis
results = await search(query, limit=100)
# Most efficient: Tag search first (indexed)
tagged = await search_by_tag(["python", "error"])
# Then refine with text search
refined = await search("authentication", memories=tagged)
-- Ensure tag indexes exist
CREATE INDEX IF NOT EXISTS idx_memory_tags ON memories(tags);
CREATE INDEX IF NOT EXISTS idx_memory_created_at ON memories(created_at);
CREATE INDEX IF NOT EXISTS idx_memory_content_hash ON memories(content_hash);
# Use full-text search when available
results = await search_fts("authentication error python")
# Fall back to semantic search for complex queries
results = await search_semantic("how to fix JWT timeout issues")
# ❌ Memory intensive
all_memories = await get_all_memories()
filtered = [m for m in all_memories if condition(m)]
# ✅ Stream processing
async for memory in stream_memories():
    if condition(memory):
        yield memory
# Configure embedding cache
EMBEDDING_CACHE_SIZE = 1000 # Number of embeddings to cache
EMBEDDING_CACHE_TTL = 3600 # Cache TTL in seconds
# Query result caching
QUERY_CACHE_SIZE = 100 # Number of query results to cache
QUERY_CACHE_TTL = 300 # Cache TTL in seconds
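A sketch of what an embedding cache with those parameters could look like, keyed by a hash of the content so repeated text is never re-embedded. The content-hash keying and LRU eviction are assumptions; the service's internal cache (and its TTL handling) may differ:

import hashlib
from collections import OrderedDict

class EmbeddingCache:
    """LRU cache for embeddings so repeated content is not re-embedded."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._cache = OrderedDict()                # content hash -> embedding vector

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text):
        key = self._key(text)
        if key in self._cache:
            self._cache.move_to_end(key)           # mark as recently used
            return self._cache[key]
        return None

    def put(self, text, embedding):
        key = self._key(text)
        self._cache[key] = embedding
        self._cache.move_to_end(key)
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)        # evict least recently used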
# Limit memory usage
export MCP_MAX_MEMORY_MB=2048
# Limit concurrent operations
export MCP_MAX_CONCURRENT_OPERATIONS=10
# Limit embedding batch size
export MCP_EMBEDDING_BATCH_SIZE=50
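The concurrency limit above maps naturally onto an asyncio semaphore; a minimal sketch of that idea (illustrative only, not the service's internal mechanism):

import asyncio
import os

MAX_CONCURRENT = int(os.environ.get("MCP_MAX_CONCURRENT_OPERATIONS", "10"))
semaphore = asyncio.Semaphore(MAX_CONCURRENT)

async def limited(operation):
    """Await an operation, but never run more than MAX_CONCURRENT at once."""
    async with semaphore:
        return await operation

# Usage: wrap each store/search call
# results = await asyncio.gather(*(limited(store_memory(m)) for m in memories))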
# Query performance
import time

query_time = time.time()
results = await search(query)
duration = time.time() - query_time
print(f"Query took {duration:.2f}s")
# Memory usage
import psutil
memory_usage = psutil.Process().memory_info().rss / 1024 / 1024
print(f"Memory usage: {memory_usage:.1f}MB")
# Database stats
stats = await get_database_stats()
print(f"Total memories: {stats.count}")
print(f"Database size: {stats.size_mb}MB")
# Health check endpoint
curl https://localhost:8443/api/health
# Stats endpoint
curl https://localhost:8443/api/stats
# Performance metrics
curl https://localhost:8443/api/metrics
# Enable performance logging
export MCP_LOG_LEVEL=INFO
export MCP_LOG_PERFORMANCE=true
# Monitor slow queries
export MCP_SLOW_QUERY_THRESHOLD=1000 # Log queries > 1s
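A decorator that implements the slow-query logging described above, reading the threshold from the same environment variable. This is a sketch; the service's built-in performance logging may work differently:

import functools
import logging
import os
import time

logger = logging.getLogger("mcp.performance")
SLOW_QUERY_MS = int(os.environ.get("MCP_SLOW_QUERY_THRESHOLD", "1000"))

def log_slow_queries(func):
    """Log any call that takes longer than MCP_SLOW_QUERY_THRESHOLD milliseconds."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return await func(*args, **kwargs)
        finally:
            elapsed_ms = (time.monotonic() - start) * 1000
            if elapsed_ms > SLOW_QUERY_MS:
                logger.warning("Slow query: %s took %.0f ms", func.__name__, elapsed_ms)
    return wrapper

# Usage:
# @log_slow_queries
# async def search(query, limit=10): ...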
Symptoms: Search takes >2 seconds
Diagnosis:
# Check database size
stats = await get_db_stats()
if stats.size_mb > 1000:
    print("Large database detected")

# Check index usage
explain_plan = await explain_query(search_query)
if "SCAN" in explain_plan:
    print("Full table scan detected")
Solutions:
- Add missing indexes
- Optimize query patterns
- Consider database partitioning
Symptoms: Process using >4GB RAM
Diagnosis:
# Check embedding cache
cache_stats = await get_embedding_cache_stats()
print(f"Cache size: {cache_stats.size}")

# Check for memory leaks
memory_trend = get_memory_usage_trend(hours=24)
if memory_trend.slope > 0.1:
    print("Potential memory leak")
Solutions:
- Reduce cache sizes
- Enable garbage collection
- Restart service periodically
Symptoms: "Database is locked" errors Diagnosis:
# Check for long-running transactions
sqlite3 memory.db "SELECT * FROM sqlite_master WHERE type='table';"
# Check WAL file size
ls -la *.db-wal
Solutions:
- Enable WAL mode
- Reduce transaction scope
- Add connection pooling
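A sketch of the first two fixes: enable WAL mode once and set a busy timeout so concurrent writers wait instead of failing immediately. This is standard sqlite3 usage, with the database file name assumed to be memory.db as above:

import sqlite3

conn = sqlite3.connect("memory.db", timeout=5.0)              # wait up to 5s for locks
mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
print(f"Journal mode: {mode}")                                # expect 'wal'
conn.execute("PRAGMA busy_timeout=5000;")                     # retry inside SQLite instead of erroring
conn.close()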
# Benchmark search performance
import time

async def benchmark_search():
    queries = ["python", "error", "authentication", "database"]
    times = []
    for query in queries:
        start = time.time()
        results = await search(query, limit=10)
        duration = time.time() - start
        times.append(duration)
        print(f"Query '{query}': {duration:.2f}s ({len(results)} results)")

    avg_time = sum(times) / len(times)
    print(f"Average search time: {avg_time:.2f}s")

# Run benchmark (from within an async context)
await benchmark_search()
- Check query response times (<1s average)
- Monitor memory usage (<2GB)
- Verify database health
- Review slow query logs
- Run database VACUUM
- Update query statistics
- Review performance metrics
- Clean up old logs
- Analyze performance trends
- Update optimization settings
- Review capacity planning
- Performance regression testing
- Search queries: <500ms average
- Memory storage: <100ms average
- Health checks: <50ms average
- Bulk operations: <5s for 100 items
- Memory usage: <2GB for 100K memories
- Disk space: <1GB for 100K memories
- CPU usage: <10% average load
- Network: <1MB/s average throughput
Following these optimization guidelines will ensure your MCP Memory Service performs efficiently at any scale.