
Performance Optimization

Guide to optimizing MCP Memory Service for maximum performance and scalability.

Quick Wins

1. Choose the Right Backend

Performance Comparison

| Backend | Read Time | Use Case | Pros | Cons |
|---|---|---|---|---|
| Hybrid | ~5ms | Production (recommended) | Best of both worlds: fast reads, cloud sync | Requires Cloudflare config |
| SQLite-vec | ~5ms | Development, single-user | Lightning fast, no network, no limits | Local only, no sharing |
| ChromaDB | ~15ms | Multi-client local | Fast, multi-client support | More memory usage |
| Cloudflare | 50-500ms+ | Legacy cloud-only | Global sync, high availability | Network latency |

Detailed Backend Analysis

Hybrid (SQLite-vec + Cloudflare) ⚡🌟 RECOMMENDED

export MCP_MEMORY_STORAGE_BACKEND=hybrid
export MCP_HYBRID_SYNC_INTERVAL=300  # 5 minutes
export MCP_HYBRID_BATCH_SIZE=50
  • Performance: ~5ms read time (SQLite-vec speed)
  • Write Speed: ~5ms (immediate to SQLite-vec, async to Cloudflare)
  • Architecture: Write-through cache with background sync (sketched below)
  • Best for: Production environments, multi-device workflows, best user experience
  • Benefits:
    • Lightning-fast operations - All reads/writes use SQLite-vec
    • Cloud persistence - Automatic background sync to Cloudflare
    • Multi-device sync - Access memories across all devices
    • Graceful degradation - Works offline, syncs when online
    • Zero user-facing latency - Cloud operations happen in background
  • Requirements: Cloudflare credentials (falls back to SQLite-only if missing)
  • Configuration Options:
    # Sync timing
    MCP_HYBRID_SYNC_INTERVAL=300        # Background sync every 5 minutes
    MCP_HYBRID_BATCH_SIZE=50            # Sync 50 operations at a time
    MCP_HYBRID_MAX_QUEUE_SIZE=1000      # Maximum pending operations
    
    # Health monitoring
    MCP_HYBRID_ENABLE_HEALTH_CHECKS=true
    MCP_HYBRID_HEALTH_CHECK_INTERVAL=60
    MCP_HYBRID_SYNC_ON_STARTUP=true
    
    # Fallback behavior
    MCP_HYBRID_FALLBACK_TO_PRIMARY=true
    MCP_HYBRID_WARN_ON_SECONDARY_FAILURE=true
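
Conceptually, the write-through design looks like the sketch below. This is a minimal illustration rather than the service's actual internals; the local and cloud store objects and their method names are assumptions:

import asyncio

class HybridStore:
    """Sketch of a write-through cache: local write first, cloud sync later."""

    def __init__(self, local, cloud, batch_size=50):
        self.local, self.cloud = local, cloud   # hypothetical store objects
        self.batch_size = batch_size
        self.pending: asyncio.Queue = asyncio.Queue(maxsize=1000)

    async def store(self, memory):
        await self.local.store(memory)          # ~5ms, user-facing path
        await self.pending.put(memory)          # queued for background sync

    async def sync_loop(self, interval=300):
        while True:                             # runs as a background task
            batch = []
            while len(batch) < self.batch_size and not self.pending.empty():
                batch.append(self.pending.get_nowait())
            if batch:
                await self.cloud.store_batch(batch)  # off the hot path
            await asyncio.sleep(interval)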

SQLite-vec (Local Storage) 🏃‍♂️💨

export MCP_MEMORY_STORAGE_BACKEND=sqlite_vec
  • Performance: ~5ms average read time
  • Latency: Zero network latency - direct disk I/O
  • Throughput: Unlimited operations (no rate limits)
  • Best for: Development, testing, speed-critical applications, offline usage
  • Limitations: Single machine only, manual backup required

ChromaDB (Local Multi-client) 🔄

export MCP_MEMORY_STORAGE_BACKEND=chroma
  • Performance: ~15ms average read time
  • Latency: Low - local HTTP API calls
  • Throughput: High with connection pooling
  • Best for: Team development, local multi-client scenarios
  • Limitations: Higher memory usage, single machine

Cloudflare (Cloud Storage) 🌐

export MCP_MEMORY_STORAGE_BACKEND=cloudflare
  • Performance: 50-500ms+ (network dependent)
  • Latency: Variable based on geographic distance to edge
  • Throughput: Limited by API rate limits (generous but present)
  • Architecture: Multiple API calls required (D1 + Vectorize)
  • Best for: Production, multi-device sync, team sharing, automatic backups
  • Limitations: Network dependency, higher latency, API costs, service limits (see below)

Performance Factors for Cloudflare

Network Latency Components:

  • Geographic distance to nearest Cloudflare edge
  • API request/response overhead
  • Multiple service coordination (D1 database + Vectorize embeddings)
  • Internet connection quality and stability

Optimization Tips:

  • Use regions closest to your location
  • Implement client-side caching for frequently accessed memories
  • Batch operations when possible
  • Consider hybrid approach (SQLite-vec for speed + Cloudflare for sync)

⚠️ Critical: Cloudflare Service Limitations

Unlike SQLite-vec, which is bounded only by local disk space, Cloudflare enforces strict service limits that can cause sync failures if they are not handled properly.

Hard Service Limits

| Service | Limit | SQLite-vec Equivalent | Impact |
|---|---|---|---|
| D1 Database | 10 GB per database | Unlimited | Hard stop: no more memories can be stored |
| Vectorize Index | 5 million vectors | Unlimited | Hard stop: no more embeddings can be created |
| Metadata per vector | 10 KB per entry | Unlimited | Memories with large metadata are skipped |
| Filter query size | 2 KB per query | Unlimited | Query failures for complex filters |
| String index size | 64 bytes (truncated) | Unlimited | ⚠️ Data truncation in search metadata |
| Batch operations | 200,000 vectors max | Unlimited | ⚠️ Slower bulk imports |

Capacity Planning Guidelines

Memory Count Estimation:

# Typical memory sizes (rough estimates)
average_content_size = 500          # bytes (typical note)
average_metadata_size = 200         # bytes (tags + timestamps)
embedding_size = 384 * 4            # bytes (384 float32 dimensions)

# D1 storage per memory ≈ 700 bytes
# Vectorize storage per memory ≈ 1.5 KB

# Estimated limits:
max_memories_d1 = (10 * 1024**3) // 700   # ≈ 15 million memories
max_memories_vectorize = 5_000_000        # Hard limit

# Practical limit: 5 million memories (Vectorize constraint)

Warning Thresholds (implemented in hybrid backend):

  • 80% capacity (4M memories): Warning alerts start
  • 95% capacity (4.75M memories): Critical alerts, consider action
  • 100% capacity (5M memories): New memories rejected
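
As a rough illustration of these thresholds (a sketch, not the hybrid backend's actual code):

def capacity_status(vector_count: int, limit: int = 5_000_000,
                    warn_pct: float = 0.80, critical_pct: float = 0.95) -> str:
    """Classify Vectorize capacity usage per the thresholds above."""
    usage = vector_count / limit
    if usage >= 1.0:
        return "full"      # new memories rejected
    if usage >= critical_pct:
        return "critical"  # archive or partition immediately
    if usage >= warn_pct:
        return "warning"   # start monitoring daily
    return "ok"

print(capacity_status(3_200_000))  # -> "ok" (64% of 5M)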

Impact on Different Backends

Cloudflare-only Backend (cloudflare):

export MCP_MEMORY_STORAGE_BACKEND=cloudflare

High Risk: Hits limits directly

  • No local fallback when limits reached
  • All operations become slower near limits
  • Risk of data loss if limits exceeded unexpectedly
  • Manual intervention required to continue service

Hybrid Backend (hybrid) ⚡ RECOMMENDED:

export MCP_MEMORY_STORAGE_BACKEND=hybrid

Protected: Intelligent limit handling

  • Pre-sync validation: Rejects oversized memories before sync
  • Capacity monitoring: Real-time tracking with warnings
  • Graceful degradation: Continues working locally when cloud limits hit
  • Smart error handling: Distinguishes temporary vs permanent failures
  • Automatic fallback: Falls back to SQLite-only mode if needed

Limit Protection Features (Hybrid Backend)

The hybrid backend includes comprehensive protection against Cloudflare limits:

1. Pre-sync Validation

# Automatically validates before syncing to Cloudflare
MAX_METADATA_BYTES = 10 * 1024  # Vectorize's 10 KB metadata limit

if metadata_size > MAX_METADATA_BYTES:
    logger.warning("Memory metadata too large, skipping Cloudflare sync")
    # Memory stays in local SQLite-vec only

2. Capacity Monitoring

# Check current capacity usage
claude /memory-health

# Expected output:
# Cloudflare Capacity:
#   Vectors: 3.2M / 5M (64% - OK)
#   Warnings: None
#   Status: Healthy

3. Intelligent Error Handling

# Permanent limit errors → No retry (saves resources)
# Temporary network errors → Exponential backoff retry
# Quota exceeded → Skip and log, continue with local storage
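
A sketch of such a policy in application code; the two exception classes are hypothetical stand-ins for however your client distinguishes permanent from transient failures:

import asyncio
import logging
import random

logger = logging.getLogger(__name__)

class PermanentLimitError(Exception): pass   # e.g. quota exceeded
class TransientError(Exception): pass        # e.g. network timeout, 5xx

async def sync_with_retry(operation, max_retries=5):
    """Retry transient failures with exponential backoff; never retry limits."""
    for attempt in range(max_retries):
        try:
            return await operation()
        except PermanentLimitError:
            logger.warning("Cloudflare limit hit; memory stays local only")
            return None                      # no retry - saves resources
        except TransientError:
            delay = min(2 ** attempt + random.random(), 60)
            await asyncio.sleep(delay)       # exponential backoff with jitter
    logger.error("Sync failed after %d attempts", max_retries)
    return None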

4. Configuration Options

# Monitoring thresholds (hybrid backend only)
export MCP_CLOUDFLARE_WARNING_THRESHOLD=80   # Warn at 80%
export MCP_CLOUDFLARE_CRITICAL_THRESHOLD=95  # Critical at 95%

# Batch size limits
export MCP_HYBRID_BATCH_SIZE=50              # Conservative batch size
export MCP_HYBRID_MAX_QUEUE_SIZE=1000        # Limit memory usage

Recommendations by Usage Scale

Small Scale (< 100K memories):

# Any backend works fine
export MCP_MEMORY_STORAGE_BACKEND=hybrid  # Best performance + safety

Medium Scale (100K - 1M memories):

# Hybrid recommended for performance + limit protection
export MCP_MEMORY_STORAGE_BACKEND=hybrid
export MCP_HYBRID_ENABLE_HEALTH_CHECKS=true

Large Scale (1M - 4M memories):

# Hybrid with monitoring essential
export MCP_MEMORY_STORAGE_BACKEND=hybrid
export MCP_HYBRID_SYNC_INTERVAL=600         # Longer intervals
export MCP_HYBRID_BATCH_SIZE=25             # Smaller batches
export MCP_CLOUDFLARE_WARNING_THRESHOLD=70  # Earlier warnings

Enterprise Scale (4M+ memories):

# Approaching Cloudflare limits - monitor closely
export MCP_MEMORY_STORAGE_BACKEND=hybrid
export MCP_HYBRID_SYNC_INTERVAL=900         # Conservative sync
export MCP_HYBRID_BATCH_SIZE=10             # Small batches
export MCP_CLOUDFLARE_WARNING_THRESHOLD=60  # Early warnings
export MCP_CLOUDFLARE_CRITICAL_THRESHOLD=80 # Early critical alerts

# Consider database partitioning strategies

Monitoring Cloudflare Limits

Built-in Monitoring (hybrid backend):

# Real-time capacity check
curl https://localhost:8443/api/health

# Detailed capacity information
curl https://localhost:8443/api/capacity

# Sync service status
curl https://localhost:8443/api/sync/status
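
The same endpoints can be polled programmatically. The JSON field names in this sketch are assumptions about the response shape; adjust them to whatever your /api/health actually returns:

import requests

def check_capacity(base_url="https://localhost:8443"):
    # verify=False because local HTTPS typically uses a self-signed certificate
    health = requests.get(f"{base_url}/api/health", verify=False, timeout=5).json()
    used = health.get("vectors_used", 0)            # assumed field name
    limit = health.get("vectors_limit", 5_000_000)  # assumed field name
    print(f"Vectorize usage: {used:,} / {limit:,} ({100 * used / limit:.0f}%)")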

Manual Capacity Checks:

# Cloudflare Dashboard
# → D1 Database → View size
# → Vectorize Index → View vector count

# Or via API
curl -X GET "https://api.cloudflare.com/client/v4/accounts/{account}/d1/database/{db}/stats" \
  -H "Authorization: Bearer {token}"

What to Do When Approaching Limits

At 80% Capacity (Warning):

  1. Monitor closely: Check capacity daily
  2. Optimize data: Remove old/duplicate memories
  3. Plan migration: Consider multiple Cloudflare accounts or alternative storage

At 95% Capacity (Critical):

  1. Immediate action required
  2. Stop non-essential sync: Pause bulk imports
  3. Archive old data: Move historical memories to separate storage
  4. Prepare fallback: Ensure hybrid backend can operate in SQLite-only mode

At 100% Capacity (Limit Reached):

  • Cloudflare-only: Service becomes read-only
  • Hybrid: Continues working locally, stops syncing to cloud

Alternative Strategies for Scale

Multiple Cloudflare Accounts:

# Partition by user, team, or time period
export CLOUDFLARE_ACCOUNT_ID_PRIMARY=account1
export CLOUDFLARE_ACCOUNT_ID_ARCHIVE=account2

Tiered Storage Architecture:

# Hot data: Recent memories (< 30 days) → Cloudflare
# Warm data: Older memories (30-365 days) → Archive Cloudflare account
# Cold data: Historical memories (> 1 year) → S3 or similar
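
The age-based routing rule can be expressed directly in code; the tier names in this sketch are illustrative, not configuration the service reads:

from datetime import datetime, timedelta, timezone

def storage_tier(created_at: datetime) -> str:
    """Route a memory to a storage tier by age (thresholds from above)."""
    age = datetime.now(timezone.utc) - created_at
    if age < timedelta(days=30):
        return "hot"    # primary Cloudflare account
    if age < timedelta(days=365):
        return "warm"   # archive Cloudflare account
    return "cold"       # S3 or similar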

Data Lifecycle Management:

# Automatic archiving
export MCP_AUTO_ARCHIVE_DAYS=365
export MCP_ARCHIVE_BACKEND=s3
export MCP_DELETE_ARCHIVED_LOCAL=false  # Keep local copies

💡 Key Takeaway: Always use the hybrid backend for production deployments. It provides the performance of SQLite-vec with the persistence of Cloudflare, plus intelligent protection against service limits that can cause data loss or service interruption.

2. Enable HTTP/HTTPS for Better Performance

export MCP_HTTP_ENABLED=true
export MCP_HTTPS_ENABLED=true
export MCP_HTTP_PORT=8443

3. Use Batch Operations

# ❌ Slow: Individual operations
for memory in memories:
    await store_memory(memory)

# ✅ Fast: Batch operation
await store_memories_batch(memories)
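
When a backend caps batch sizes (such as the 200,000-vector Vectorize limit discussed above), chunking keeps each call within bounds — a sketch reusing the hypothetical store_memories_batch:

async def store_in_chunks(memories, chunk_size=50):
    """Store memories in fixed-size chunks to respect backend batch limits."""
    for i in range(0, len(memories), chunk_size):
        await store_memories_batch(memories[i:i + chunk_size])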

Database Optimization

SQLite-vec Configuration

# Optimize SQLite settings
export SQLITE_PRAGMA_CACHE_SIZE=10000
export SQLITE_PRAGMA_SYNCHRONOUS=NORMAL
export SQLITE_PRAGMA_WAL_AUTOCHECKPOINT=1000
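
If your deployment does not pick up these variables, equivalent pragmas can be applied directly when opening the database — a sketch using Python's built-in sqlite3 module:

import sqlite3

conn = sqlite3.connect("memory.db")
conn.execute("PRAGMA journal_mode = WAL")         # write-ahead logging for concurrency
conn.execute("PRAGMA cache_size = 10000")         # pages kept in memory
conn.execute("PRAGMA synchronous = NORMAL")       # fewer fsyncs; safe with WAL
conn.execute("PRAGMA wal_autocheckpoint = 1000")  # checkpoint every ~1000 pages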

ChromaDB Configuration

# Optimize ChromaDB settings
from chromadb.config import Settings

chroma_settings = Settings(
    anonymized_telemetry=False,
    allow_reset=False,
    is_persistent=True,
    persist_directory="/path/to/chroma_db",
)

Database Maintenance

# SQLite maintenance (weekly)
sqlite3 memory.db "VACUUM;"
sqlite3 memory.db "REINDEX;"
sqlite3 memory.db "ANALYZE;"

# Check database size
sqlite3 memory.db "SELECT page_count * page_size as size FROM pragma_page_count(), pragma_page_size();"

Query Performance

Optimize Search Queries

Use Specific Keywords

# ❌ Slow: Vague search
results = await search("thing")

# ✅ Fast: Specific search
results = await search("authentication JWT token")

Limit Results Appropriately

# For quick browsing
results = await search(query, limit=10)

# For existence check
exists = len(await search(query, limit=1)) > 0

# For comprehensive analysis
results = await search(query, limit=100)

Combine Search Types

# Most efficient: Tag search first (indexed)
tagged = await search_by_tag(["python", "error"])

# Then refine with text search
refined = await search("authentication", memories=tagged)

Index Optimization

Tag Indexing

-- Ensure tag indexes exist
CREATE INDEX IF NOT EXISTS idx_memory_tags ON memories(tags);
CREATE INDEX IF NOT EXISTS idx_memory_created_at ON memories(created_at);
CREATE INDEX IF NOT EXISTS idx_memory_content_hash ON memories(content_hash);

Content Indexing

# Use full-text search when available
results = await search_fts("authentication error python")

# Fall back to semantic search for complex queries
results = await search_semantic("how to fix JWT timeout issues")

Memory Management

Memory Usage Patterns

Efficient Memory Allocation

# ❌ Memory intensive: load everything, then filter
all_memories = await get_all_memories()
filtered = [m for m in all_memories if condition(m)]

# ✅ Stream processing: filter one memory at a time
async def filtered_memories():
    async for memory in stream_memories():
        if condition(memory):
            yield memory

Cache Management

# Configure embedding cache
EMBEDDING_CACHE_SIZE = 1000  # Number of embeddings to cache
EMBEDDING_CACHE_TTL = 3600   # Cache TTL in seconds

# Query result caching
QUERY_CACHE_SIZE = 100       # Number of query results to cache
QUERY_CACHE_TTL = 300        # Cache TTL in seconds
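
A minimal cache matching these settings might look like the sketch below (an illustration; the service's actual cache implementation may differ):

import time
from collections import OrderedDict

class TTLCache:
    """Small LRU cache with per-entry expiry, sized per the settings above."""

    def __init__(self, max_size: int = 1000, ttl: float = 3600.0):
        self.max_size, self.ttl = max_size, ttl
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            del self._data[key]        # expired
            return None
        self._data.move_to_end(key)    # mark as recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used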

Resource Limits

# Limit memory usage
export MCP_MAX_MEMORY_MB=2048

# Limit concurrent operations
export MCP_MAX_CONCURRENT_OPERATIONS=10

# Limit embedding batch size
export MCP_EMBEDDING_BATCH_SIZE=50
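
On the application side, the concurrency cap can be enforced with an asyncio semaphore — a minimal sketch that reads the variable above at startup:

import asyncio
import os

# Read the cap set via MCP_MAX_CONCURRENT_OPERATIONS (default 10)
MAX_OPS = int(os.environ.get("MCP_MAX_CONCURRENT_OPERATIONS", "10"))
_limiter = asyncio.Semaphore(MAX_OPS)

async def limited(coro):
    """Run an awaitable while holding a slot in the concurrency limiter."""
    async with _limiter:
        return await coro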

Monitoring & Metrics

Performance Metrics to Track

import time

# Query performance
start = time.time()
results = await search(query)
duration = time.time() - start
print(f"Query took {duration:.2f}s")

# Memory usage
import psutil
memory_usage = psutil.Process().memory_info().rss / 1024 / 1024
print(f"Memory usage: {memory_usage:.1f}MB")

# Database stats
stats = await get_database_stats()
print(f"Total memories: {stats.count}")
print(f"Database size: {stats.size_mb}MB")

Built-in Performance Tools

# Health check endpoint
curl https://localhost:8443/api/health

# Stats endpoint
curl https://localhost:8443/api/stats

# Performance metrics
curl https://localhost:8443/api/metrics

Logging Configuration

# Enable performance logging
export MCP_LOG_LEVEL=INFO
export MCP_LOG_PERFORMANCE=true

# Monitor slow queries
export MCP_SLOW_QUERY_THRESHOLD=1000  # Log queries > 1s
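
For code paths the service does not instrument itself, the same slow-query logging can be approximated with a decorator — a sketch that reuses MCP_SLOW_QUERY_THRESHOLD for illustration:

import functools
import logging
import os
import time

logger = logging.getLogger("mcp.performance")
SLOW_MS = int(os.environ.get("MCP_SLOW_QUERY_THRESHOLD", "1000"))

def log_slow(func):
    """Log any awaited call that exceeds the slow-query threshold."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            if elapsed_ms > SLOW_MS:
                logger.warning("%s took %.0f ms", func.__name__, elapsed_ms)
    return wrapper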

Troubleshooting Performance Issues

Common Performance Problems

Slow Search Queries

Symptoms: Search takes >2 seconds

Diagnosis:

# Check database size
stats = await get_db_stats()
if stats.size_mb > 1000:
    print("Large database detected")

# Check index usage
explain_plan = await explain_query(search_query)
if "SCAN" in explain_plan:
    print("Full table scan detected")

Solutions:

  1. Add missing indexes
  2. Optimize query patterns
  3. Consider database partitioning

High Memory Usage

Symptoms: Process uses >4GB RAM

Diagnosis:

# Check embedding cache
cache_stats = await get_embedding_cache_stats()
print(f"Cache size: {cache_stats.size}")

# Check for memory leaks
memory_trend = get_memory_usage_trend(hours=24)
if memory_trend.slope > 0.1:
    print("Potential memory leak")

Solutions:

  1. Reduce cache sizes
  2. Enable garbage collection
  3. Restart service periodically

Database Lock Contention

Symptoms: "Database is locked" errors Diagnosis:

# Check whether WAL journal mode is enabled
sqlite3 memory.db "PRAGMA journal_mode;"

# Check WAL file size (a large WAL suggests stalled checkpoints)
ls -la *.db-wal

Solutions:

  1. Enable WAL mode
  2. Reduce transaction scope
  3. Add connection pooling

Performance Benchmarking

import time

# Benchmark search performance
async def benchmark_search():
    queries = ["python", "error", "authentication", "database"]
    times = []
    
    for query in queries:
        start = time.time()
        results = await search(query, limit=10)
        duration = time.time() - start
        times.append(duration)
        print(f"Query '{query}': {duration:.2f}s ({len(results)} results)")
    
    avg_time = sum(times) / len(times)
    print(f"Average search time: {avg_time:.2f}s")

# Run benchmark
await benchmark_search()

Optimization Checklist

Daily Monitoring

  • Check query response times (<1s average)
  • Monitor memory usage (<2GB)
  • Verify database health
  • Review slow query logs

Weekly Maintenance

  • Run database VACUUM
  • Update query statistics
  • Review performance metrics
  • Clean up old logs

Monthly Review

  • Analyze performance trends
  • Update optimization settings
  • Review capacity planning
  • Performance regression testing

Performance Targets

Response Time Goals

  • Search queries: <500ms average
  • Memory storage: <100ms average
  • Health checks: <50ms average
  • Bulk operations: <5s for 100 items

Resource Usage Goals

  • Memory usage: <2GB for 100K memories
  • Disk space: <1GB for 100K memories
  • CPU usage: <10% average load
  • Network: <1MB/s average throughput

Following these optimization guidelines will ensure your MCP Memory Service performs efficiently at any scale.
