Backend Synchronization Guide

Overview

This guide covers the bidirectional synchronization capabilities between the Cloudflare and SQLite-vec backends in MCP Memory Service. These tools enable hybrid deployment strategies that combine the speed of local storage with the global availability of cloud storage.

Architecture

┌───────────────────────────────────────────┐
│            MCP Memory Service             │
│                                           │
│  ┌──────────────┐      ┌──────────────┐   │
│  │  SQLite-vec  │  ←→  │ Sync Engine  │   │
│  │   (Local)    │      │              │   │
│  └──────────────┘      └──────────────┘   │
│         ↑                     ↓           │
│         │                     │           │
│    Fast Access          Bidirectional     │
│    (5ms reads)              Sync          │
│         │                     │           │
│         ↓                     ↓           │
│  ┌────────────────────────────────────┐   │
│  │         Cloudflare Backend         │   │
│  │       (D1 + Vectorize + R2)        │   │
│  │        Global Distribution         │   │
│  └────────────────────────────────────┘   │
└───────────────────────────────────────────┘

Use Cases

1. Hybrid Cloud/Local Deployment

  • Primary: Cloudflare for global access
  • Backup: Local SQLite-vec for offline capability
  • Benefit: Resilient, always-available memory service

2. Development/Production Sync

  • Development: Local SQLite-vec for fast iteration
  • Production: Cloudflare for scalability
  • Benefit: Seamless dev→prod workflow

3. Multi-Machine Memory Sharing

  • Machine A: Local work with periodic sync
  • Machine B: Pull shared memories from cloud
  • Benefit: Consistent memory across devices

4. Disaster Recovery

  • Regular backups: Automated Cloudflare→SQLite sync
  • Recovery: Quick restore from local backup
  • Benefit: Minimal data loss and downtime

Installation

Prerequisites

  1. Both backends configured:

    • SQLite-vec: Default local storage
    • Cloudflare: Requires API token and resource IDs
  2. Environment files:

    # .env (Cloudflare configuration)
    CLOUDFLARE_API_TOKEN=your-token
    CLOUDFLARE_ACCOUNT_ID=your-account
    CLOUDFLARE_D1_DATABASE_ID=your-d1-id
    CLOUDFLARE_VECTORIZE_INDEX=your-index
    MCP_MEMORY_STORAGE_BACKEND=cloudflare
    
    # .env.sqlite (SQLite configuration)
    MCP_MEMORY_STORAGE_BACKEND=sqlite_vec
    MCP_MEMORY_SQLITE_PATH=/path/to/sqlite_vec.db

Sync Tools Setup

The sync utilities are located in the scripts/ directory:

# Navigate to project root
cd mcp-memory-service

# Verify sync tools are present
ls scripts/sync_memory_backends.py
ls scripts/claude_sync_commands.py
ls scripts/memory_service_manager.sh

Basic Usage

Check Sync Status

# Using main sync script
python scripts/sync_memory_backends.py --status

# Using convenience wrapper
python scripts/claude_sync_commands.py status

# Example output:
# === Memory Sync Status ===
# Cloudflare memories: 750
# SQLite-vec memories: 745
# Cloudflare configured: True
# SQLite-vec file exists: True
# Last check: 2024-01-15T10:30:00

Preview Changes (Dry Run)

Always preview before syncing:

# See what would be synced
python scripts/sync_memory_backends.py --dry-run

# Output shows:
# - Memories to add
# - Memories to skip (duplicates)
# - No actual changes made

Sync Operations

Cloudflare → SQLite (Backup)

# Backup cloud memories to local
python scripts/sync_memory_backends.py --direction cf-to-sqlite

# Or using wrapper
python scripts/claude_sync_commands.py backup

SQLite → Cloudflare (Restore)

# Restore local memories to cloud
python scripts/sync_memory_backends.py --direction sqlite-to-cf

# Or using wrapper
python scripts/claude_sync_commands.py restore

Bidirectional Sync

# Sync both directions (merge)
python scripts/sync_memory_backends.py --direction bidirectional

# Or using wrapper
python scripts/claude_sync_commands.py sync

Advanced Usage

Service Management (Linux)

The memory_service_manager.sh script provides comprehensive service management:

# Start with specific backend
./scripts/memory_service_manager.sh start-cloudflare
./scripts/memory_service_manager.sh start-sqlite

# Check status
./scripts/memory_service_manager.sh status

# Integrated sync operations
./scripts/memory_service_manager.sh sync-backup
./scripts/memory_service_manager.sh sync-restore
./scripts/memory_service_manager.sh sync-both

# Stop service
./scripts/memory_service_manager.sh stop

Automated Sync with Cron

Set up scheduled backups and syncs:

# Edit crontab
crontab -e

# Add daily backup at 2 AM
0 2 * * * cd /path/to/mcp-memory-service && python scripts/sync_memory_backends.py --direction cf-to-sqlite >> /var/log/memory-sync.log 2>&1

# Add hourly bidirectional sync
0 * * * * cd /path/to/mcp-memory-service && python scripts/sync_memory_backends.py --direction bidirectional >> /var/log/memory-sync.log 2>&1

Custom Database Paths

# Specify custom SQLite path
python scripts/sync_memory_backends.py --sqlite-path /custom/path/backup.db --status

# Verbose logging
python scripts/sync_memory_backends.py --verbose --direction bidirectional

Sync Algorithm

Deduplication Strategy

The sync engine uses content-based hashing to prevent duplicates:

  1. Content Hash: SHA256 hash of content + metadata
  2. Comparison: Check both backends for existing hashes
  3. Skip Logic: Skip if hash exists in target backend
  4. Metadata Preservation: All tags, timestamps preserved
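
The actual logic lives in scripts/sync_memory_backends.py; the following is only a minimal sketch of the strategy above, with hypothetical field names (content, metadata), not the script's real code:

import hashlib
import json

def content_hash(memory: dict) -> str:
    # Hash the content together with a stable serialization of its metadata.
    payload = memory["content"] + json.dumps(memory.get("metadata", {}), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def plan_sync(source: list, target: list) -> list:
    # Skip logic: only copy memories whose hash is absent from the target backend.
    existing_hashes = {content_hash(m) for m in target}
    return [m for m in source if content_hash(m) not in existing_hashes]

Because the hash is derived from content plus metadata, re-running a sync is effectively idempotent: already-synced memories hash to an existing value and are skipped.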

Conflict Resolution

  • No conflicts: Different memories are merged
  • Duplicates: Skipped based on content hash
  • Timestamps: Preserved from original creation
  • Tags: All tags maintained

Performance Considerations

  • Batch Processing: Memories synced in batches
  • Caching: Embedding cache for efficiency
  • Network Optimization: Minimal API calls
  • Large Datasets: Handles 1000+ memories efficiently
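
The batch size and embedding cache are internal to the sync script; as a rough illustration of the batching idea only (not the script's actual implementation), memories can be processed in fixed-size chunks so that each round of API calls covers many records:

from itertools import islice

def batched(memories, batch_size=100):
    # Yield fixed-size chunks so each network round trip carries many memories.
    it = iter(memories)
    while chunk := list(islice(it, batch_size)):
        yield chunk

for batch in batched(range(1, 251), batch_size=100):
    print(f"syncing {len(batch)} memories")   # 100, 100, 50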

Configuration Validation

Before syncing, validate your configuration:

# Run configuration validator
python scripts/validate_config.py

# Example output:
# 🔍 MCP Memory Service Configuration Validation
# ==================================================
#
# 1. Environment Configuration Check:
#    ✅ .env file has Cloudflare backend configured
#
# 2. Claude Code Global Configuration Check:
#    ✅ Found 1 Cloudflare memory configurations
#
# 3. Project-Level Configuration Check:
#    ✅ No local .mcp.json found (good - using global configuration)
#
# 4. Cloudflare Credentials Check:
#    ✅ All required Cloudflare environment variables found in .env
#
# 🎉 Configuration validation PASSED!

Troubleshooting

Common Issues

1. Sync Shows 0 Memories

Problem: One backend not initialized
Solution:

# Initialize both backends
MCP_MEMORY_STORAGE_BACKEND=cloudflare uv run memory server &
sleep 5 && kill %1

MCP_MEMORY_STORAGE_BACKEND=sqlite_vec uv run memory server &
sleep 5 && kill %1

2. Authentication Errors

Problem: Invalid Cloudflare credentials
Solution:

# Verify credentials in .env
grep CLOUDFLARE .env

# Test with curl (load the variables into your shell first, e.g. `set -a; source .env; set +a`)
curl -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/d1/database

3. Duplicate Memories After Sync

Problem: Content hash mismatch
Solution:

# Run deduplication on SQLite
python scripts/find_duplicates.py --execute

# Verify sync status
python scripts/sync_memory_backends.py --status

4. Slow Sync Performance

Problem: Large dataset without optimization
Solution:

# Use verbose mode to identify bottlenecks
python scripts/sync_memory_backends.py --verbose --dry-run

# Consider splitting large syncs
# First sync recent memories only

Best Practices

1. Regular Backups

  • Daily Cloudflare → SQLite backup
  • Weekly full bidirectional sync
  • Monthly verification of sync integrity

2. Pre-Sync Validation

  • Always run --dry-run first
  • Check sync status before major operations
  • Validate configuration regularly

3. Monitoring

  • Log sync operations
  • Monitor memory counts
  • Track sync duration trends
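
A small wrapper can cover the first two points. The snippet below is a hypothetical example: it runs the documented status command and parses the output format shown earlier in this guide.

import re
import subprocess
from datetime import datetime, timezone

# Run the documented status command and capture its output.
result = subprocess.run(
    ["python", "scripts/sync_memory_backends.py", "--status"],
    capture_output=True, text=True, check=True,
)

# Extract the per-backend counts (format based on the example status output above).
counts = dict(re.findall(r"(Cloudflare|SQLite-vec) memories: (\d+)", result.stdout))
print(f"{datetime.now(timezone.utc).isoformat()} sync-status {counts}")

Redirect its output to a log file from cron to build a simple history of memory counts over time.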

4. Testing

  • Test sync on development data first
  • Verify metadata preservation
  • Confirm search functionality post-sync

Integration with CI/CD

GitHub Actions Example

name: Memory Sync

on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours
  workflow_dispatch:

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install uv
          uv pip install -e .

      - name: Sync memories
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          CLOUDFLARE_D1_DATABASE_ID: ${{ secrets.CLOUDFLARE_D1_DATABASE_ID }}
          CLOUDFLARE_VECTORIZE_INDEX: ${{ secrets.CLOUDFLARE_VECTORIZE_INDEX }}
        run: |
          python scripts/sync_memory_backends.py --direction bidirectional

Future Enhancements

Planned improvements for sync functionality:

  1. Selective Sync: Sync by tags or time ranges
  2. Incremental Sync: Only sync changes since last run
  3. Conflict Resolution UI: Interactive merge tool
  4. Multi-Backend Support: Sync with ChromaDB, PostgreSQL
  5. Compression: Reduce bandwidth for large syncs
  6. Encryption: End-to-end encryption for sensitive memories

Support

For issues with sync functionality:

  1. Check configuration with python scripts/validate_config.py
  2. Review logs in /tmp/memory-sync.log
  3. Open an issue with sync debug output
  4. Join discussions for community support

Last updated: January 2025
Sync tools version: 1.0.0
