sunitj/bioCurator

BioCurator

Memory-augmented multi-agent system for scientific literature curation and analysis.

Overview

BioCurator demonstrates how AI agents can develop domain expertise through collaborative literature analysis. It combines a multi-modal memory architecture with a safety-first development approach.

Quick Start

Development Mode (Local Models - Zero Cost)

# Set up development environment with UV
export UV_LINK_MODE=copy
./scripts/setup_venv.sh
source .venv/bin/activate
export APP_MODE=development

# Configure environment (optional - uses defaults if not set)
cp .env.example .env
# Edit .env to set JUPYTER_TOKEN and other configurations

# Run with local models (Ollama) - services start in dependency order
docker-compose -f docker-compose.yml -f docker-compose.development.yml up -d

# Wait for all services to be healthy (takes ~30-60 seconds)
docker-compose ps  # Check status

# Access services (replace localhost with your server IP if remote):
# - BioCurator API: http://localhost:8080/
# - Health Status: http://localhost:8080/health/
# - Neo4j Browser: http://localhost:7474/ (user: neo4j, password: dev_password)
# - Jupyter Lab: http://localhost:8888/ (token: biocurator-dev or JUPYTER_TOKEN)
# - Ollama API: http://localhost:11434/

# Verify system health
curl -s http://localhost:8080/health/ | python -m json.tool
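
The "wait for all services" step above can also be scripted instead of checked by hand. The following is a minimal Python sketch, not part of the repository: the helper names `http_ok` and `wait_until` are hypothetical, and it assumes that an HTTP 200 from the health URL means the API is reachable (it does not inspect the status of individual components).

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP error codes all land here.
        return False


def wait_until(
    check: Callable[[], bool],
    timeout: float = 90.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

After `docker-compose ... up -d`, calling `wait_until(lambda: http_ok("http://localhost:8080/health/"))` would block until the API answers or the timeout expires. The `clock` and `sleep` parameters exist only so the loop can be tested without real waiting.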

Production Mode (Cloud Models)

# Set up production environment with UV
./scripts/setup_venv.sh
source .venv/bin/activate
export APP_MODE=production

# Run with cloud models
docker-compose -f docker-compose.yml -f docker-compose.production.yml up

Architecture

┌─────────────────────────────────────────────────────────┐
│                     Agent Orchestra                      │
├─────────────────────────────────────────────────────────┤
│  Research    Literature    Deep      Domain    Knowledge │
│  Director      Scout      Reader   Specialist   Weaver  │
└────────────┬────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                   Safety Controls                        │
│  Circuit Breakers │ Rate Limiting │ Cost Tracking       │
└────────────┬────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                  Memory Systems                          │
│   Neo4j   │   Qdrant   │  PostgreSQL  │  Redis │ SQLite │
└──────────────────────────────────────────────────────────┘
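
The Safety Controls layer in the diagram names circuit breakers without showing how one behaves. As an illustration only, the Python sketch below follows the common closed / open / half-open pattern. The names `CircuitBreaker` and `CircuitOpenError`, the parameters, and the default values are hypothetical and are not taken from the BioCurator codebase.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical per-agent breaker (illustrative, not BioCurator's class)."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"  # one probe call is allowed through
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # Trip (or re-trip after a failed half-open probe) at the threshold.
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In this sketch an agent's model call would be wrapped as `breaker.call(lambda: ...)`, so that repeated failures stop further calls until `reset_timeout` seconds have elapsed.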

Key Features

  • Multi-Agent Coordination: Specialized agents for literature discovery, analysis, and synthesis

    • Research Director for workflow orchestration
    • Literature Scout, Deep Reader, Domain Specialist, and Knowledge Weaver (planned for future PRs)
    • Async message passing with request/response patterns
    • Persistent task queue with dependency management and retry logic
  • Multi-Modal Memory: Knowledge graph, vector embeddings, episodic memory, and procedural patterns

    • Neo4j knowledge graph with concept relationships
    • Qdrant vector store for semantic search
    • PostgreSQL episodic memory for interaction histories
    • Redis working memory for active contexts
    • InfluxDB time-series metrics (optional)
  • Safety-First Design: Circuit breakers, rate limiting, cost tracking, and anomaly detection

    • Per-agent circuit breakers with configurable thresholds
    • Rate limiting with token bucket algorithm
    • Real-time cost tracking and budget enforcement
    • Behavior monitoring with anomaly detection
    • Comprehensive safety event logging
  • Development Mode: Free local model operation with Ollama (DeepSeek-R1, Llama 3.1, Qwen 2.5)

    • Zero-cost budget enforcement
    • Hard guard against cloud model access
    • Local model optimization with quality bridging
  • Production Ready: Cloud model integration with comprehensive monitoring and observability

    • Claude Sonnet 4 and GPT-4o model support
    • Prometheus metrics integration
    • Health monitoring with agent status reporting
    • Auto-scaling and load balancing capabilities
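
The feature list mentions rate limiting with a token bucket algorithm. For readers unfamiliar with it, here is a minimal Python sketch of the general technique. The class name `TokenBucket`, its parameters, and the `try_acquire` method are assumptions made for illustration; the project's own rate limiter may be structured differently.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket: `rate` tokens per second, up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return False without blocking otherwise."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A bucket created as `TokenBucket(rate=1.0, capacity=2.0)` allows a burst of two requests and then one request per second on average. Callers that receive `False` can retry later or reject the request.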

Development

Requirements

  • Python 3.11+
  • UV package manager (installed automatically by setup script)
  • Docker and Docker Compose

Setup

# Automated setup with UV
./scripts/setup_venv.sh
source .venv/bin/activate

# Manual setup alternative
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[dev]"

Common Commands

# Run tests
make test

# Run linting
make lint

# Format code
make format

# Build containers
make build

# View metrics
curl http://localhost:9090/metrics

# Check health (includes agent status)
curl http://localhost:8080/health

# Run agent workflow examples
python examples/basic_workflow.py     # Basic multi-agent workflow
python examples/safety_demo.py       # Safety controls demonstration

# Agent system health
curl http://localhost:8080/health | jq '.components[] | select(.name | startswith("agent"))'

Documentation

Testing

The project maintains:

  • ≥70% overall test coverage

  • ≥85% coverage for safety-critical modules

  • Comprehensive integration tests
  • Performance benchmarks

Troubleshooting

Common Issues

  1. Services fail to start or restart continuously

    • Check Docker logs: docker logs <container-name>
    • Neo4j memory settings require a specific format in Docker Compose
    • Ensure all required ports are available: 8080, 7474, 7687, 6333, 5432, 6379, 8086
  2. Application can't connect to databases

    • Verify environment variables are set in docker-compose files
    • Services must use container names (e.g., redis, postgres) not localhost
    • Check that all services are healthy: docker-compose ps
  3. Health endpoint shows "unhealthy" but system works

    • This is expected if optional backends (like InfluxDB) aren't initialized
    • Check individual component status in the health response
    • Only required backends (Redis, PostgreSQL, Neo4j, Qdrant) need to be healthy
  4. Cannot access endpoints from browser (EC2/Remote)

    • Ensure security groups allow inbound traffic on required ports
    • Use server's public IP instead of localhost
    • Consider SSH tunneling for secure development access
  5. Fresh start after issues

    docker-compose down
    docker volume rm $(docker volume ls -q | grep biocurator)  # Removes all data
    docker-compose build --no-cache app
    docker-compose -f docker-compose.yml -f docker-compose.development.yml up -d
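
Item 3 above says that only the required backends need to be healthy. That check can be automated against the health response, as in the hedged Python sketch below. The `components` array and its `name` field match the jq filter shown under Common Commands. The `status` field, the value `"healthy"`, and the exact component names (`redis`, `postgres`, `neo4j`, `qdrant`) are assumptions and should be adjusted to the actual payload.

```python
from typing import Any, Iterable, Mapping

# Assumed component names; verify them against a real /health/ response.
REQUIRED_BACKENDS = frozenset({"redis", "postgres", "neo4j", "qdrant"})


def required_backends_healthy(
    health: Mapping[str, Any],
    required: Iterable[str] = REQUIRED_BACKENDS,
    healthy_value: str = "healthy",
) -> tuple[bool, list[str]]:
    """Return (all_ok, failing_names), ignoring optional backends."""
    # Index the reported components by name for direct lookup.
    statuses = {
        component.get("name"): component.get("status")
        for component in health.get("components", [])
    }
    # A required backend that is missing from the response counts as failing.
    failing = sorted(
        name for name in required if statuses.get(name) != healthy_value
    )
    return (not failing, failing)
```

Passing the parsed JSON from `curl -s http://localhost:8080/health/` to this function would report which required backends, if any, are failing, while an uninitialized optional backend such as InfluxDB would not affect the result.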

License

Apache 2.0 - See LICENSE file for details
