BioCurator

Memory-augmented multi-agent system for scientific literature curation and analysis.

Overview

BioCurator demonstrates how AI agents can develop domain expertise through collaborative literature analysis, using a multi-modal memory architecture and a safety-first development approach.

Quick Start

Development Mode (Local Models - Zero Cost)

# Set up development environment with UV
export UV_LINK_MODE=copy
./scripts/setup_venv.sh
source .venv/bin/activate
export APP_MODE=development

# Configure environment (optional - uses defaults if not set)
cp .env.example .env
# Edit .env to set JUPYTER_TOKEN and other configurations

# Run with local models (Ollama) - services start in dependency order
docker-compose -f docker-compose.yml -f docker-compose.development.yml up -d

# Wait for all services to be healthy (takes ~30-60 seconds)
docker-compose ps  # Check status

# Access services (replace localhost with your server IP if remote):
# - BioCurator API: http://localhost:8080/
# - Health Status: http://localhost:8080/health/
# - Neo4j Browser: http://localhost:7474/ (user: neo4j, password: dev_password)
# - Jupyter Lab: http://localhost:8888/ (token: biocurator-dev or JUPYTER_TOKEN)
# - Ollama API: http://localhost:11434/

# Verify system health
curl -s http://localhost:8080/health/ | python -m json.tool
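
Because the services take roughly 30-60 seconds to come up, a script may prefer to poll the health endpoint rather than sleep for a fixed time. The following is a minimal sketch, not part of the repository: the function names `fetch_health` and `wait_until_responding` are hypothetical, and it only assumes that the health URL shown above returns JSON once the API is running.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

HEALTH_URL = "http://localhost:8080/health/"  # URL taken from the Quick Start


def fetch_health(url: str = HEALTH_URL) -> Optional[dict]:
    """Return the parsed health payload, or None if the API is not answering yet."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, HTTP error, timeout, or a non-JSON body: treat as "not ready".
        return None


def wait_until_responding(
    probe: Callable[[], Optional[dict]] = fetch_health,
    timeout: float = 90.0,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll `probe` until it returns a payload or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        payload = probe()
        if payload is not None:
            return payload
        if time.monotonic() >= deadline:
            return None
        sleep(interval)
```

Calling `wait_until_responding()` with its defaults returns the parsed health payload once the API answers, or `None` if it does not answer within 90 seconds. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running stack.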

Production Mode (Cloud Models)

# Set up production environment with UV
./scripts/setup_venv.sh
source .venv/bin/activate
export APP_MODE=production

# Run with cloud models
docker-compose -f docker-compose.yml -f docker-compose.production.yml up

Architecture

┌─────────────────────────────────────────────────────────┐
│                     Agent Orchestra                      │
├─────────────────────────────────────────────────────────┤
│  Research    Literature    Deep      Domain    Knowledge │
│  Director      Scout      Reader   Specialist   Weaver  │
└────────────┬────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                   Safety Controls                        │
│  Circuit Breakers │ Rate Limiting │ Cost Tracking       │
└────────────┬────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                  Memory Systems                          │
│   Neo4j   │   Qdrant   │  PostgreSQL  │  Redis │ SQLite │
└──────────────────────────────────────────────────────────┘
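
To illustrate how a safety layer can sit between the agents and the memory systems, here is a minimal circuit breaker sketch. It is an illustration of the general pattern under stated assumptions, not BioCurator's implementation: the class name, the `failure_threshold` and `reset_after` parameters, and the three state names are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical per-agent breaker: opens after repeated failures, retries later."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after:
            return "half_open"  # cool-down elapsed: one trial call is allowed
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

An agent would wrap each downstream call, for example `breaker.call(lambda: query_graph(q))`, where `query_graph` stands in for any memory-system access. While the breaker is open, calls fail fast instead of piling onto a struggling backend.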

Key Features

Multi-Agent Coordination: Specialized agents for literature discovery, analysis, and synthesis
- Research Director for workflow orchestration
- Literature Scout, Deep Reader, Domain Specialist, Knowledge Weaver (future PRs)
- Async message passing with request/response patterns
- Persistent task queue with dependency management and retry logic
Multi-Modal Memory: Knowledge graph, vector embeddings, episodic memory, and procedural patterns
- Neo4j knowledge graph with concept relationships
- Qdrant vector store for semantic search
- PostgreSQL episodic memory for interaction histories
- Redis working memory for active contexts
- InfluxDB time-series metrics (optional)
Safety-First Design: Circuit breakers, rate limiting, cost tracking, and anomaly detection
- Per-agent circuit breakers with configurable thresholds
- Rate limiting with token bucket algorithm
- Real-time cost tracking and budget enforcement
- Behavior monitoring with anomaly detection
- Comprehensive safety event logging
Development Mode: Free local model operation with Ollama (DeepSeek-R1, Llama 3.1, Qwen 2.5)
- Zero cost budget enforcement
- Hard guard against cloud model access
- Local model optimization with quality bridging
Production Ready: Cloud model integration with comprehensive monitoring and observability
- Claude Sonnet 4 and GPT-4o model support
- Prometheus metrics integration
- Health monitoring with agent status reporting
- Auto-scaling and load balancing capabilities
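
The token bucket algorithm named under Safety-First Design can be sketched in a few lines. This is a generic illustration of the technique, not the project's rate limiter: the class name `TokenBucket` and the parameters `rate`, `capacity`, and `cost` are assumptions introduced here.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: `rate` tokens are added per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False when the caller must wait."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With `TokenBucket(rate=1.0, capacity=2.0)`, two requests pass immediately, a third is refused, and one more is admitted for each second that elapses. The capacity bounds the burst size while the rate bounds sustained throughput.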

Development

Requirements

Python 3.11+
UV package manager (installed automatically by setup script)
Docker and Docker Compose

Setup

# Automated setup with UV
./scripts/setup_venv.sh
source .venv/bin/activate

# Manual setup alternative
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[dev]"

Common Commands

# Run tests
make test

# Run linting
make lint

# Format code
make format

# Build containers
make build

# View metrics
curl http://localhost:9090/metrics

# Check health (includes agent status)
curl http://localhost:8080/health

# Run agent workflow examples
python examples/basic_workflow.py     # Basic multi-agent workflow
python examples/safety_demo.py       # Safety controls demonstration

# Agent system health
curl http://localhost:8080/health | jq '.components[] | select(.name | startswith("agent"))'

Documentation

Testing

The project maintains:

≥70% overall test coverage
≥85% coverage for safety-critical modules
Comprehensive integration tests
Performance benchmarks

Troubleshooting

Common Issues

Services fail to start or restart continuously
- Check Docker logs: docker logs <container-name>
- Neo4j memory settings require specific format in Docker Compose
- Ensure all required ports are available: 8080, 7474, 7687, 6333, 5432, 6379, 8086
Application can't connect to databases
- Verify environment variables are set in docker-compose files
- Services must use container names (e.g., redis, postgres) not localhost
- Check that all services are healthy: docker-compose ps
Health endpoint shows "unhealthy" but system works
- This is expected if optional backends (like InfluxDB) aren't initialized
- Check individual component status in the health response
- Only required backends (Redis, PostgreSQL, Neo4j, Qdrant) need to be healthy
Cannot access endpoints from browser (EC2/Remote)
- Ensure security groups allow inbound traffic on required ports
- Use server's public IP instead of localhost
- Consider SSH tunneling for secure development access
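
When the health endpoint reports "unhealthy", it helps to check only the required backends. The sketch below does that under loud assumptions: the `components` list with a `name` field follows the jq filter under Common Commands, but the per-component `status` field, the value `"healthy"`, and the exact component names in `REQUIRED` are guesses that should be adjusted to the real response.

```python
from typing import Iterable

# Assumed component names for the required backends listed above.
REQUIRED = ("redis", "postgres", "neo4j", "qdrant")


def failing_required(payload: dict, required: Iterable[str] = REQUIRED,
                     ok_status: str = "healthy") -> list[str]:
    """Return the required backends that are missing or not reporting `ok_status`."""
    # Assumed shape: {"components": [{"name": "...", "status": "..."}, ...]}
    components = {c.get("name"): c.get("status") for c in payload.get("components", [])}
    return [name for name in required if components.get(name) != ok_status]
```

An empty result means every required backend is fine, so an overall "unhealthy" status is coming from an optional component such as InfluxDB.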

Fresh start after issues

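docker-compose -f docker-compose.yml -f docker-compose.development.yml down  # Same files as "up", so override services stop too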
docker volume rm $(docker volume ls -q | grep biocurator)  # Removes all data
docker-compose build --no-cache app
docker-compose -f docker-compose.yml -f docker-compose.development.yml up -d

License

Apache 2.0 - See LICENSE file for details
