Orchestrator Framework


Overview

Orchestrator is a powerful, flexible AI pipeline orchestration framework that simplifies the creation and execution of complex AI workflows. By combining YAML-based configuration with intelligent model selection and automatic ambiguity resolution, Orchestrator makes it easy to build sophisticated AI applications without getting bogged down in implementation details.

Key Features

  • 🎯 YAML-Based Pipelines: Define complex workflows in simple, readable YAML with full template variable support
  • πŸ€– Multi-Model Support: Seamlessly work with OpenAI, Anthropic, Google, Ollama, and HuggingFace models
  • 🧠 Intelligent Model Selection: Automatically choose the best model based on task requirements
  • πŸ”„ Automatic Ambiguity Resolution: Use <AUTO> tags to let AI resolve configuration ambiguities
  • πŸ“¦ Modular Architecture: Extend with custom models, tools, and control systems
  • πŸ›‘οΈ Production Ready: Built-in error handling, retries, checkpointing, and monitoring
  • ⚑ Parallel Execution: Efficient resource management and parallel task execution
  • 🐳 Sandboxed Execution: Secure code execution in isolated environments
  • πŸ’Ύ Lazy Model Loading: Models are downloaded only when needed, saving disk space
  • πŸ”§ Reliable Tool Execution: Guaranteed execution of file operations with LangChain structured outputs
  • πŸ“ Advanced Templates: Support for nested variables, filters, and Jinja2-style templates

Quick Start

Installation

pip install py-orc

For additional features:

pip install py-orc[ollama]      # Ollama model support
pip install py-orc[cloud]        # Cloud model providers
pip install py-orc[dev]          # Development tools
pip install py-orc[all]          # Everything

Basic Usage

  1. Create a simple pipeline (hello_world.yaml):
id: hello_world
name: Hello World Pipeline
description: A simple example pipeline

steps:
  - id: greet
    action: generate_text
    parameters:
      prompt: "Say hello to the world in a creative way!"
      
  - id: translate
    action: generate_text
    parameters:
      prompt: "Translate this greeting to Spanish: {{ greet.result }}"
    dependencies: [greet]

outputs:
  greeting: "{{ greet.result }}"
  spanish: "{{ translate.result }}"

  2. Run the pipeline:
# Using the CLI script
python scripts/run_pipeline.py hello_world.yaml

# With inputs
python scripts/run_pipeline.py hello_world.yaml -i name=World -i language=Spanish

# From a JSON file
python scripts/run_pipeline.py hello_world.yaml -f inputs.json -o output_dir/

# Or programmatically
import orchestrator as orc

# Initialize models (auto-detects available models)
orc.init_models()

# Compile and run the pipeline
pipeline = orc.compile("hello_world.yaml")
result = pipeline.run()

print(result)
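
Inputs can also be passed to run() as keyword arguments, mirroring the CLI's -i flags. A minimal sketch, assuming the pipeline declares name and language inputs (hello_world.yaml above declares none, so adapt the names to your own pipeline):

result = pipeline.run(
    name="World",        # maps to the pipeline input "name"
    language="Spanish",  # maps to the pipeline input "language"
)
print(result)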

Using AUTO Tags

Orchestrator's <AUTO> tags let AI decide configuration details:

steps:
  - id: analyze_data
    action: analyze
    parameters:
      data: "{{ input_data }}"
      method: <AUTO>Choose the best analysis method for this data type</AUTO>
      visualization: <AUTO>Decide if we should create a chart</AUTO>

Model Configuration

Configure available models in models.yaml:

models:
  # Local models (via Ollama) - downloaded on first use
  - source: ollama
    name: llama3.1:8b
    expertise: [general, reasoning, multilingual]
    size: 8b
    
  - source: ollama
    name: qwen2.5-coder:7b
    expertise: [code, programming]
    size: 7b

  # Cloud models
  - source: openai
    name: gpt-4o
    expertise: [general, reasoning, code, analysis, vision]
    size: 1760b  # Estimated

defaults:
  expertise_preferences:
    code: qwen2.5-coder:7b
    reasoning: deepseek-r1:8b
    fast: llama3.2:1b

Models are downloaded only when first used, saving disk space and initialization time.
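
Because loading is lazy, initialization stays cheap even with many models configured. A minimal sketch of the flow, using only the calls shown above and assuming models.yaml is picked up from the working directory:

import orchestrator as orc

# Registers the models declared in models.yaml; nothing is downloaded yet
orc.init_models()

# Compilation selects a registered model for each step's requirements
pipeline = orc.compile("hello_world.yaml")

# The first run that invokes a local (Ollama/HuggingFace) model triggers
# its download; cloud models are simply called via their APIs
result = pipeline.run()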

Advanced Example

Here's a more complex example showing model requirements and parallel execution:

id: research_pipeline
name: AI Research Pipeline
description: Research a topic and create a comprehensive report

inputs:
  - name: topic
    type: string
    description: Research topic
    
  - name: depth
    type: string
    default: <AUTO>Determine appropriate research depth</AUTO>

steps:
  # Parallel research from multiple sources
  - id: web_search
    action: search_web
    parameters:
      query: "{{ topic }} latest research 2025"
      count: <AUTO>Decide how many results to fetch</AUTO>
    requires_model:
      expertise: [research, web]
      
  - id: academic_search
    action: search_academic
    parameters:
      query: "{{ topic }}"
      filters: <AUTO>Set appropriate academic filters</AUTO>
    requires_model:
      expertise: [research, academic]
      
  # Analyze findings with specialized model
  - id: analyze_findings
    action: analyze
    parameters:
      web_results: "{{ web_search.results }}"
      academic_results: "{{ academic_search.results }}"
      analysis_focus: <AUTO>Determine key aspects to analyze</AUTO>
    dependencies: [web_search, academic_search]
    requires_model:
      expertise: [analysis, reasoning]
      min_size: 20b  # Require large model for complex analysis
      
  # Generate report
  - id: write_report
    action: generate_document
    parameters:
      topic: "{{ topic }}"
      analysis: "{{ analyze_findings.result }}"
      style: <AUTO>Choose appropriate writing style</AUTO>
      length: <AUTO>Determine optimal report length</AUTO>
    dependencies: [analyze_findings]
    requires_model:
      expertise: [writing, general]

outputs:
  report: "{{ write_report.document }}"
  summary: "{{ analyze_findings.summary }}"
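
As with the basic example, this pipeline is compiled and run from Python. A sketch, assuming the YAML above is saved as research_pipeline.yaml (depth can be omitted because its <AUTO> default lets the framework choose):

import orchestrator as orc

orc.init_models()
pipeline = orc.compile("research_pipeline.yaml")

# "depth" is optional here thanks to its <AUTO> default
result = pipeline.run(topic="protein folding")

print(result)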

Complete Example: Research Report Generator

Here's a fully functional pipeline that generates research reports:

# research_report.yaml
id: research_report
name: Research Report Generator
description: Generate comprehensive research reports with citations

inputs:
  - name: topic
    type: string
    description: Research topic
  - name: instructions
    type: string
    description: Additional instructions for the report

outputs:
  - pdf: <AUTO>Generate appropriate filename for the research report PDF</AUTO>

steps:
  - id: search
    name: Web Search
    action: search_web
    parameters:
      query: <AUTO>Create effective search query for {topic} with {instructions}</AUTO>
      max_results: 10
    requires_model:
      expertise: fast
      
  - id: compile_notes
    name: Compile Research Notes
    action: generate_text
    parameters:
      prompt: |
        Compile comprehensive research notes from these search results:
        {{ search.results }}
        
        Topic: {{ topic }}
        Instructions: {{ instructions }}
        
        Create detailed notes with:
        - Key findings
        - Important quotes
        - Source citations
        - Relevant statistics
    dependencies: [search]
    requires_model:
      expertise: [analysis, reasoning]
      min_size: 7b
      
  - id: write_report
    name: Write Report
    action: generate_document
    parameters:
      content: |
        Write a comprehensive research report on "{{ topic }}"
        
        Research notes:
        {{ compile_notes.result }}
        
        Requirements:
        - Professional academic style
        - Include introduction, body sections, and conclusion
        - Cite sources properly
        - {{ instructions }}
      format: markdown
    dependencies: [compile_notes]
    requires_model:
      expertise: [writing, general]
      min_size: 20b
      
  - id: create_pdf
    name: Create PDF
    action: convert_to_pdf
    parameters:
      markdown: "{{ write_report.document }}"
      filename: "{{ outputs.pdf }}"
    dependencies: [write_report]

Run it with:

import orchestrator as orc

# Initialize models
orc.init_models()

# Compile pipeline
pipeline = orc.compile("research_report.yaml")

# Run with inputs
result = pipeline.run(
    topic="quantum computing applications in medicine",
    instructions="Focus on recent breakthroughs and future potential"
)

print(f"Report saved to: {result}")

Documentation

Comprehensive documentation is available at orc.readthedocs.io.

Available Models

Orchestrator supports a wide range of models:

Local Models (via Ollama)

  • Gemma3 27B: Google's powerful general-purpose model
  • Llama 3.x: General purpose, multilingual support
  • DeepSeek-R1: Advanced reasoning and coding
  • Qwen2.5-Coder: Specialized for code generation
  • Mistral: Fast and efficient general purpose

Cloud Models

  • OpenAI: GPT-4.1 (latest)
  • Anthropic: Claude Sonnet 4 (claude-sonnet-4-20250514)
  • Google: Gemini 2.5 Flash (gemini-2.5-flash)

HuggingFace Models

  • Mistral 7B Instruct v0.3: High-quality instruction-following model
  • Llama, Qwen, Phi, and many more
  • Automatically downloaded on first use

Requirements

  • Python 3.8+
  • Optional: Ollama for local model execution
  • Optional: API keys for cloud providers (OpenAI, Anthropic, Google); see the sketch below
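
Cloud providers are typically authenticated through environment variables. A hedged sketch that checks for the providers' conventional variable names before initializing (whether Orchestrator reads these exact variables is an assumption; consult the documentation for the authoritative list):

import os

import orchestrator as orc

# Conventional API-key variables for each provider (assumed names)
for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    if not os.environ.get(var):
        print(f"warning: {var} is not set; that provider will be unavailable")

orc.init_models()  # auto-detects available models, per the Basic Usage example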

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Support

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Orchestrator in your research, please cite:

@software{orchestrator2025,
  title = {Orchestrator: AI Pipeline Orchestration Framework},
  author = {Manning, Jeremy R. and {Contextual Dynamics Lab}},
  year = {2025},
  url = {https://github.com/ContextLab/orchestrator},
  organization = {Dartmouth College}
}

Acknowledgments

Orchestrator is developed and maintained by the Contextual Dynamics Lab at Dartmouth College.


Built with ❤️ by the Contextual Dynamics Lab
