Releases: bacalhau-project/bacalhau
v1.6.0
Bacalhau v1.6.0 Release Notes
We are excited to announce the release of Bacalhau v1.6.0, introducing a new communication architecture that significantly improves the reliability and resilience of distributed compute networks.
Key Features and Improvements
New Bacalhau Messaging Protocol (BMP)
At the heart of this release is the new messaging protocol, a complete redesign of node communication that brings significant improvements to network reliability:
Key Benefits
- Self-Healing Network: Compute nodes and orchestrators automatically reconnect and sync after network interruptions
- Offline-First Operation: Compute nodes can start and operate even when disconnected from the orchestrator
- Automatic State Recovery: When nodes reconnect, they automatically share all missed job execution information and results
- Zero Data Loss: Ensures no job execution data or results are lost during network disruptions
- Seamless Recovery: Network interruptions are handled transparently without requiring manual intervention
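One way to picture the automatic state recovery described above is a sender-side outbox. The compute node keeps numbered events while it is disconnected. On reconnect, the orchestrator reports the last sequence number it saw, so only the missed events are replayed. The sketch below is a hypothetical Go illustration of that idea, not the actual BMP/NCL code. `Event`, `Outbox`, `Append`, and `Since` are names invented for the example, and the log is kept in memory, whereas a real node would persist it.

```go
package main

import "fmt"

// Event is a hypothetical job-execution update recorded by a compute node.
type Event struct {
	Seq     uint64 // monotonically increasing sequence number
	Payload string
}

// Outbox is a hypothetical in-memory, append-only log of events the node has
// produced; a real implementation would persist it so it survives restarts.
type Outbox struct {
	events  []Event
	nextSeq uint64
}

// Append records a new event under the next sequence number.
func (o *Outbox) Append(payload string) Event {
	o.nextSeq++
	ev := Event{Seq: o.nextSeq, Payload: payload}
	o.events = append(o.events, ev)
	return ev
}

// Since returns, in order, every event with a sequence number greater than
// lastSeen, the highest number the orchestrator reports during the reconnect
// handshake (0 if it has seen nothing yet).
func (o *Outbox) Since(lastSeen uint64) []Event {
	var missed []Event
	for _, ev := range o.events {
		if ev.Seq > lastSeen {
			missed = append(missed, ev)
		}
	}
	return missed
}

func main() {
	var box Outbox
	box.Append("execution-1 started")
	box.Append("execution-1 completed")
	box.Append("execution-2 started")

	// The orchestrator lost contact after seeing seq 1, so only 2 and 3 are replayed.
	for _, ev := range box.Since(1) {
		fmt.Println(ev.Seq, ev.Payload)
	}
}
```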
Technical Improvements
- Reliable Message Delivery: Ordered, at-least-once message delivery between nodes
- Automatic Recovery: Built-in failure detection and recovery mechanisms
- Connection Health Monitoring: Proactive health checks and connection management
- Event-Based Architecture: Decoupled event processing from message delivery
- Efficient Checkpointing: Maintains system state for reliable recovery
- Backward Compatibility: Maintains compatibility with v1.5 orchestrators
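Ordered, at-least-once delivery means a message may arrive more than once, so the receiver has to recognize duplicates. Checkpointing the last processed sequence number makes that possible. The following Go sketch shows one common receiver-side approach under those assumptions:

* It applies only the next expected sequence number.
* It treats anything at or below the checkpoint as a duplicate.
* It reports a gap for messages that arrive early, so the sender can redeliver.

The `Receiver` and `Result` types are hypothetical and are not taken from the Bacalhau codebase.

```go
package main

import "fmt"

// Result describes what the hypothetical receiver did with one message.
type Result int

const (
	Applied   Result = iota // next in sequence: processed and checkpointed
	Duplicate               // already processed: acknowledge again and drop
	Gap                     // arrived early: sender should redeliver from checkpoint+1
)

func (r Result) String() string {
	return [...]string{"Applied", "Duplicate", "Gap"}[r]
}

// Receiver is a hypothetical consumer that enforces ordering and discards
// duplicates by comparing each sequence number with its checkpoint.
type Receiver struct {
	checkpoint uint64   // highest sequence number processed; persisted in a real system
	applied    []string // stand-in for the state changes the messages cause
}

// Deliver handles one incoming message.
func (rc *Receiver) Deliver(seq uint64, payload string) Result {
	switch {
	case seq <= rc.checkpoint:
		return Duplicate
	case seq != rc.checkpoint+1:
		return Gap
	default:
		rc.applied = append(rc.applied, payload)
		rc.checkpoint = seq
		return Applied
	}
}

func main() {
	var recv Receiver
	// Seq 1 is delivered twice (at-least-once) and seq 3 overtakes seq 2.
	msgs := []struct {
		seq     uint64
		payload string
	}{{1, "a"}, {1, "a"}, {3, "c"}, {2, "b"}, {3, "c"}}
	for _, m := range msgs {
		fmt.Printf("seq %d: %v\n", m.seq, recv.Deliver(m.seq, m.payload))
	}
	fmt.Println("checkpoint:", recv.checkpoint)
}
```

Rejecting early messages instead of buffering them keeps the receiver trivial. The cost is that the sender must resend from the checkpoint; an alternative design would buffer out-of-order messages.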
Enhanced Web UI Experience
- Direct Result Downloads: Download job results directly from the interface
- Simplified Configuration: Automatic request routing eliminates manual IP configuration
- Improved Architecture: Streamlined backend setup while maintaining security
Operational Improvements
- Reverse Proxy Support: Added capability to run orchestrator behind a reverse proxy
- Agent Configuration: New `bacalhau agent config` command to inspect agent configuration
- TLS Support: Added TLS encryption support for NATS communication
- Better Logging: Implemented more human-readable logging patterns
Upgrade Notes and Backward Compatibility
Bacalhau v1.6.0 maintains backward compatibility while introducing the new BMP:
- Compute nodes maintain compatibility with v1.5 orchestrators, and vice versa
- Support for re-handshake from legacy clients
We're excited for you to experience the enhanced reliability and resilience provided by the BMP in Bacalhau v1.6.0. This release represents a significant architectural advancement in making distributed computing more robust and dependable.
v1.6.0-rc4
What's Changed
- Disable NCL tests as it is the default protocol by @wdbaruni in #4741
- compute node backward compatibility with v1.5 orchestrators by @wdbaruni in #4761
- improve orchestrator node self-registration by @wdbaruni in #4762
- more human readable logging pattern by @wdbaruni in #4763
- fix lint failing due to sqlite3 missing by @wdbaruni in #4767
- better exchange of starting seqNum during handshakes by @wdbaruni in #4766
- Add Shutdown Notice from Compute Nodes by @wdbaruni in #4769
- Fix race conditions resulting in multiple data planes by @wdbaruni in #4771
- Faster reconnect on handshake required response by @wdbaruni in #4772
- Handle watchers with future sequence numbers gracefully by @wdbaruni in #4773
- Propagate bacerrors over nclprotocol by @wdbaruni in #4774
Full Changelog: v1.6.0-rc3...v1.6.0-rc4
v1.6.0-rc3
What's Changed
Full Changelog: v1.6.0-rc2...v1.6.0-rc3
v1.6.0-rc2
What's Changed
Full Changelog: v1.6.0-rc1...v1.6.0-rc2
v1.6.0-rc1
What's Changed
- Release.1.5 by @wdbaruni in #4594
- Bump next from 14.2.7 to 14.2.10 in /webui by @dependabot in #4453
- Improve API error handling by @jamlo in #4607
- Support running Bacalhau in Docker compose by @jamlo in #4596
- fix cspell and disable golangci-lint spellcheck by @wdbaruni in #4610
- remove event emitter by @wdbaruni in #4601
- remove pluggable executors by @wdbaruni in #4603
- remove exec command and job translation by @wdbaruni in #4604
- organize utils and remove unused ones by @wdbaruni in #4602
- validate swagger.json by @wdbaruni in #4619
- build(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace from 1.28.0 to 1.31.0 by @dependabot in #4617
- Support TestContainers PoC by @jamlo in #4627
- Migrate BashTub Tests to TestContainers Based Tests by @jamlo in #4631
- fix: default publishers not working by @wdbaruni in #4646
- fix test flakiness by using busybox instead of ubuntu images by @wdbaruni in #4655
- fix: stop jobs with short ids by @wdbaruni in #4657
- Improve startup time by optimizing IMDS access by @wdbaruni in #4649
- improve routing of webui requests by @wdbaruni in #4645
- Fix Go lint issues by @jamlo in #4647
- Prevent False Positives in CI by @jamlo in #4642
- Remove Dependabot Auto Merges and CircleCI References by @jamlo in #4660
- build(deps): bump actions/checkout from 2 to 4 by @dependabot in #4661
- build(deps): bump peter-evans/repository-dispatch from 1 to 3 by @dependabot in #4662
- build(deps): bump github.com/swaggo/swag from 1.16.2 to 1.16.4 by @dependabot in #4632
- release 1.5.1 by @wdbaruni in #4668
- build(deps): bump werkzeug from 2.2.3 to 3.0.6 in /integration/airflow by @dependabot in #4666
- build(deps): bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.16.5 to 1.17.34 by @dependabot in #4667
- Improve Github Actions Workflow and GoLint by @jamlo in #4672
- integrate lib/watcher with job store by @wdbaruni in #4676
- refactor execution store to use models.Execution by @wdbaruni in #4677
- fix logstream test flakiness by @wdbaruni in #4678
- speed up tests by using tagged images instead of latest by @wdbaruni in #4679
- move messaging models from pkg/compute to pkg/models/messages by @wdbaruni in #4680
- build(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.28.0 to 1.31.0 by @dependabot in #4673
- feat: adds `agent config` command that returns the agent configuration by @frrist in #4671
- async execution handling by @wdbaruni in #4683
- add compute execution-flow doc and remove outdated docs by @wdbaruni in #4685
- Support NATS TLS Communication by @jamlo in #4686
- skip flaky logstream tests by @wdbaruni in #4693
- prevent port number reuse with TTL-based caching by @wdbaruni in #4689
- ncl based async communication between nodes by @wdbaruni in #4687
- decouple envelope from ncl by @wdbaruni in #4695
- fix swagger generation by @wdbaruni in #4699
- Add missing execution state types to definitions by @markkuhn in #4698
- Display full command for next results in `job list` pagination by @markkuhn in #4691
- Add job result download option to WebUI by @markkuhn in #4702
- build(deps): bump apache-airflow from 2.9.3 to 2.10.3 in /integration/airflow by @dependabot in #4701
- login docker hub access in buildkite by @wdbaruni in #4707
- use new update checker endpoint by @wdbaruni in #4706
- build(deps): bump aiohttp from 3.9.4 to 3.10.11 in /integration/airflow by @dependabot in #4708
- build(deps): bump cross-spawn from 7.0.3 to 7.0.6 in /webui by @dependabot in #4709
- build(deps): bump github.com/nats-io/nats-server/v2 from 2.10.20 to 2.10.22 by @dependabot in #4710
- fix nats server panic on shutdown by @wdbaruni in #4712
- improve watcher creation pattern and manager naming by @wdbaruni in #4713
- expose backoff.BackoffDuration method by @wdbaruni in #4714
- build(deps): bump github.com/samber/lo from 1.39.0 to 1.47.0 by @dependabot in #4711
- build(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.28.0 to 1.32.0 by @dependabot in #4715
- ncl: introduce ordered publisher by @wdbaruni in #4717
- build(deps): bump go.uber.org/mock from 0.4.0 to 0.5.0 by @dependabot in #4718
- Add Transport Package with Event Publishing Components by @wdbaruni in #4721
- build(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp from 1.28.0 to 1.32.0 by @dependabot in #4722
- Support Running Orchestrator behind a Reverse Proxy by @jamlo in #4724
- Switch to ubuntu 22.04 from Chainguard docker image by @jamlo in #4725
- new node manager decoupled from transport layer by @wdbaruni in #4728
- authenticate earthly actions by @wdbaruni in #4729
- enable ephemeral watchers by @wdbaruni in #4730
- Introduce Request/Response Messaging in NCL by @wdbaruni in #4731
- extract bprotocol transport layer from pkg/node by @wdbaruni in #4732
- Add persisted sequence tracking and checkpoint support by @wdbaruni in #4733
- Allow to pass URL as the API Host for CLI by @jamlo in #4726
- NCL Protocol by @wdbaruni in #4734
- Support re-handshake from legacy clients by @wdbaruni in #4739
- backward compatible metadata keys by @wdbaruni in #4738
New Contributors
Full Changelog: v1.5.0...v1.6.0-rc1
v1.5.2
What's Changed
- build(deps): bump werkzeug from 2.2.3 to 3.0.6 in /integration/airflow by @dependabot in #4666
- build(deps): bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.16.5 to 1.17.34 by @dependabot in #4667
- Improve Github Actions Workflow and GoLint by @jamlo in #4672
- integrate lib/watcher with job store by @wdbaruni in #4676
- refactor execution store to use models.Execution by @wdbaruni in #4677
- fix logstream test flakiness by @wdbaruni in #4678
- speed up tests by using tagged images instead of latest by @wdbaruni in #4679
- move messaging models from pkg/compute to pkg/models/messages by @wdbaruni in #4680
- build(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.28.0 to 1.31.0 by @dependabot in #4673
- feat: adds `agent config` command that returns the agent configuration by @frrist in #4671
- async execution handling by @wdbaruni in #4683
- add compute execution-flow doc and remove outdated docs by @wdbaruni in #4685
- Support NATS TLS Communication by @jamlo in #4686
- skip flaky logstream tests by @wdbaruni in #4693
- prevent port number reuse with TTL-based caching by @wdbaruni in #4689
- ncl based async communication between nodes by @wdbaruni in #4687
- decouple envelope from ncl by @wdbaruni in #4695
- fix swagger generation by @wdbaruni in #4699
- Add missing execution state types to definitions by @markkuhn in #4698
- Display full command for next results in `job list` pagination by @markkuhn in #4691
- Add job result download option to WebUI by @markkuhn in #4702
- build(deps): bump apache-airflow from 2.9.3 to 2.10.3 in /integration/airflow by @dependabot in #4701
- login docker hub access in buildkite by @wdbaruni in #4707
- use new update checker endpoint by @wdbaruni in #4706
- build(deps): bump aiohttp from 3.9.4 to 3.10.11 in /integration/airflow by @dependabot in #4708
- build(deps): bump cross-spawn from 7.0.3 to 7.0.6 in /webui by @dependabot in #4709
- build(deps): bump github.com/nats-io/nats-server/v2 from 2.10.20 to 2.10.22 by @dependabot in #4710
- fix nats server panic on shutdown by @wdbaruni in #4712
- improve watcher creation pattern and manager naming by @wdbaruni in #4713
- expose backoff.BackoffDuration method by @wdbaruni in #4714
- build(deps): bump github.com/samber/lo from 1.39.0 to 1.47.0 by @dependabot in #4711
- build(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.28.0 to 1.32.0 by @dependabot in #4715
- ncl: introduce ordered publisher by @wdbaruni in #4717
- build(deps): bump go.uber.org/mock from 0.4.0 to 0.5.0 by @dependabot in #4718
- Add Transport Package with Event Publishing Components by @wdbaruni in #4721
- build(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp from 1.28.0 to 1.32.0 by @dependabot in #4722
- Support Running Orchestrator behind a Reverse Proxy by @jamlo in #4724
New Contributors
Full Changelog: v1.5.1...v1.5.2
v1.5.1
Major Improvements
- Enhanced Web UI Routing: Improved routing of Web UI requests without requiring backend address definition
- Faster Startup: Dramatically reduced node startup time from ~9 seconds to ~1.5 seconds by optimizing IMDS access
- Job Management: Added support for stopping jobs using short IDs
- Bug Fix: Resolved issues with default publishers functionality
Breaking Changes
- Removed exec command and job translation functionality
Additional Changes
- Added Docker compose support
- Improved API error handling
Links
- Full Changelog: v1.5.0...v1.5.1
v1.5.1-rc1
Major Improvements
- Enhanced Web UI Routing: Improved routing of Web UI requests without requiring backend address definition
- Faster Startup: Dramatically reduced node startup time from ~9 seconds to ~1.5 seconds by optimizing IMDS access
- Job Management: Added support for stopping jobs using short IDs
- Bug Fix: Resolved issues with default publishers functionality
Breaking Changes
- Removed exec command and job translation functionality
Additional Changes
- Added Docker compose support
- Improved API error handling
Links
- Full Changelog: v1.5.0...v1.5.1-rc1
v1.5.0
Bacalhau v1.5 Release Notes
We're thrilled to announce the release of Bacalhau 1.5.0, a significant update that introduces powerful new features and enhancements. Building on the momentum from our previous releases, Bacalhau 1.5 focuses on simplifying configuration, improving visibility, and enhancing overall performance.
Key Features and Improvements
Simplified Configuration Management
- New File-Based Configuration System: We've introduced a more intuitive file-based configuration system, replacing complex CLI flags. This change makes setting up and managing Bacalhau networks more straightforward and less error-prone.
- Flexible Configuration Options: Users can now provide:
  - A single config file
  - Multiple config files that are merged
  - Key-value pairs directly via the `-c` flag (e.g., `-c key=value`)
- Decoupled Configuration: Configuration is now decoupled from the repo (now called data dir), allowing for more flexible setups.
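The layering of config files and `-c` overrides can be sketched as successive map merges. This Go example makes two assumptions about precedence: later files take precedence over earlier ones, and `-c key=value` pairs take precedence over all files. The precedence rules, the flat dotted-key representation, and the key names (`API.Port`, `Compute.Enabled`) are illustrative assumptions. Consult the Bacalhau documentation for the authoritative behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeConfigs applies config layers in order, so a key set in a later layer
// overrides the same key from an earlier one (assumed precedence).
func mergeConfigs(layers ...map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, layer := range layers {
		for k, v := range layer {
			merged[k] = v
		}
	}
	return merged
}

// parseOverrides turns the values of repeated `-c key=value` flags into a layer.
func parseOverrides(args []string) (map[string]string, error) {
	out := make(map[string]string)
	for _, arg := range args {
		key, value, ok := strings.Cut(arg, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid override %q: want key=value", arg)
		}
		out[key] = value
	}
	return out, nil
}

func main() {
	// Illustrative keys: two config files, the second layered over the first.
	fileA := map[string]string{"API.Port": "1234", "Compute.Enabled": "false"}
	fileB := map[string]string{"Compute.Enabled": "true"}

	flags, err := parseOverrides([]string{"API.Port=8080"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(mergeConfigs(fileA, fileB, flags))
}
```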
Enhanced Data Directory Structure
- Improved Organization: We've clearly separated compute and orchestrator related data, providing a cleaner structure.
- Consolidated Metadata: System metadata is now consolidated into a single `system_metadata.json` file for easier management.
New WebUI
- Embedded Management Interface: Introduced a comprehensive WebUI for easier management and monitoring of your Bacalhau network. This significant feature allows users to visualize and interact with their Bacalhau deployment without relying solely on the CLI.
Enhanced Job Visibility and Reporting
- Granular Event Reporting: Improved reporting on job progress, including detailed scheduling actions, failures, and retries.
- Better Error Messages: Enhanced error reporting system with meaningful messages and debugging hints.
API Enhancements
- Pagination for Job History: Implemented pagination support for job history, improving the user experience when dealing with a large number of jobs and making it easier to navigate through job and execution history events.
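Paginated history is typically consumed by following a continuation token until the server stops returning one. The Go sketch below shows that client loop against an in-memory stand-in. `Page`, `NextToken`, `fetchPage`, and `allHistory` are hypothetical names, not fields or functions of the Bacalhau API.

```go
package main

import (
	"fmt"
	"strconv"
)

// Page is one hypothetical page of job-history events.
type Page struct {
	Items     []string
	NextToken string // empty when there are no further pages
}

// fetchPage is a stand-in for a paginated API call. It serves an in-memory
// slice and uses the next offset, as a string, as the opaque page token.
func fetchPage(history []string, token string, limit int) (Page, error) {
	start := 0
	if token != "" {
		n, err := strconv.Atoi(token)
		if err != nil || n < 0 || n > len(history) {
			return Page{}, fmt.Errorf("invalid page token %q", token)
		}
		start = n
	}
	end := start + limit
	if end > len(history) {
		end = len(history)
	}
	page := Page{Items: history[start:end]}
	if end < len(history) {
		page.NextToken = strconv.Itoa(end)
	}
	return page, nil
}

// allHistory follows NextToken until no token is returned, collecting every event.
func allHistory(history []string, limit int) ([]string, int, error) {
	if limit <= 0 {
		return nil, 0, fmt.Errorf("limit must be positive, got %d", limit)
	}
	var all []string
	pages := 0
	token := ""
	for {
		page, err := fetchPage(history, token, limit)
		if err != nil {
			return nil, pages, err
		}
		pages++
		all = append(all, page.Items...)
		if page.NextToken == "" {
			return all, pages, nil
		}
		token = page.NextToken
	}
}

func main() {
	events := []string{"created", "queued", "scheduled", "running", "completed"}
	got, pages, err := allHistory(events, 2)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("fetched %d events in %d pages: %v\n", len(got), pages, got)
}
```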
Upgrade Notes and Backward Compatibility
While Bacalhau 1.5.0 introduces some breaking changes, we've ensured a smooth upgrade path:
- Most CLI flags have been removed in favor of configuration files, but we gracefully handle deprecated flags for backward compatibility.
- The structure of the data directory has changed, but we automatically handle the migration when you first run the new Bacalhau version.
- Many old configuration options have been deprecated in favor of the new structure and config keys.
Please refer to our [updated documentation](https://docs.bacalhau.org/) for detailed instructions on upgrading to Bacalhau 1.5.0 and taking advantage of the new configuration system.
We're excited for you to explore the new features and enhancements in Bacalhau 1.5.0. Whether you're a seasoned Bacalhau user or just getting started, this update will empower you to build and run distributed compute networks more effectively than ever before.
v1.4.0
Announcing Bacalhau 1.4.0
We’re excited to announce the release of Bacalhau 1.4.0, a significant update that introduces powerful new features and enhancements. Building on the momentum from our previous releases this year (1.2.0, 1.3.0, 1.3.1, and 1.3.2), Bacalhau 1.4 strengthens our platform’s performance, scalability, and user experience, solidifying its position as a leading platform for building and running distributed compute networks.
In this release, we focused on three major efforts, with particular emphasis on those deploying Bacalhau at scale:
Performance and Scalability Enhancements
- Extended Job Queuing: Bacalhau 1.4.0 introduces a more robust queuing system, improving job scheduling and execution efficiency, especially in high-demand or globally distributed networks. By intelligently managing job queues, Bacalhau ensures smoother operations and increased throughput, leading to higher success rates for your distributed compute tasks.
- Migration to NATS, Deprecation of libp2p and Embedded IPFS Node: We’ve fully transitioned to NATS.io as Bacalhau’s communication backbone, moving away from libp2p and the embedded IPFS node. This change streamlines communication and reduces overhead, marking a significant step towards a more efficient and scalable network. IPFS integration remains available with external nodes for those who need it.
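The core idea behind job queuing is to hold a job until capacity frees up instead of rejecting it, and a small FIFO can show it. This is a hypothetical single-node Go sketch. Bacalhau's orchestrator schedules across many nodes, and its actual queuing logic is not reproduced here. `Scheduler`, `Submit`, and `Complete` are invented names.

```go
package main

import "fmt"

// Scheduler is a hypothetical single-node illustration of job queuing: jobs
// that cannot start immediately wait in FIFO order instead of being rejected.
type Scheduler struct {
	capacity int      // maximum number of jobs running at once
	running  []string // IDs of running jobs
	queue    []string // IDs of waiting jobs, oldest first
}

// Submit starts the job if capacity is free and otherwise queues it.
// It reports whether the job started immediately.
func (s *Scheduler) Submit(job string) bool {
	if len(s.running) < s.capacity {
		s.running = append(s.running, job)
		return true
	}
	s.queue = append(s.queue, job)
	return false
}

// Complete removes a finished job and promotes the oldest queued job, if any.
// It returns the promoted job's ID, or "" if nothing was promoted.
func (s *Scheduler) Complete(job string) string {
	for i, j := range s.running {
		if j == job {
			s.running = append(s.running[:i], s.running[i+1:]...)
			break
		}
	}
	if len(s.queue) == 0 || len(s.running) >= s.capacity {
		return ""
	}
	next := s.queue[0]
	s.queue = s.queue[1:]
	s.running = append(s.running, next)
	return next
}

func main() {
	sched := &Scheduler{capacity: 1}
	fmt.Println(sched.Submit("job-1"), sched.Submit("job-2")) // second job is queued
	fmt.Println("promoted:", sched.Complete("job-1"))
}
```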
Improved User Experience
- Updated CLI and HTTP API: Bacalhau 1.4.0 introduces a revamped command-line interface (CLI) and HTTP API. These updates align the CLI commands with the new API structure and enhance overall usability. While most changes are seamless for existing users, some command adjustments have been made (e.g., `bacalhau create` becomes `bacalhau job run`). Our updated documentation will guide you through the transition smoothly.
- Job Spec Updates: We've introduced an updated Job Specification format while deprecating some features of the previous format. This change requires users to update their job specs but brings improved clarity and consistency. For detailed guidance, please consult our documentation on the new Job Spec requirements.
- Enhanced Error Reporting: Bacalhau 1.4.0 improves error reporting, making it easier to diagnose and troubleshoot issues. This enhancement contributes to a more stable and reliable experience, helping users quickly resolve any problems that arise.
- Introduction of Node Manager: In Bacalhau 1.4.0, we’re introducing the Node Manager. This feature simplifies node operations, providing a clear view of all compute nodes and their status. You can approve, deny, or delete nodes as needed, making management straightforward. Heartbeats from nodes keep the Node Manager updated on their connectivity, enhancing overall stability and performance.
Smooth Transition for Existing Users
- Error Handling and Guidance: We understand that transitioning to a new version can be challenging. To ease this process, we’ve implemented helpful error messages and guidance for those adjusting to the changes in CLI behavior and job specifications. We’ve also created a table to show how some of the Bacalhau API endpoints have been remapped. If you’re not ready to upgrade, you can continue using version 1.3.1 while maintaining your private Bacalhau cluster.
Join Us on the Journey
We’re excited for you to explore the new features and enhancements in Bacalhau 1.4.0. Over the next five days, we’ll dive deeper into each topic in our “5 Days of Bacalhau” blog series. Whether you’re a seasoned Bacalhau user or just getting started, this update will empower you to build and run distributed compute networks more effectively than ever before.