
feat(engine): improve stability of workers with terminations and disconnects #229

Merged
6 commits merged on Mar 4, 2024

abelanger5 commented on Mar 3, 2024

Description

  • Handles worker connection edge cases in which the worker heartbeat goroutine could leak.
  • Adds a new feature that reassigns a step run to a different worker if its worker has not sent a heartbeat for 60 seconds, which almost always means the worker is dead.
  • Reassignment takes retries into account: it counts as a failed attempt, because from an execution perspective we treat it as a failure.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)


abelanger5 and others added 2 commits March 3, 2024 22:57
Co-authored-by: Gabe Ruttner <gabriel.ruttner@gmail.com>
abelanger5 merged commit 6a6038b into main on Mar 4, 2024
17 checks passed
abelanger5 deleted the belanger/worker-stability branch on March 4, 2024 at 03:59