-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: add Gate
as a partner to CancellationToken for safe shutdown of Tenant
& Timeline
#5711
Conversation
Gate
as a partner to CancellationToken for safe shutdown of Tenant
& Timeline
d8245f2
to
ce3e371
Compare
2358 tests run: 2243 passed, 0 failed, 115 skipped (full report)Flaky tests (2)Postgres 16
Postgres 14
Code coverage (full report)
The comment gets automatically updated with the latest test results
0eb1060 at 2023-11-06T12:17:44.227Z :recycle: |
ce3e371
to
13fd042
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike how this seems to duplicate what utils::completion aims to do (which tokio now has as TaskTracker), but I don't have more time to review right now.
I wouldn't say it duplicates it: neither one is a concurrency primitive (both are wrappers), and they have different goals:
One could implement Gate as a funky wrapper on Barrier, but the way I look at it they're both ergonomic wrappers around tokio concurrency types, so it makes sense to have two different types to implement two different usage styles. |
13fd042
to
c1ad27a
Compare
It introduces more state than we need: my idea was to add a completions to state (task local) from which we can start new work. If we are not in that state, we cannot obtain a guard thus we cannot start new work. If we transitioning from the state, then we must wait out all outstanding work. |
5243245
to
8855605
Compare
The page service request handlers are now cleanly cancelled+gated during Timeline::shutdown. The task_mgr::shutdown_tasks call is only cleaning up any straggler tasks that have exited the Timeline gate but not yet terminated execution.
8855605
to
5cf7b7c
Compare
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
854425f
to
28ffd2b
Compare
4b7bdcd
to
441fed8
Compare
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Problem
When shutting down a Tenant, it isn't just important to cause any background tasks to stop. It's also important to wait until they have stopped before declaring shutdown complete, in cases where we may re-use the tenant's local storage for something else, such as running in secondary mode, or creating a new tenant with the same ID.
Summary of changes
A
Gate
class is added, inspired by seastar::gate. For types that have an important lifetime that corresponds to some physical resource, use of a Gate as well as a CancellationToken provides a robust pattern for async requests & shutdown:close()
the gate to wait for requests in progress before returning.This is not for memory safety: it's for expressing the difference between "Arc exists", and "This tenant's files on disk are eligible to be read/written".
task_mgr::associate_with
: tasks no longer change their tenant/timelineidentity after being spawned.The Tenant's Gate is not yet used, but will be important for Tenant-scoped operations in secondary mode, where we must ensure that our secondary-mode downloads for a tenant are gated wrt the activity of an attached Tenant.
This is part of a broader move away from using the global-state driven
task_mgr
shutdown tokens: