pageserver: flush deletion queue on detach #5452
This avoids the risk of "discarded LSN update" warnings in the logs when doing a fast detach/attach cycle.
2280 tests run: 2164 passed, 0 failed, 116 skipped (full report)
Code coverage (full report)
The comment gets automatically updated with the latest test results.
16977d2 at 2023-10-10T09:48:35.577Z
Needs a rebase.
Right now we would have requests get stuck and eventually time out if the queue gets stuck, which creates noisy errors.
I also wonder if we should somehow measure the queue length in a metric or warn if the queue length is larger than some limit. There is the risk of having a memory "leak" like the one fixed in #5472 ...
We do already have metrics that enable calculating queue length. This was a good reminder for me to add charts for those to the pageserver dashboard. I'll follow up to add an alert for queue length.
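For illustration, here is a minimal sketch of how a queue length could be derived from two monotonic counters and checked against a limit. It assumes the existing metrics amount to a "submitted" counter and an "executed" counter. The type `QueueCounters` and its methods are hypothetical names for this sketch, not the pageserver's actual metrics API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical pair of monotonic counters (assumption: not the real pageserver metrics).
pub struct QueueCounters {
    submitted: AtomicU64,
    executed: AtomicU64,
}

impl QueueCounters {
    pub fn new() -> Self {
        Self {
            submitted: AtomicU64::new(0),
            executed: AtomicU64::new(0),
        }
    }

    /// Record that `n` deletions were pushed into the queue.
    pub fn on_submit(&self, n: u64) {
        self.submitted.fetch_add(n, Ordering::Relaxed);
    }

    /// Record that `n` deletions were executed.
    pub fn on_execute(&self, n: u64) {
        self.executed.fetch_add(n, Ordering::Relaxed);
    }

    /// Approximate queue length: submitted minus executed.
    /// The two loads are not atomic together, so saturate instead of underflowing.
    pub fn queue_len(&self) -> u64 {
        let executed = self.executed.load(Ordering::Relaxed);
        let submitted = self.submitted.load(Ordering::Relaxed);
        submitted.saturating_sub(executed)
    }

    /// True if the queue is longer than `limit`; a caller could log a warning or alert on this.
    pub fn exceeds(&self, limit: u64) -> bool {
        self.queue_len() > limit
    }
}
```

In practice the same subtraction would more likely live in a dashboard or alert query over the exported counters than in the process itself.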
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
Problem
If a caller detaches a tenant and then attaches it again, pending deletions from the old attachment might not have happened yet. This is not a correctness problem, but it causes:
Summary of changes
- `flush_advisory` function, which is like `flush_execute`, but doesn't wait for completion: this is appropriate for use in contexts where we would like to encourage the deletion queue to flush, but don't need to block on it.
- … `Tenant` object.
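
To illustrate the difference between an advisory flush and a blocking one, here is a simplified sketch built on standard-library channels and a worker thread. It is not the pageserver implementation: `Msg`, `DeletionQueueClient`, `push` and `spawn_queue` are hypothetical names, and "deleting" is just clearing an in-memory list. Only the names `flush_advisory` and `flush_execute` come from this PR, and their signatures here are assumptions.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Messages understood by the hypothetical queue worker.
enum Msg {
    /// Enqueue a deletion (here just an object key).
    Delete(String),
    /// Flush pending deletions; if `ack` is set, report the flushed count back.
    Flush { ack: Option<Sender<usize>> },
}

/// Hypothetical client handle for the queue.
pub struct DeletionQueueClient {
    tx: Sender<Msg>,
}

impl DeletionQueueClient {
    pub fn push(&self, key: &str) {
        let _ = self.tx.send(Msg::Delete(key.to_string()));
    }

    /// Advisory flush: ask the worker to flush and return immediately.
    pub fn flush_advisory(&self) {
        let _ = self.tx.send(Msg::Flush { ack: None });
    }

    /// Blocking flush: wait until the worker has processed the flush.
    /// Returns the number of deletions it executed, or None if the worker is gone.
    pub fn flush_execute(&self) -> Option<usize> {
        let (ack_tx, ack_rx) = channel();
        self.tx.send(Msg::Flush { ack: Some(ack_tx) }).ok()?;
        ack_rx.recv().ok()
    }
}

/// Spawn the worker; the join handle yields the total number of executed deletions.
pub fn spawn_queue() -> (DeletionQueueClient, thread::JoinHandle<usize>) {
    let (tx, rx): (Sender<Msg>, Receiver<Msg>) = channel();
    let handle = thread::spawn(move || {
        let mut pending: Vec<String> = Vec::new();
        let mut executed = 0usize;
        // The loop ends once every sender has been dropped.
        for msg in rx {
            match msg {
                Msg::Delete(key) => pending.push(key),
                Msg::Flush { ack } => {
                    let n = pending.len();
                    pending.clear(); // stand-in for the real remote deletions
                    executed += n;
                    if let Some(ack) = ack {
                        let _ = ack.send(n);
                    }
                }
            }
        }
        executed
    });
    (DeletionQueueClient { tx }, handle)
}
```

In this sketch a detach path would call `flush_advisory()` and move on, while a caller that needs the deletions to have happened would call `flush_execute()` and wait for the acknowledgement.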