tokio-epoll-uring: "Cannot allocate memory" in staging on 2024-02-07 #6667
Closed
10 of 13 tasks
Tracked by
#6665
Comments
problame added a commit that referenced this issue Feb 7, 2024
problame added a commit to neondatabase/tokio-epoll-uring that referenced this issue Feb 7, 2024
…de a spawn_blocking Context: neondatabase/neon#6667
problame added a commit to neondatabase/tokio-epoll-uring that referenced this issue Feb 7, 2024
…de a spawn_blocking (#44) Context: neondatabase/neon#6667
problame added a commit that referenced this issue Feb 14, 2024
… callers (#6731)

Some callers of `VirtualFile::crashsafe_overwrite` call it on the executor thread, thereby potentially stalling it. Others are more diligent and wrap it in `spawn_blocking(..., Handle::block_on, ... )` to avoid stalling the executor thread. However, because `crashsafe_overwrite` uses `VirtualFile::open_with_options` internally, we spawn a new thread-local `tokio-epoll-uring::System` in the blocking pool thread that's used for the `spawn_blocking` call.

This PR refactors the situation such that we do the `spawn_blocking` inside `VirtualFile::crashsafe_overwrite`. This unifies the situation for the better:

1. Callers who didn't wrap in `spawn_blocking(..., Handle::block_on, ...)` before no longer stall the executor.
2. Callers who did it before now can avoid the `block_on`, resolving the problem with the short-lived `tokio-epoll-uring::System`s in the blocking pool threads.

A future PR will build on top of this and divert to tokio-epoll-uring if it's configured as the IO engine.

Changes
-------

- Convert implementation to std::fs and move it into `crashsafe.rs`
  - Yes, I know, Safekeepers (cc @arssher ) added `durable_rename` and `fsync_async_opt` recently. However, `crashsafe_overwrite` is different in the sense that it's higher level, i.e., it's more like `std::fs::write` and the Safekeeper team's code is more building block style.
  - The consequence is that we don't use the VirtualFile file descriptor cache anymore.
    - I don't think it's a big deal because we have plenty of slack wrt the production file descriptor rlimit (see [this dashboard](https://neonprod.grafana.net/d/e4a40325-9acf-4aa0-8fd9-f6322b3f30bd/pageserver-open-file-descriptors?orgId=1))
- Use `tokio::task::spawn_blocking` in `VirtualFile::crashsafe_overwrite` to call the new `crashsafe::overwrite` API.
  - Inspect all callers to remove any double-`spawn_blocking`
  - `spawn_blocking` requires the captured data to be `'static + Send`. So, refactor the callers. We'll need this for future tokio-epoll-uring support anyway, because tokio-epoll-uring requires owned buffers.

Related Issues
--------------

- overall epic to enable write path to tokio-epoll-uring: #6663
- this is also kind of relevant to the tokio-epoll-uring System creation failures that we encountered in staging, investigation being tracked in #6667
  - why is it relevant? Because this PR removes two uses of `spawn_blocking+Handle::block_on`
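
To make the shape of the refactor in the commit message above more concrete, here is a minimal std-only sketch of a crash-safe overwrite that takes owned arguments and runs off the calling thread. It is not the code from the PR: the names `overwrite` and `overwrite_off_thread` and the exact write/fsync/rename sequence are assumptions for illustration, and `std::thread::spawn` stands in for `tokio::task::spawn_blocking` (both require the closure to be `'static + Send`, which is why the wrapper takes `PathBuf` and `Vec<u8>` rather than borrows). Syncing a directory handle assumes a Unix-like system.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: replace `final_path` with `content` so that a crash
/// leaves either the old or the new file on disk, never a partial write.
fn overwrite(final_path: &Path, tmp_path: &Path, content: &[u8]) -> io::Result<()> {
    // Directory that holds the final file; a bare file name means ".".
    let parent = match final_path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        Some(_) => Path::new("."),
        None => {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "final path has no parent directory",
            ))
        }
    };

    // Remove a temp file left behind by an earlier, interrupted attempt.
    match fs::remove_file(tmp_path) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }

    // Write the new content to the temp file and flush it to disk.
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(tmp_path)?;
    file.write_all(content)?;
    file.sync_all()?;
    drop(file);

    // Atomically swap the temp file into place, then fsync the directory so
    // the rename itself survives a crash.
    fs::rename(tmp_path, final_path)?;
    File::open(parent)?.sync_all()?;
    Ok(())
}

/// Hypothetical wrapper: owned arguments make the closure `'static + Send`.
/// `std::thread::spawn` is a stand-in for `tokio::task::spawn_blocking` here.
fn overwrite_off_thread(
    final_path: PathBuf,
    tmp_path: PathBuf,
    content: Vec<u8>,
) -> io::Result<()> {
    std::thread::spawn(move || overwrite(&final_path, &tmp_path, &content))
        .join()
        .expect("worker thread panicked")
}
```

Because the blocking work is plain `std::fs`, nothing in the worker thread needs an async runtime handle, so in this sketch there is no `Handle::block_on` inside the blocking call.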
Sentry-captured stack trace
tl;dr: a `Handle::block_on` call inside `spawn_blocking`
Kernel version is `5.10.0-18-cloud-amd64`
=> As per our findings in #6373 (comment) , this means the process ran out of memlock rusage quota.
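
Since the diagnosis above points at the memlock limit, one way to see which limit a process is running under on Linux is to read `/proc/self/limits`. The sketch below is only a diagnostic aid under that assumption. The function names are hypothetical, and it reports the configured soft limit, not how much of it is currently in use.

```rust
use std::fs;

/// Hypothetical helper: extract the soft "Max locked memory" limit (in bytes)
/// from the text of /proc/<pid>/limits. "unlimited" is mapped to u64::MAX.
fn memlock_soft_limit(limits: &str) -> Option<u64> {
    const LABEL: &str = "Max locked memory";
    let line = limits.lines().find(|l| l.starts_with(LABEL))?;
    // Columns after the label are: soft limit, hard limit, units.
    let soft = line[LABEL.len()..].split_whitespace().next()?;
    if soft == "unlimited" {
        Some(u64::MAX)
    } else {
        soft.parse().ok()
    }
}

/// Hypothetical helper: read the limit for the current process (Linux only).
fn current_memlock_soft_limit() -> Option<u64> {
    let text = fs::read_to_string("/proc/self/limits").ok()?;
    memlock_soft_limit(&text)
}
```

Logging the value returned by `current_memlock_soft_limit()` at startup would make it easier to correlate "Cannot allocate memory" errors with the configured limit.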
Action Items
Impl
/metrics
#6669
/metrics
#6672