
Convert many VirtualFile APIs to async #5190

Merged · 11 commits · Sep 4, 2023
Conversation

@arpad-m (Member) commented Sep 2, 2023

Problem

VirtualFile does both reading and writing, and it would be nice if both could be converted to async, so that it doesn't have to support an async read path alongside a blocking write path. The locks are especially annoying here: none of the lock implementations in std, tokio, or parking_lot supports both async and blocking access.

Summary of changes

This PR is some initial work on making the VirtualFile APIs async. It can be reviewed commit-by-commit.

  • Introduce the MaybeVirtualFile enum so that a test comparing real files with virtual files can be generic over both.
  • Make various APIs of VirtualFile async, including write_all_at, read_at, read_exact_at.

Part of #4743 , successor of #5180.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@arpad-m arpad-m requested review from a team as code owners September 2, 2023 09:00
@arpad-m arpad-m requested review from awestover and shanyp and removed request for a team September 2, 2023 09:00
@arpad-m arpad-m force-pushed the arpad/virtual_file_async_2 branch from 9fe2a59 to d224ca5 Compare September 2, 2023 09:01
github-actions bot commented Sep 2, 2023

1624 tests run: 1550 passed, 0 failed, 74 skipped (full report)


Flaky tests (1)

Postgres 14

  • test_get_tenant_size_with_multiple_branches: debug
The comment gets automatically updated with the latest test results
021c0bc at 2023-09-04T14:47:59.946Z :recycle:

@arpad-m (Member, Author) commented Sep 2, 2023

This failure is legitimate (I can reproduce it locally):

thread 'virtual_file::tests::test_virtual_files' panicked at 'assertion failed: file_a.read_string().await.is_err()', pageserver/src/virtual_file.rs:680:9
stack backtrace:

I don't know where the problem comes from: the OpenOptions are being passed through, so the file should theoretically be opened write-only. I'll need to investigate this further starting Monday.

@jcsp (Collaborator) left a comment

Totally reasonable, it was a bug that any of these were sync to begin with :-)

@arpad-m (Member, Author) commented Sep 4, 2023

Looking at strace output with some debug printing added, we get this for the working native file:

[pid 155241] openat(AT_FDCWD, "../tmp_check/test_physical_files/file_a", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 9
[pid 155241] write(9, "foobar", 6)      = 6
[pid 155241] fcntl(9, F_GETFL)          = 0x8001 (flags O_WRONLY|O_LARGEFILE)
[pid 155241] write(1, "Native File flags are: 0x8001 of"..., 46Native File flags are: 0x8001 oflag: O_WRONLY
) = 46
[pid 155241] statx(9, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=6, ...}) = 0
[pid 155241] lseek(9, 0, SEEK_CUR)      = 6
[pid 155241] read(9, 0x7ff69800c820, 32) = -1 EBADF (Bad file descriptor)
[pid 155241] openat(AT_FDCWD, "../tmp_check/test_physical_files/file_a", O_RDONLY|O_CLOEXEC) = 10
[pid 155241] write(10, "bar", 3)        = -1 EBADF (Bad file descriptor)

And for the non-working VirtualFile:
[pid 155242] openat(AT_FDCWD, "../tmp_check/test_virtual_files/file_a", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 9
[pid 155242] fcntl(9, F_GETFL)          = 0x8001 (flags O_WRONLY|O_LARGEFILE)
[pid 155242] write(1, "Open with options: 0x8001 oflag:"..., 42Open with options: 0x8001 oflag: O_WRONLY
) = 42
[pid 155242] pwrite64(9, "foobar", 6, 0) = 6
[pid 155242] statx(9, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=6, ...}) = 0
[pid 155242] fcntl(9, F_GETFL)          = 0x8001 (flags O_WRONLY|O_LARGEFILE)
[pid 155242] write(1, "VirtualFile File flags are (len "..., 59VirtualFile File flags are (len 0): 0x8001 oflag: O_WRONLY
) = 59
[pid 155242] write(2, "thread '", 8thread ')    = 8
[pid 155242] write(2, "virtual_file::tests::test_virtua"..., 39virtual_file::tests::test_virtual_files) = 39
[pid 155242] write(2, "' panicked at '", 15' panicked at ') = 15
[pid 155242] write(2, "called `Result::unwrap_err()` on"..., 50called `Result::unwrap_err()` on an `Ok` value: "") = 50
[pid 155242] write(2, "', ", 3', )         = 3
[pid 155242] write(2, "pageserver/src/virtual_file.rs", 30pageserver/src/virtual_file.rs) = 30

I think the problem lies in the usage of read_at. I will try again using a normal read; I think that fixes it.

This fixes the test failures
@arpad-m (Member, Author) commented Sep 4, 2023

It turns out that the test failures were due to missing tracking of where we are in the file. I have now introduced read_to_end (only used by tests) with offset tracking via self.pos, and it works. Merging once CI is green.

@arpad-m arpad-m merged commit 128a85b into main Sep 4, 2023
@arpad-m arpad-m deleted the arpad/virtual_file_async_2 branch September 4, 2023 15:05
@arpad-m (Member, Author) commented Sep 5, 2023

To expand on why I did the last commit with read_to_end, bringing the discussion out of DMs:

  • The test failure one ran into at the start was that read_string on a write-only file would succeed, because the cursor was at the end of the file and there was thus zero data to read. The code would issue a read call with zero size, and an optimization somewhere lets such a read return an empty buffer without actually issuing a read system call (you can see above that no read system call happens). I first "worked around" this by seeking to the start of the file before doing read_string, but that just changes the test to accommodate the behaviour change, so it's not a good idea: the regression would still be there. Instead, we now do what the file API does: read a non-zero piece of data. That can still return 0 if nothing is available, but more importantly it issues a system call and thus yields the expected error.
  • VirtualFile has a pos field. The code before the last commit recalculated pos via performing a seek operation, which is sub-optimal. Ideally we'd use the already existing position tracking.
  • More importantly, read_string on native files advances the position inside the file, while our read_string implementation didn't. This was because we used read_string_at, which doesn't update pos. Therefore the test would fail where it first expected the read to return "foobar" and then to return "" (we returned "foobar" both times). The read_to_end implementation now correctly updates pos (with fine granularity even, so in error conditions the cursor behaves similarly as well).

arpad-m added a commit that referenced this pull request Sep 6, 2023
… async (#5203)

## Problem

We want to convert the `VirtualFile` APIs to async fn so that we can
adopt one of the async I/O solutions.

## Summary of changes

This PR is a follow-up of #5189, #5190, and #5195, and does the
following:

* Move the used `Write` trait functions of `VirtualFile` into inherent
functions
* Add optional buffering to `WriteBlobWriter`. The buffer is discarded
on drop, similarly to how tokio's `BufWriter` does it: drop is neither
async nor does it support errors.
* Remove the `Write`-generic impl of `WriteBlobWriter`, always using
`VirtualFile`
* Rename `WriteBlobWriter` to `BlobWriter`
* Make various functions in the write path async, like
`VirtualFile::{write,write_all}`.

Part of #4743.
arpad-m added a commit that referenced this pull request Sep 7, 2023
…} async fn (#5224)

## Problem

Once we use async file system APIs for `VirtualFile`, these functions
will also need to be async fn.

## Summary of changes

Makes the functions `open`, `open_with_options`, `create`, `sync_all`, and `with_file`
of `VirtualFile` async fn, including all functions that call it. Like in
the prior PRs, the actual I/O operations are not using async APIs yet,
as per request in the #4743 epic.

We switch away from using `VirtualFile` in the par_fsync module;
hopefully this is only temporary, until we can actually do fully async
I/O in `VirtualFile`. This might cause us to exhaust fd limits in the
tests, but it should only be an issue for local developers, as we have
high ulimits in prod.

This PR is a follow-up of #5189, #5190, #5195, and #5203. Part of #4743.
arpad-m added a commit that referenced this pull request Sep 11, 2023
## Problem

For #4743, we want to convert everything up to the actual I/O operations
of `VirtualFile` to `async fn`.

## Summary of changes

This PR is the last change in a series of changes to `VirtualFile`:
#5189, #5190, #5195, #5203, and #5224.

It does the last preparations before the I/O operations are actually
made async. We are doing the following things:

* First, we change the locks for the file descriptor cache to tokio's
locks that support Send. This is important when one wants to hold locks
across await points (which we want to do), otherwise the Future won't be
Send. Also, one shouldn't generally block in async code as executors
don't like that.
* Due to the lock change, we now take an approach for the `VirtualFile`
destructors similar to the one proposed by #5122 for the page cache, to
use `try_write`. Similarly to the situation in the linked PR, one can
make an argument that if we are in the destructor and the slot has not
been reused yet, we are the only user accessing the slot due to owning
the lock mutably. It is still possible that we are not obtaining the
lock, but the only cause for that is the clock algorithm touching the
slot, which should be quite an unlikely occurrence. For the instance of
`try_write` failing, we spawn an async task to destroy the lock. As just
argued however, most of the time the code path where we spawn the task
should not be visited.
* Lastly, we split `with_file` into a macro part, and a function part
that contains most of the logic. The function part returns a lock
object, that the macro uses. The macro exists to perform the operation
in a more compact fashion, saving code from putting the lock into a
variable and then doing the operation while measuring the time to run
it. We take the locks approach because Rust has no support for async
closures. One can make normal closures return a future, but that
approach gets into lifetime issues the moment you want to pass data to
these closures via parameters that have a lifetime (captures work). For
details, see
[this](https://smallcultfollowing.com/babysteps/blog/2023/03/29/thoughts-on-async-closures/)
and
[this](https://users.rust-lang.org/t/function-that-takes-an-async-closure/61663)
link. In #5224, we ran into a similar problem with the `test_files`
function, and we ended up passing the path and the `OpenOptions`
by-value instead of by-ref, at the expense of a few extra copies. This
can be done as the data is cheaply copyable, and we are in test code.
But here, we are not, and while `File::try_clone` exists, it [issues
system calls
internally](https://github.com/rust-lang/rust/blob/1e746d7741d44551e9378daf13b8797322aa0b74/library/std/src/os/fd/owned.rs#L94-L111).
Also, it would allocate an entirely new file descriptor, something that
the fd cache was built to prevent.
* We change the `STORAGE_IO_TIME` metrics to support async.

Part of #4743.