Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[internal, rust] memmap is unmaintained #14341

Closed
asherf opened this issue Feb 2, 2022 · 0 comments · Fixed by #19711
Closed

[internal, rust] memmap is unmaintained #14341

asherf opened this issue Feb 2, 2022 · 0 comments · Fixed by #19711
Assignees

Comments

@asherf
Copy link
Member

asherf commented Feb 2, 2022

https://rustsec.org/advisories/RUSTSEC-2020-0077
there are a few alternatives mentioned on the advisory page.
https://github.com/pantsbuild/pants/runs/5033682486?check_suite_focus=true#step:5:228

@stuhood

@stuhood stuhood self-assigned this Mar 17, 2022
huonw added a commit that referenced this issue Sep 12, 2023
…9711)

This (hopefully) optimises storing large blobs to a remote cache, by
streaming them directly from the file stored on disk in the "FSDB".

This builds on the FSDB local store work (#18153), relying on large
objects being stored as an immutable file on disk, in the cache managed
by Pants.

This is an optimisation in several ways:

- Cutting out an extra temporary file:
- Previously `Store::store_large_blob_remote` would load the whole blob
from the local store and then write that to a temporary file. This was
appropriate with LMBD-backed blobs.
- With new FSDB, there's already a file that can be used, no need for
that temporary, and so the file creation and writing overhead can be
eliminated .
- Reducing sync IO in async tasks, due to mmap:
- Previously `ByteStore::store_buffered` would take that temporary file
and mmap it, to be able to slice into `Bytes` more efficiently... except
this is secretly blocking/sync IO, happening within async tasks (AIUI:
when accessing a mmap'd byte that's only on disk, not yet in memory, the
whole OS thread is blocked/descheduled while the OS pulls the relevant
part of the file into memory, i.e. `tokio` can't run another task on
that thread).
- This new approach uses normal `tokio` async IO mechanisms to read the
file, and thus hopefully this has higher concurrency.
  - (This also eliminates the unmaintained `memmap` dependency.)

I haven't benchmarked this though.

My main motivation for this is firming up the provider API before adding
new byte store providers, for #11149. This also resolves some TODOs and
even eliminates some `unsafe`, yay!

The commits are individually reviewable.

Fixes #19049, fixes #14341 (`memmap` removed), closes #17234 (solves the
same problem but with an approach that wasn't possible at the time).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants