Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new package cache lock modes #12706

Merged
merged 10 commits into from
Oct 9, 2023
Merged

Add new package cache lock modes #12706

merged 10 commits into from
Oct 9, 2023

Conversation

ehuss
Copy link
Contributor

@ehuss ehuss commented Sep 19, 2023

The way locking worked before this PR is that only one cargo could write to the package cache at once (otherwise it could cause corruption). However, it allowed cargo's to read from the package cache while running a build under the assumption that writers are append-only and won't affect reading. This allows multiple builds to run concurrently, only blocking on the part where it is not possible to run concurrently (downloading to the cache).

This introduces a new package cache locking strategy to support the ability to safely modify existing cache entries while other cargos are potentially reading from the cache. It has different locking modes:

  • MutateExclusive (new) — Held when cargo wants to modify existing cache entries (such as being introduced for garbage collection in Add cache garbage collection #12634), and ensures only one cargo has access to the cache during that time.
  • DownloadExclusive (renamed) — This is a more specialized name for the lock that was before this PR. A caller should acquire this when downloading into the cache and doing resolution. It ensures that only one cargo can append to the cache, but allows other cargos to concurrently read from the cache.
  • Shared (new) — This is to preserve the old concurrent build behavior by allowing multiple concurrent cargos to hold this while a build is running when it is reading from the cache

Reviewing suggestions:
There are a few commits needed to help with testing which are first. The main commit has the following:

  • src/cargo/util/cache_lock.rs is an abstraction around package cache locks, and is the heart of the change. It should have comments and notes which should guide what it is doing. The CacheLocker is stored in Config along with all our other global stuff.
  • Every call to config.acquire_package_cache_lock() has been changed to explicitly state which lock mode it wants to lock the package cache in.
  • Context::compile is the key point where the Shared lock is acquired, ensuring that no mutation is done while the cache is being read.
  • MutateExclusive is not used in this PR, but is being added in preparation for Add cache garbage collection #12634.
  • The non-blocking try_acquire_package_cache_lock API is not used in this PR, but is being added in preparation for Add cache garbage collection #12634 to allow automatic gc to skip running if another cargo is already running (to avoid unnecessary blocking).
  • src/cargo/util/flock.rs has been updated with some code cleanup (removing unused stuff), adds support for non-blocking locks, and renames some functions to make their operation clearer.
  • tests/testsuite/cache_lock.rs contains tests for all the different permutations of ways of acquiring locks.

@rustbot
Copy link
Collaborator

rustbot commented Sep 19, 2023

r? @weihanglo

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added A-configuration Area: cargo config files and env vars A-filesystem Area: issues with filesystems A-future-incompat Area: future incompatible reporting A-git Area: anything dealing with git A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. A-interacts-with-crates.io Area: interaction with registries A-registries Area: registries A-sparse-registry Area: http sparse registries A-testing-cargo-itself Area: cargo's tests Command-add Command-generate-lockfile Command-package Command-publish S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 19, 2023
@ehuss
Copy link
Contributor Author

ehuss commented Sep 19, 2023

This is split off from #12634.

@epage
Copy link
Contributor

epage commented Sep 20, 2023

Could you go into why these specific cache modes were chosen? What I'm wanting to understand is which areas of performance are we concerned about and how much this helps. I feel like the package cache locking in cargo was already confusing (being a un-Rust-like lock) and I'm concerned this will make things harder.

src/cargo/util/flock.rs Outdated Show resolved Hide resolved
src/cargo/util/flock.rs Outdated Show resolved Hide resolved
src/cargo/util/cache_lock.rs Outdated Show resolved Hide resolved
src/cargo/util/cache_lock.rs Outdated Show resolved Hide resolved
src/cargo/util/cache_lock.rs Outdated Show resolved Hide resolved
@ehuss
Copy link
Contributor Author

ehuss commented Sep 20, 2023

Could you go into why these specific cache modes were chosen? What I'm wanting to understand is which areas of performance are we concerned about and how much this helps.

The way locking worked before this PR is that only one cargo could write to the package cache at once (otherwise it could cause corruption). However, it allowed cargo's to read from the package cache while running a build under the assumption that writers won't modify anything that it is reading. This allows multiple builds to run concurrently, only blocking on the part where it is not possible to run concurrently (downloading to the cache).

This adds a "mutate" mode that ensures that no other cargo is accessing the cache in any way while it is running. This allows garbage collection to make any modifications it wants without worrying about stomping on something another cargo might be using.

The "share" mode is just a formalization of the way cargo used to run while it was doing a build. The reason it was added is to have something to block on the "mutate" lock, to ensure that no cargo is accessing the package cache while a mutate lock is held.

I'm not sure if that answers your question. The only performance concern is to ensure that multiple builds can run concurrently without blocking one another, and this preserves that behavior.

@epage
Copy link
Contributor

epage commented Sep 20, 2023

That makes sense. I also realized the intention last night. I've attempted to merge your description into the PR description; feel free to clear up anything I made confusing.

@bors
Copy link
Collaborator

bors commented Sep 29, 2023

☔ The latest upstream changes (presumably #12751) made this pull request unmergeable. Please resolve the merge conflicts.

This adds the `root` method to set the root directory to be able to
override paths::root().
This introduces a new `CacheLocker` which manages locks on the package
cache. Instead of either being "locked" or "not locked", the new locker
supports multiple modes:

- Shared lock: Cargo can read from the package sources, along with any
  other cargos reading at the same time.
- Download exclusive lock: Only one cargo can perform downloads.
  Download locks do not interfere with Shared locks, since it is
  expected that downloading does not modify existing files (only adds
  new ones).
- Mutate exclusive lock: Only one cargo can have this lock, and it also
  prevents shared locks. This is so that the cargo can modify the
  package cache (such as deleting files) without breaking concurrent
  processes.
This tries to align the `Filesystem` method names so they have the same
form, and also so they spell out exactly what they do. (An alternative
would be to have a builder, similar to `OpenOptions`, but that seems a
bit overkill here.)

This also extends the docs to try to make it clearer how this works.
@rustbot rustbot added A-layout Area: target output directory layout, naming, and organization A-lockfile Area: Cargo.lock issues labels Oct 9, 2023
@epage
Copy link
Contributor

epage commented Oct 9, 2023

@bors r+

@bors
Copy link
Collaborator

bors commented Oct 9, 2023

📌 Commit 78bb7c5 has been approved by epage

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 9, 2023
@epage
Copy link
Contributor

epage commented Oct 9, 2023

Thanks for your patience in working with my feedback!

@bors
Copy link
Collaborator

bors commented Oct 9, 2023

⌛ Testing commit 78bb7c5 with merge b48c41a...

@bors
Copy link
Collaborator

bors commented Oct 9, 2023

☀️ Test successful - checks-actions
Approved by: epage
Pushing b48c41a to master...

@bors bors merged commit b48c41a into rust-lang:master Oct 9, 2023
20 checks passed
@ehuss ehuss added this to the 1.75.0 milestone Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-configuration Area: cargo config files and env vars A-filesystem Area: issues with filesystems A-future-incompat Area: future incompatible reporting A-git Area: anything dealing with git A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. A-interacts-with-crates.io Area: interaction with registries A-layout Area: target output directory layout, naming, and organization A-lockfile Area: Cargo.lock issues A-registries Area: registries A-sparse-registry Area: http sparse registries A-testing-cargo-itself Area: cargo's tests Command-add Command-generate-lockfile Command-package Command-publish S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants