Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard against concurrent cache writes on Windows #11007

Merged
merged 3 commits into from
Jan 28, 2025
Merged

Conversation

charliermarsh
Copy link
Member

@charliermarsh charliermarsh commented Jan 27, 2025

Summary

On Windows, we have a lot of issues with atomic replacement and such. There are a bunch of different failure modes, but they generally involve: trying to persist a fail to a path at which the file already exists, trying to replace or remove a file while someone else is reading it, etc.

This PR adds locks to all of the relevant database paths. We already use these advisory locks when building source distributions; now we use them when unzipping wheels, storing metadata, etc.

Closes #11002.

Test Plan

I ran the following script:

# Define the cache directory path
$cacheDir = "C:\Users\crmar\workspace\uv\cache"

# Clear the cache directory if it exists
if (Test-Path $cacheDir) {
    Remove-Item -Recurse -Force $cacheDir
}

# Create the cache directory again
New-Item -ItemType Directory -Force -Path $cacheDir

# Define the command to run with --cache-dir flag
$command = {
    param ($venvPath)

    # Create a virtual environment in the specified path with --python
    uv venv $venvPath

    # Run the pip install command with --cache-dir flag
    C:\Users\crmar\workspace\uv\target\profiling\uv.exe pip install flask==1.0.4 --no-binary flask --cache-dir C:\Users\crmar\workspace\uv\cache -v --python $venvPath
}

# Define the paths for the different virtual environments
$venv1 = "C:\Users\crmar\workspace\uv\venv1"
$venv2 = "C:\Users\crmar\workspace\uv\venv2"
$venv3 = "C:\Users\crmar\workspace\uv\venv3"
$venv4 = "C:\Users\crmar\workspace\uv\venv4"
$venv5 = "C:\Users\crmar\workspace\uv\venv5"

# Start the command in parallel five times using Start-Job, each with a different venv
$job1 = Start-Job -ScriptBlock $command -ArgumentList $venv1
$job2 = Start-Job -ScriptBlock $command -ArgumentList $venv2
$job3 = Start-Job -ScriptBlock $command -ArgumentList $venv3
$job4 = Start-Job -ScriptBlock $command -ArgumentList $venv4
$job5 = Start-Job -ScriptBlock $command -ArgumentList $venv5

# Wait for all jobs to complete
$jobs = @($job1, $job2, $job3, $job4, $job5)
$jobs | ForEach-Object { Wait-Job $_ }

# Retrieve the results (optional)
$jobs | ForEach-Object { Receive-Job -Job $_ }

# Clean up the jobs
$jobs | ForEach-Object { Remove-Job -Job $_ }

And ensured it succeeded in five straight invocations (whereas on main, it consistently fails with a variety of different traces).

@charliermarsh charliermarsh added bug Something isn't working windows Specific to the Windows platform labels Jan 27, 2025
@charliermarsh
Copy link
Member Author

I'll fix up the tests on approval -- they're just count offsets due to writing more files to the cache (sadly).

@konstin
Copy link
Member

konstin commented Jan 28, 2025

Can you add the script you used to provoke the error and check that it's test? Otherwise we don't have a regression test should we ever have to revisit this code.

@@ -336,6 +336,13 @@ impl RegistryClient {
Connectivity::Offline => CacheControl::AllowStale,
};

// Acquire an advisory lock, to guard against concurrent writes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you extend the comment explaining the [cfg(windows)]?

Comment on lines +48 to +52
/// On Windows, this uses the `junction` crate to create a junction point. The
/// operation is _not_ atomic, as we first delete the junction, then create a
/// junction at the same path.
///
/// Note that because junctions are used, the source must be a directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do i read this comment correctly that there is no way for us to make this atomic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try, but making it atomic doesn't solve the problem. We could still run into issues whereby we attempt to overwrite the file while it's open, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to know, i didn't think of that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's somewhat tragic. I think we should still try to make this atomic though.

@charliermarsh
Copy link
Member Author

charliermarsh commented Jan 28, 2025

Can you add the script you used to provoke the error and check that it's test? Otherwise we don't have a regression test should we ever have to revisit this code.

Sorry, do you mean, add a test to the codebase itself? The script I used is in the PR summary.

@konstin
Copy link
Member

konstin commented Jan 28, 2025

Sorry, totally overlooked that. Is it possible to port this to an integration test, or alternatively could we add that script to scripts/?

@charliermarsh charliermarsh merged commit f1840c7 into main Jan 28, 2025
73 checks passed
@charliermarsh charliermarsh deleted the charlie/windows branch January 28, 2025 20:33
@charliermarsh
Copy link
Member Author

@konstin -- I'm just gonna leave it in here for now. I think this is ~as good as putting it in scripts given that we'll never run it and it will likely bitrot there.

tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Jan 29, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.5.24` -> `0.5.25` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.5.25`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0525)

[Compare Source](astral-sh/uv@0.5.24...0.5.25)

##### Enhancements

-   Allow installation of manylinux wheels on loongarch64 ([#&#8203;10927](astral-sh/uv#10927))
-   Allow optional `=` for editables in `requirements.txt` ([#&#8203;10954](astral-sh/uv#10954))
-   Add Windows aarch64 to the release binaries ([#&#8203;10885](astral-sh/uv#10885))

##### Bug fixes

-   Use spec-compliant (`128+n`) exit codes for `uv run` and `uv tool run` on Unix ([#&#8203;10781](astral-sh/uv#10781))
-   Fix best-interpreter lookups when there is an invalid interpreter in the `PATH` ([#&#8203;11030](astral-sh/uv#11030))
-   Guard against concurrent cache writes on Windows ([#&#8203;11007](astral-sh/uv#11007))
-   Prioritize package preferences with greater package versions ([#&#8203;10963](astral-sh/uv#10963))
-   Reject `--editable` flag on non-directory requirements ([#&#8203;10994](astral-sh/uv#10994))
-   Respect `--no-sources` for `uv pip install` workspace discovery ([#&#8203;11003](astral-sh/uv#11003))
-   Set `JEMALLOC_SYS_WITH_LG_PAGE=16` in ARM Docker builds ([#&#8203;10943](astral-sh/uv#10943))
-   Update `riscv64` Python downloads to allow install on `riscv64gc` ([#&#8203;10937](astral-sh/uv#10937))
-   Fix file persist retries on Windows ([#&#8203;11008](astral-sh/uv#11008))
-   Fix incorrect error message when specifying `tool.uv.sources.(package).workspace` with other options ([#&#8203;11013](astral-sh/uv#11013))
-   Improve SIGINT handling in `uv run` ([#&#8203;11009](astral-sh/uv#11009))

##### Documentation

-   Add `SECURITY` policy ([#&#8203;11035](astral-sh/uv#11035))
-   Add `Requires-Python` upper bound behavior to the docs ([#&#8203;10964](astral-sh/uv#10964))
-   Add a troubleshooting section and reproducible example guide ([#&#8203;10947](astral-sh/uv#10947))
-   Add documentation for `uv add -r` ([#&#8203;10926](astral-sh/uv#10926))
-   Amend `requires-python` rules in resolver documentation ([#&#8203;10993](astral-sh/uv#10993))
-   Reference workspaces in `--no-sources` documentation ([#&#8203;10995](astral-sh/uv#10995))
-   Update documentation for activating virtual environments in different shell ([#&#8203;11000](astral-sh/uv#11000))
-   Add Docker SHA pinning tip ([#&#8203;10955](astral-sh/uv#10955))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4xMzcuMiIsInVwZGF0ZWRJblZlciI6IjM5LjEzNy4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
styvane added a commit to styvane/uv that referenced this pull request Jan 29, 2025
* main: (53 commits)
  Shorten "Using existing Python versions" nav item so it fits on one line (astral-sh#11077)
  docs: suggest copy linking for GitLab integration guide (astral-sh#11067)
  Refactor `uv tool run` hint into separate function (astral-sh#11069)
  Fix typo in no-deps docs/comments/cli description (astral-sh#11073)
  Allow `--no-dev --invert` in `uv tree` (astral-sh#11068)
  Add docs for signal handling (astral-sh#11041)
  Add a bit more context about SIGTERM and PID 1 (astral-sh#11036)
  Reflow CLI documentation comments (astral-sh#11040)
  doc typo: unnecessary backslashes to represent brackets in markdown (astral-sh#11059)
  Update Dependabot links (astral-sh#11054)
  Document `gather_credentials` (astral-sh#11024)
  Link to our MRE documentation in the issue template (astral-sh#11045)
  Avoid sharing state between universal and non-universal resolves (astral-sh#11051)
  Mark metadata as dynamic when reading from built wheel cache (astral-sh#11046)
  Fix formatting of `RUST_LOG` documentation (astral-sh#10053)
  Bump version to 0.5.25 (astral-sh#11042)
  Add CVE disclosure to security policy (astral-sh#11037)
  Guard against concurrent cache writes on Windows (astral-sh#11007)
  Add SECURITY policy (astral-sh#11035)
  Improve SIGINT handling in `uv run` (astral-sh#11009)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working windows Specific to the Windows platform
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concurrent cache access causes errors on Windows
3 participants