Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix metadata cache instability #6332

Merged
merged 1 commit into from
Aug 21, 2024
Merged

Conversation

konstin
Copy link
Member

@konstin konstin commented Aug 21, 2024

For a path dep such as the root project, uv can read metadata statically from pyproject.toml or dynamically from the build backend.

Python's packaging sorts specifiers before emitting them, so all build backends built on top of it - such as hatchling - will change the specifier order compared to pyproject.toml. The core metadata spec does say "If a field is not marked as Dynamic, then the value of the field in any wheel built from the sdist MUST match the value in the sdist", but it doesn't specify if "match" means string equivalent or semantically equivalent, so it's arguable if that spec conformant. This change means that the specifiers have a different ordering when coming from the build backend than when read statically from pyproject.toml.

Previously, we tried to read path dep metadata in order:

  • From the (built wheel) cache (packaging order)
  • From pyproject.toml (verbatim specifier)
  • From a fresh build (packaging order)

This behaviour is unstable: On the first run, we cache is cold, so we read the verbatim specifier from pyproject.toml, then we build and store the metadata in the cache. On the second run, we read the packaging sorted specifier from the cache.

Reproducer:

rm -rf newproj
uv init -q --no-config newproj
cd newproj/
uv add -q "anyio>=4,<5"
cat uv.lock | grep "requires-dist"
uv sync -q
cat uv.lock | grep "requires-dist"
cd ..
requires-dist = [{ name = "anyio", specifier = ">=4,<5" }]
requires-dist = [{ name = "anyio", specifier = "<5,>=4" }]

A project either has static metadata, so we can read from pyproject.toml, or it doesn't, and we always read from the build through packaging. We can use this to stabilize the behavior by slightly switching the order.

  • From pyproject.toml (verbatim specifier)
  • From the (built wheel) cache (packaging order)
  • From a fresh build (packaging order)

Potentially, we still want to sort the specifiers we get anyway, after all, the is no guarantee that the specifiers from a build backend are deterministic. But our metadata reading behavior should be independent of the cache state, hence changing the order in the PR.

Fixes #6316

For a path dep such as the root project, uv can read metadata statically from `pyproject.toml` or dynamically from the build backend.

Python's `packaging` [sorts](https://github.com/pypa/packaging/blob/cc938f984bbbe43c5734b9656c9837ab3a28191f/src/packaging/specifiers.py#L777) specifiers before emitting them, so all build backends built on top of it - such as hatchling - will change the specifier order compared to pyproject.toml. The core metadata spec does say "If a field is not marked as Dynamic, then the value of the field in any wheel built from the sdist MUST match the value in the sdist", but it doesn't specify if "match" means string equivalent or semantically equivalent, so it's arguable if that spec conformant. This change means that the specifiers have a different ordering when coming from the build backend than when read statically from pyproject.toml.

Previously, we tried to read path dep metadata in order:
* From the (built wheel) cache (`packaging` order)
* From pyproject.toml (verbatim specifier)
* From a fresh build (`packaging` order)

This behaviour is unstable: On the first run, we cache is cold, so we read the verbatim specifier from `pyproject.toml`, then we build and store the metadata in the cache. On the second run, we read the `packaging` sorted specifier from the cache.

Reproducer:

```shell
rm -rf newproj
uv init -q --no-config newproj
cd newproj/
uv add -q "anyio>=4,<5"
cat uv.lock | grep "requires-dist"
uv sync -q
cat uv.lock | grep "requires-dist"
cd ..
```

```
requires-dist = [{ name = "anyio", specifier = ">=4,<5" }]
requires-dist = [{ name = "anyio", specifier = "<5,>=4" }]
```

A project either has static metadata, so we can read from pyproject.toml, or it doesn't, and we always read from the build through `packaging`. We can use this to stabilize the behavior by slightly switching the order.

* From pyproject.toml (verbatim specifier)
* From the (built wheel) cache (`packaging` order)
* From a fresh build (`packaging` order)

Potentially, we still want to sort the specifiers we get anyway, after all, the is no guarantee that the specifiers from a build backend are deterministic. But our metadata reading behavior should be independent of the cache state, hence changing the order in the PR.

Fixes #6316
@konstin konstin added the bug Something isn't working label Aug 21, 2024
konstin added a commit that referenced this pull request Aug 21, 2024
We currently normalize package and extra names and drop the whitespace from version specifiers, but we were not normalizing the order of the specifiers. By sorting them we match the behavior of `packaging` and become independent of build backends reordering specifiers (#6332).

Surprisingly, the snapshot diff isn't large - most people were already writing sorted specifiers. Still, this will lead to observable differences in lockfiles between releases.
konstin added a commit that referenced this pull request Aug 21, 2024
We currently normalize package and extra names and drop the whitespace from version specifiers, but we were not normalizing the order of the specifiers. By sorting them we match the behavior of `packaging` and become independent of build backends reordering specifiers (#6332).

Surprisingly, the snapshot diff isn't large - most people were already writing sorted specifiers. Still, this will lead to observable differences in lockfiles between releases.
konstin added a commit that referenced this pull request Aug 21, 2024
We currently normalize package and extra names and drop the whitespace from version specifiers, but we were not normalizing the order of the specifiers. By sorting them we match the behavior of `packaging` and become independent of build backends reordering specifiers (#6332).

Surprisingly, the snapshot diff isn't large - most people were already writing sorted specifiers. Still, this will lead to observable differences in lockfiles between releases.
@charliermarsh
Copy link
Member

Is this still strictly necessary with #6333? Or more that it's good to have consistent ordering independent of cache state?

@konstin
Copy link
Member Author

konstin commented Aug 21, 2024

Both this PR and #6333 would fix #6316, but i think this behavior is more consistent, so i'd merge this independent of #6316/#6333.

@konstin konstin merged commit cabca7b into main Aug 21, 2024
57 checks passed
@konstin konstin deleted the konsti/fix-specifier-instability branch August 21, 2024 15:18
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Aug 22, 2024
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.3.0` -> `0.3.1` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.3.1`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#031)

[Compare Source](astral-sh/uv@0.3.0...0.3.1)

##### Enhancements

-   Add `--with-editable` support to `uv run` ([#&#8203;6262](astral-sh/uv#6262))
-   Respect `.python-version` files and `pyproject.toml` in `uv python find` ([#&#8203;6369](astral-sh/uv#6369))
-   Allow manylinux compatibility override via `_manylinux` module ([#&#8203;6039](astral-sh/uv#6039))

##### CLI

-   Avoid treating `uv add -r` as `--raw-sources` ([#&#8203;6287](astral-sh/uv#6287))

##### Bug fixes

-   Always invoke found interpreter when `uv run python` is used ([#&#8203;6363](astral-sh/uv#6363))
-   Avoid adding extra newline for script with non-empty prelude ([#&#8203;6366](astral-sh/uv#6366))
-   Fix metadata cache instability for lockfile ([#&#8203;6332](astral-sh/uv#6332))
-   Handle Ctrl-C properly in `uvx` invocations ([#&#8203;6346](astral-sh/uv#6346))
-   Ignore workspace discovery errors with `--no-workspace` ([#&#8203;6328](astral-sh/uv#6328))
-   Invalidate `uv.lock` when virtual `dev-dependencies` change ([#&#8203;6291](astral-sh/uv#6291))
-   Make cache robust to removed archives ([#&#8203;6284](astral-sh/uv#6284))
-   Preserve Git username for SSH dependencies ([#&#8203;6335](astral-sh/uv#6335))
-   Respect `--no-build-isolation` in `uv add` ([#&#8203;6368](astral-sh/uv#6368))
-   Respect `.python-version` files in `uv run` outside projects ([#&#8203;6361](astral-sh/uv#6361))
-   Use `sys_executable` for `uv run` invocations ([#&#8203;6354](astral-sh/uv#6354))
-   Use atomic write for `pip compile` output ([#&#8203;6274](astral-sh/uv#6274))
-   Use consistent logic for deserializing short revisions ([#&#8203;6341](astral-sh/uv#6341))

##### Documentation

-   Remove the preview default value of `python-preference` ([#&#8203;6301](astral-sh/uv#6301))
-   Update env vars doc about `XDG_*` variables on macOS ([#&#8203;6337](astral-sh/uv#6337))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40NDAuNyIsInVwZGF0ZWRJblZlciI6IjM3LjQ0MC43IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
konstin added a commit that referenced this pull request Aug 29, 2024
We currently normalize package and extra names and drop the whitespace from version specifiers, but we were not normalizing the order of the specifiers. By sorting them we match the behavior of `packaging` and become independent of build backends reordering specifiers (#6332).

Surprisingly, the snapshot diff isn't large - most people were already writing sorted specifiers. Still, this will lead to observable differences in lockfiles between releases.
konstin added a commit that referenced this pull request Aug 29, 2024
We currently normalize package and extra names and drop the whitespace
from version specifiers, but we were not normalizing the order of the
specifiers. By sorting them we match the behavior of `packaging` and
become independent of build backends reordering specifiers (#6332).

Surprisingly, the snapshot diff isn't large - most people were already
writing sorted specifiers. Still, this will lead to observable
differences in lockfiles between releases in cases where there are
entries in `requires-dist` that were not previously sorted (while the
total number of `requires-dist` is already small compared to the overall
lockfile).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

uv lock specifier instability?
2 participants