-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
trim-paths: Remap rules for different dependency kinds #13171
Comments
We should make sure we keep the comment from https://github.com/rust-lang/cargo/blob/master/src/cargo/util/workspace.rs#L97 in mind /// The path that we pass to rustc is actually fairly important because it will
/// show up in error messages (important for readability), debug information
/// (important for caching), etc. As a result we need to be pretty careful how we
/// actually invoke rustc.
///
/// In general users don't expect `cargo build` to cause rebuilds if you change
/// directories. That could be if you just change directories in the package or
/// if you literally move the whole package wholesale to a new directory. As a
/// result we mostly don't factor in `cwd` to this calculation. Instead we try to
/// track the workspace as much as possible and we update the current directory
/// of rustc/rustdoc where appropriate.
///
/// The first returned value here is the argument to pass to rustc, and the
/// second is the cwd that rustc should operate in.
I think we're fine but wanted to make sure we keep this in mind |
For registry, the path prefix would be things like For git, the path prefix would be things like What are the stability guarantees for those paths? |
I wonder what code path these would go down and what the "for free" option is. If its registries, I expect the do-nothing would be good enough. |
They should be fairly stable on each platform across different Rust versions. See here and here. Granted, we should have a test to verify
cargo/src/cargo/core/compiler/mod.rs Line 1220 in 6982b44
I was wrong with local-registry. Unpacked sources go into the same People might want to remap vendor sources, as they might do release builds with them. However, from the above code path cargo only knows the pkg is from a I still think it's better to remap to relative paths for vendor sources, as they are designed to be used as local files. If people can get free debugging experience without configuring remap restore, then why not? |
Thanks for picking this up, @weihanglo!
Please note that the
instead of just
A debugger would look for a file |
Stability across platforms and across time would indeed be desirable. The test case linked above excludes a few platforms. The compiler has an example of a hasher the gives the same result on different kinds of platforms. |
Hmm… Maybe I understand the original concern wrong 🤔 ? I would expect remapped paths to be like
So that we can restore the prefix-map by searching a specific prefix like # in gdb
set substitute-path registry@ /home/user/.cargo/registry/src
# in lldb
settings set target.source-map git@ /home/user/.cargo/git/checkouts That makes the substitution rules down to 2. |
Good reference! Thank you! I guess cargo might want to adopt some techniques there to have a stable hash across platforms (though it may break things on some platforms). Despite that, is having a cross-platform stable hash in debuginfo a hard requirement? I assume things like DWARF are platform/architecture dependant. We cannot use DWARF generated on aarch64 to debug on x86_64. But again, I might be wrong 😞. |
Yes, that looks correct to me. The only change I'm suggesting is to omit the
we would do this:
or add That should have the same effect, but path substitution is less widely supported than specifying a source path. For example, I was not able to find an equivalent feature in WinDbg or the MSVC debugger. And when using any kind of source server, having something that does not match the file system path will likely also lead to complications. It's quite possible that all of these issues could be worked around somehow, but why make things more complicated than necessary? I'd like trim-paths to be the default (at least for release builds), and I think that will be easier if we remove as much friction as possible.
I'm not 100% sure if it has to be a hard requirement -- but it might also be something where we can prevent headaches down the line. The hash value basically represents a URL/URI, right? Registries in general are not platform specific, so there is no reason why an identifier for a registry should be platform specific. And there are quite a few cross platform scenarios involving debuginfo. Analyzing crashdumps from platform X on platform Y is pretty common, or cross-building from one platform to another. E.g. most 32-bits builds probably happen on 64-bit hosts, right? I think it might be worth biting the bullet now and aligning these "short-names" on all platforms. |
Totally agree on this. We might want to make hashes cross-platform stable. Just that would invalidate source caches on some platforms. Do you think this is a major blocker on |
Sorry I haven't gotten back to this. Maybe to put my earlier question in a different way. The old prefixes basically said "this source is from some place, get it yourself". The new format seems to imply that the user should expect the source to be made available in How strong do we expect these expectations to be, particularly
For example,
|
I am pretty sure that we don't want to change the layout too often, as it may hurt caches. It is also the reason that we want a cross-platform stable hash for the path prefixes. That said, if necessary, we can definitely change the layout.
I don't fully get this. But the assumption is the user has an existing
That's unfortunately yes. Alternatively we could provide a |
Some thoughts:
It looks to me like there are two separate aspects here:
I think that use case is supported best via symbol servers and/or SourceLink. In both cases, there is a snapshot of the source available that's persistently linked to a specific executable, so we don't have to worry about any changes.
This is something I'd like to explore (together with the debugging working group) in any case. It could provide a lot of useful functionality that the current rust-gdb and rust-lldb wrappers can't because they don't know about cargo projects. But that is kind of a separate topic. |
This is a brilliant idea! However, people might want a platform specific source links directory instead of
That's a good idea, but might need to resolve name conflict issues like #12516. |
Sorry for how slow I've been on this.
So for the first, "rebuild" and for the second, "remap". If we're going in this direction, should we simplify this by just remaping to
SourceLink is the inspiration for the comment. The problem is, last I looked, gdb doesn't have the concept. When I last dealt with this (porting a Windows shop to Linux), we used fully qualified, cross-system paths that you can remap as needed. Hmm, maybe what we have is enough to do something like that? I've been so slow at this (sorry) that I lost track!
We have at least one major change to |
No worries -- as you can see, I'm not particularly fast either 🙂
I don't think I understand. Can you elaborate?
Yes, it sounds like something like this is planned for the next version of DWARF, but LLVM doesn't seem to implement this yet either. There are the debuginfod and Microsoft symbol servers, that seem to have existing support. I'll try to find out if these would need anything special from trim-paths. But overall we just have to make sure not to create any footguns for those, I think.
Is there any such path that we could declare stable? I think overall the approach might solve a couple of problems. |
Usually when using a hash function for such things, you want a strongly-collision-resistant hash function. But the "rustc-stable-hasher" is infamously not a strongly-collision-resistant hash function. It would be good to avoid making more use of it. |
Could you elaborate more on why this needs to be strong in terms of collision resistant? The current algorithm SipHash is strong for HashDOS protection but not really a cryptographic hash function either. |
Most of the interesting parts of making the hash value stable (~platform independent) is happening in the outer layer of |
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. See rust-lang#13171 (comment) Two caveats: * This will invalidate the current downloaded caches. Need to put this in the Cargo CHANGELOG. * The performance of rustc-stable-hash is slightly worse than the old SipHasher in std on short things.
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. See rust-lang#13171 (comment) Two caveats: * This will invalidate the current downloaded caches. Need to put this in the Cargo CHANGELOG. * The performance of rustc-stable-hash is slightly worse than the old SipHasher in std on short things.
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. See rust-lang#13171 (comment) A few caveats: * This will invalidate the current downloaded caches. Need to put this in the Cargo CHANGELOG. * As a consequnce of changing how `SourceId` is hashed, the global cache tracker is also affected becauseCargo writes source identifiers (e.g. `index.crates.io-6f17d22bba15001f`) to SQLite. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/global_cache_tracker.rs#L388-L391 * The performance of rustc-stable-hash is slightly worse than the old SipHasher in std on short things like `SourceId`, but for long stuff like fingerprint. See appendix. StableHasher is used in several places (some might not be needed?): * Rebuild detection (fingerprints) * Rustc version, including all the CLI args running `rustc -vV`. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L326 * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L381 * Build caches * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/fingerprint/mod.rs#L1456 * Compute rustc `-C metadata` * stable hash for SourceId * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/package_id.rs#L207 * Also read and hash contents from custom target JSON file. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/compile_kind.rs#L81-L91 * `UnitInner::dep_hash` * This is to distinguish same units having different features set between normal and build dependencies. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_compile/mod.rs#L627 * Hash file contents for `cargo package` to verify if files were modified before and after the build. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_package.rs#L999 * Rusc diagnostics deduplication * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/job_queue/mod.rs#L311 * Places using `SourceId` identifier like `registry/src` path, and `-Zscript` target directories. Appendix -------- Benchmark on x86_64-unknown-linux-gnu ``` bench_hasher/RustcStableHasher/URL time: [33.843 ps 33.844 ps 33.845 ps] change: [-0.0167% -0.0049% +0.0072%] (p = 0.44 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/URL time: [18.954 ns 18.954 ns 18.955 ns] change: [-0.1281% -0.0951% -0.0644%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe bench_hasher/RustcStableHasher/lorem ipsum time: [659.18 ns 659.20 ns 659.22 ns] change: [-0.0192% -0.0062% +0.0068%] (p = 0.34 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/lorem ipsum time: [1.2006 µs 1.2008 µs 1.2010 µs] change: [+0.0117% +0.0467% +0.0808%] (p = 0.01 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ```
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. See rust-lang#13171 (comment) A few caveats: * This will invalidate the current downloaded caches. Need to put this in the Cargo CHANGELOG. * As a consequence of changing how `SourceId` is hashed, the global cache tracker is also affected because Cargo writes source identifiers (e.g. `index.crates.io-6f17d22bba15001f`) to SQLite. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/global_cache_tracker.rs#L388-L391 * The performance of rustc-stable-hash is slightly worse than the old SipHasher in std on short things like `SourceId`, but for long stuff like fingerprint. See appendix. StableHasher is used in several places (some might not be needed?): * Rebuild detection (fingerprints) * Rustc version, including all the CLI args running `rustc -vV`. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L326 * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L381 * Build caches * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/fingerprint/mod.rs#L1456 * Compute rustc `-C metadata` * stable hash for SourceId * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/package_id.rs#L207 * Also read and hash contents from custom target JSON file. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/compile_kind.rs#L81-L91 * `UnitInner::dep_hash` * This is to distinguish same units having different features set between normal and build dependencies. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_compile/mod.rs#L627 * Hash file contents for `cargo package` to verify if files were modified before and after the build. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_package.rs#L999 * Rusc diagnostics deduplication * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/job_queue/mod.rs#L311 * Places using `SourceId` identifier like `registry/src` path, and `-Zscript` target directories. Appendix -------- Benchmark on x86_64-unknown-linux-gnu ``` bench_hasher/RustcStableHasher/URL time: [33.843 ps 33.844 ps 33.845 ps] change: [-0.0167% -0.0049% +0.0072%] (p = 0.44 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/URL time: [18.954 ns 18.954 ns 18.955 ns] change: [-0.1281% -0.0951% -0.0644%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe bench_hasher/RustcStableHasher/lorem ipsum time: [659.18 ns 659.20 ns 659.22 ns] change: [-0.0192% -0.0062% +0.0068%] (p = 0.34 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/lorem ipsum time: [1.2006 µs 1.2008 µs 1.2010 µs] change: [+0.0117% +0.0467% +0.0808%] (p = 0.01 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ```
Posted by @briansmith in #14116 (comment):
The collision could happen, yes. However, for registry index the identifier is like |
Presumably the "6f17d22bba15001f" there is the hash of something. What happens when, in the same registry, somebody publishes two crates X and Y that end up with the same hash? Is the assumption that the registry will check for a collision at publication time and reject the second one? If so, potentially you may be free of issues related to collisions for the registry case, but you may open yourself up to DoS if somebody could precompute of your hash that would collide with somebody else's to-be-published crate. (Precomputing a hash is called preimage and it is a harder problem than collisions.) |
It is the hash of the registry url itself. If you have two registries under the same domain, you could get a hash collision. The infra team would have to explicitly place another registry on index.crates.io to get a collision here. This is not something a random package uploader can do. If you have two crates in the same registry with the same name and version however, cargo will consider it as a single version of a single crate and if there is a lockfile, refuse to download the replaced version as the checksum differs from what it should be according to the lockfile. And crates.io should never allow this either. |
OK, but then why use siphash for this? It doesn't seem like it would be justified by the performance. |
SipHash doesn't need to be the only choice. What we need for source mapping is a stable hash that won't change between different platforms, endianness, bit-width, and so on. |
I've opened an issue in the stable hash repo about supporting other hash algorithms. It should be straightforward. Supporting multiple hash algorithms within rustc might lead to some code bloat or dynamic dispatch overhead, but is a different story 🙂 |
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. See rust-lang#13171 (comment) A few caveats: * This will invalidate the current downloaded caches. Need to put this in the Cargo CHANGELOG. * As a consequence of changing how `SourceId` is hashed, the global cache tracker is also affected because Cargo writes source identifiers (e.g. `index.crates.io-6f17d22bba15001f`) to SQLite. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/global_cache_tracker.rs#L388-L391 * The performance of rustc-stable-hash is slightly worse than the old SipHasher in std on short things like `SourceId`, but for long stuff like fingerprint. See appendix. StableHasher is used in several places (some might not be needed?): * Rebuild detection (fingerprints) * Rustc version, including all the CLI args running `rustc -vV`. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L326 * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L381 * Build caches * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/fingerprint/mod.rs#L1456 * Compute rustc `-C metadata` * stable hash for SourceId * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/package_id.rs#L207 * Also read and hash contents from custom target JSON file. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/compile_kind.rs#L81-L91 * `UnitInner::dep_hash` * This is to distinguish same units having different features set between normal and build dependencies. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_compile/mod.rs#L627 * Hash file contents for `cargo package` to verify if files were modified before and after the build. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_package.rs#L999 * Rusc diagnostics deduplication * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/job_queue/mod.rs#L311 * Places using `SourceId` identifier like `registry/src` path, and `-Zscript` target directories. Appendix -------- Benchmark on x86_64-unknown-linux-gnu ``` bench_hasher/RustcStableHasher/URL time: [33.843 ps 33.844 ps 33.845 ps] change: [-0.0167% -0.0049% +0.0072%] (p = 0.44 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/URL time: [18.954 ns 18.954 ns 18.955 ns] change: [-0.1281% -0.0951% -0.0644%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe bench_hasher/RustcStableHasher/lorem ipsum time: [659.18 ns 659.20 ns 659.22 ns] change: [-0.0192% -0.0062% +0.0068%] (p = 0.34 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/lorem ipsum time: [1.2006 µs 1.2008 µs 1.2010 µs] change: [+0.0117% +0.0467% +0.0808%] (p = 0.01 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ```
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. See rust-lang#13171 (comment) A few caveats: * This will invalidate the current downloaded caches. Need to put this in the Cargo CHANGELOG. * As a consequence of changing how `SourceId` is hashed, the global cache tracker is also affected because Cargo writes source identifiers (e.g. `index.crates.io-6f17d22bba15001f`) to SQLite. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/global_cache_tracker.rs#L388-L391 * The performance of rustc-stable-hash is slightly worse than the old SipHasher in std on short things like `SourceId`, but for long stuff like fingerprint. See appendix. StableHasher is used in several places (some might not be needed?): * Rebuild detection (fingerprints) * Rustc version, including all the CLI args running `rustc -vV`. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L326 * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L381 * Build caches * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/fingerprint/mod.rs#L1456 * Compute rustc `-C metadata` * stable hash for SourceId * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/package_id.rs#L207 * Also read and hash contents from custom target JSON file. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/compile_kind.rs#L81-L91 * `UnitInner::dep_hash` * This is to distinguish same units having different features set between normal and build dependencies. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_compile/mod.rs#L627 * Hash file contents for `cargo package` to verify if files were modified before and after the build. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_package.rs#L999 * Rusc diagnostics deduplication * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/job_queue/mod.rs#L311 * Places using `SourceId` identifier like `registry/src` path, and `-Zscript` target directories. Appendix -------- Benchmark on x86_64-unknown-linux-gnu ``` bench_hasher/RustcStableHasher/URL time: [33.843 ps 33.844 ps 33.845 ps] change: [-0.0167% -0.0049% +0.0072%] (p = 0.44 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/URL time: [18.954 ns 18.954 ns 18.955 ns] change: [-0.1281% -0.0951% -0.0644%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe bench_hasher/RustcStableHasher/lorem ipsum time: [659.18 ns 659.20 ns 659.22 ns] change: [-0.0192% -0.0062% +0.0068%] (p = 0.34 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/lorem ipsum time: [1.2006 µs 1.2008 µs 1.2010 µs] change: [+0.0117% +0.0467% +0.0808%] (p = 0.01 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ```
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. See rust-lang#13171 (comment) A few caveats: * This will invalidate the current downloaded caches. Need to put this in the Cargo CHANGELOG. * As a consequence of changing how `SourceId` is hashed, the global cache tracker is also affected because Cargo writes source identifiers (e.g. `index.crates.io-6f17d22bba15001f`) to SQLite. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/global_cache_tracker.rs#L388-L391 * The performance of rustc-stable-hash is slightly worse than the old SipHasher in std on short things like `SourceId`, but for long stuff like fingerprint. See appendix. StableHasher is used in several places (some might not be needed?): * Rebuild detection (fingerprints) * Rustc version, including all the CLI args running `rustc -vV`. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L326 * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L381 * Build caches * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/fingerprint/mod.rs#L1456 * Compute rustc `-C metadata` * stable hash for SourceId * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/package_id.rs#L207 * Also read and hash contents from custom target JSON file. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/compile_kind.rs#L81-L91 * `UnitInner::dep_hash` * This is to distinguish same units having different features set between normal and build dependencies. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_compile/mod.rs#L627 * Hash file contents for `cargo package` to verify if files were modified before and after the build. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_package.rs#L999 * Rusc diagnostics deduplication * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/job_queue/mod.rs#L311 * Places using `SourceId` identifier like `registry/src` path, and `-Zscript` target directories. Appendix -------- Benchmark on x86_64-unknown-linux-gnu ``` bench_hasher/RustcStableHasher/URL time: [33.843 ps 33.844 ps 33.845 ps] change: [-0.0167% -0.0049% +0.0072%] (p = 0.44 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/URL time: [18.954 ns 18.954 ns 18.955 ns] change: [-0.1281% -0.0951% -0.0644%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe bench_hasher/RustcStableHasher/lorem ipsum time: [659.18 ns 659.20 ns 659.22 ns] change: [-0.0192% -0.0062% +0.0068%] (p = 0.34 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/lorem ipsum time: [1.2006 µs 1.2008 µs 1.2010 µs] change: [+0.0117% +0.0467% +0.0808%] (p = 0.01 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ```
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. See rust-lang#13171 (comment) A few caveats: * This will invalidate the current downloaded caches. Need to put this in the Cargo CHANGELOG. * As a consequence of changing how `SourceId` is hashed, the global cache tracker is also affected because Cargo writes source identifiers (e.g. `index.crates.io-6f17d22bba15001f`) to SQLite. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/global_cache_tracker.rs#L388-L391 * The performance of rustc-stable-hash is slightly worse than the old SipHasher in std on short things like `SourceId`, but for long stuff like fingerprint. See appendix. StableHasher is used in several places (some might not be needed?): * Rebuild detection (fingerprints) * Rustc version, including all the CLI args running `rustc -vV`. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L326 * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L381 * Build caches * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/fingerprint/mod.rs#L1456 * Compute rustc `-C metadata` * stable hash for SourceId * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/package_id.rs#L207 * Also read and hash contents from custom target JSON file. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/compile_kind.rs#L81-L91 * `UnitInner::dep_hash` * This is to distinguish same units having different features set between normal and build dependencies. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_compile/mod.rs#L627 * Hash file contents for `cargo package` to verify if files were modified before and after the build. * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_package.rs#L999 * Rusc diagnostics deduplication * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/job_queue/mod.rs#L311 * Places using `SourceId` identifier like `registry/src` path, and `-Zscript` target directories. Appendix -------- Benchmark on x86_64-unknown-linux-gnu ``` bench_hasher/RustcStableHasher/URL time: [33.843 ps 33.844 ps 33.845 ps] change: [-0.0167% -0.0049% +0.0072%] (p = 0.44 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/URL time: [18.954 ns 18.954 ns 18.955 ns] change: [-0.1281% -0.0951% -0.0644%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe bench_hasher/RustcStableHasher/lorem ipsum time: [659.18 ns 659.20 ns 659.22 ns] change: [-0.0192% -0.0062% +0.0068%] (p = 0.34 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe bench_hasher/SipHasher/lorem ipsum time: [1.2006 µs 1.2008 µs 1.2010 µs] change: [+0.0117% +0.0467% +0.0808%] (p = 0.01 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ```
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. It also helps cache registry index all at once for all platforms, for example the use case in rust-lang#14795 (despite they should use `cargo vendor` instead IMO). Some caveats, * Newer cargo will need to re-download files for global caches (index files, git/registry sources). The old cache is still kept and used when running with older cargoes. * Windows MSVC is not really covered by the "cross-platform` hash There might be hash collisions if you have two registries under the same domain. This won't happen to crates.io, as the infra would have to intentionally put another registry on index.crates.io to collide. We don't consider this is an actual threat model, so we are not going to use any cryptographically secure hash algorithm like BLAKE3. See also <rust-lang#13171 (comment)>
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. It also helps cache registry index all at once for all platforms, for example the use case in rust-lang#14795 (despite they should use `cargo vendor` instead IMO). Some caveats: * Newer cargo will need to re-download files for global caches (index files, git/registry sources). The old cache is still kept and used when running with older cargoes. * Windows is not really covered by the "cross-platform` hash, because path prefix components like `C:` are always there. That means hashes of some sources kind, like local registry and local path, are not going to be real cross-platform stable. There might be hash collisions if you have two registries under the same domain. This won't happen to crates.io, as the infra would have to intentionally put another registry on index.crates.io to collide. We don't consider this is an actual threat model, so we are not going to use any cryptographically secure hash algorithm like BLAKE3. See also <rust-lang#13171 (comment)>
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. It also helps cache registry index all at once for all platforms, for example the use case in rust-lang#14795 (despite they should use `cargo vendor` instead IMO). Some caveats: * Newer cargo will need to re-download files for global caches (index files, git/registry sources). The old cache is still kept and used when running with older cargoes. * Windows is not really covered by the "cross-platform" hash, because path prefix components like `C:` are always there. That means hashes of some sources kind, like local registry and local path, are not going to be real cross-platform stable. There might be hash collisions if you have two registries under the same domain. This won't happen to crates.io, as the infra would have to intentionally put another registry on index.crates.io to collide. We don't consider this is an actual threat model, so we are not going to use any cryptographically secure hash algorithm like BLAKE3. See also <rust-lang#13171 (comment)>
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. It also helps cache registry index all at once for all platforms, for example the use case in rust-lang#14795 (despite they should use `cargo vendor` instead IMO). Some caveats: * Newer cargo will need to re-download files for global caches (index files, git/registry sources). The old cache is still kept and used when running with older cargoes. * Windows is not really covered by the "cross-platform" hash, because path prefix components like `C:` are always there. That means hashes of some sources kind, like local registry and local path, are not going to be real cross-platform stable. There might be hash collisions if you have two registries under the same domain. This won't happen to crates.io, as the infra would have to intentionally put another registry on index.crates.io to collide. We don't consider this is an actual threat model, so we are not going to use any cryptographically secure hash algorithm like BLAKE3. See also <rust-lang#13171 (comment)>
This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. It also helps cache registry index all at once for all platforms, for example the use case in rust-lang#14795 (despite they should use `cargo vendor` instead IMO). Some caveats: * Newer cargo will need to re-download files for global caches (index files, git/registry sources). The old cache is still kept and used when running with older cargoes. * Windows is not really covered by the "cross-platform" hash, because path prefix components like `C:` are always there. That means hashes of some sources kind, like local registry and local path, are not going to be real cross-platform stable. There might be hash collisions if you have two registries under the same domain. This won't happen to crates.io, as the infra would have to intentionally put another registry on index.crates.io to collide. We don't consider this is an actual threat model, so we are not going to use any cryptographically secure hash algorithm like BLAKE3. See also <rust-lang#13171 (comment)>
### What does this PR try to resolve? This helps `-Ztrim-paths` build a stable cross-platform path for the registry and git sources. Sources files then can be found from the same path when debugging. It also helps cache registry index all at once for all platforms, for example the use case in #14795 (despite they should use `cargo vendor` instead IMO). Some caveats: * Newer cargo will need to re-download files for global caches (index files, git/registry sources). The old cache is still kept and used when running with older cargoes. * Absolute paths on windows iarenot really covered by the "cross-platform" hash, because path prefix components like `C:` are always there. That means hashes of some sources kind, like local registry and local path, are not going to be real cross-platform stable. #### Security concern There might be hash collisions if you have two registries under the same domain. This won't happen to crates.io, as the infra would have to intentionally put another registry on index.crates.io to collide. We don't consider this is an actual threat model, so we are not going to use any cryptographically secure hash algorithm like BLAKE3. At least, the current unstable SipHash isn't in a better situation. We might switch to a cryptographic secure one when needed. See also <#13171 (comment)> ### How should we test and review this PR? We have an FCP in <#14795 (comment)>. This PR implements the proposal, The path-length concern in <#14795 (comment)> is automatically addressed because we don't need cryptographically secure hash for now. ### Additional information See more information and benchmark results in <#14116>.
This is part of #12137
As of 6982b44, the remap of
-Ztrim-paths
is relatively simple. It follows what is described in RFC 3127:In Cargo, that is:
cargo/src/cargo/core/compiler/mod.rs
Lines 1222 to 1230 in 6982b44
However, this is not ideal for debugger. Several questions emerge in rust-lang/rust#111540 (comment).
<pkg>-<version>/
back to absolute paths. It would require a dedicated tool to restore all the paths.<pkg>-<version>
as well.src/some/file.rs
.Possible options
Registry and git dependencies
Updated as suggested in #13171 (comment)
In rust-lang/rust#111540 (comment), @michaelwoerister brought up a smarter remap strategy. We could remap paths from
$CARGO_HOME/registry/src
and$CARGO_HOME/git/checkouts
to a empty string for registry and git dependencies respectively, so that there is a finite set of rule (two exactly) to configure in debugger for each platform. That is to say, cargo could unconditionally pass these to the compiler:--remap-path-prefix=$CARGO_HOME/.cargo/registry/src=
--remap-path-prefix=$CARGO_HOME/.cargo/git/checkouts=
(or conditionally if we want to reduce number of args passed in to rustc)
That is, example
To restore source paths for debuggers, you only need to set two paths. For example in gdb, set
Path dependencies
For path dependencies, I wonder if we can always remap to relative path to workspace root, regardless whether the package is inside or outside the workspace. Developers should be responsible for locals dependencies, and if they want, they are able to have their own remap rules via
--remap-path-prefix
.Other sources
Vendor and local-registry dependencies
They are both sort of local dependencies. I would suggest treating them as path dependencies. We can always apply new rules to them later on.
Patched dependencies
Just apply all the suggested remap rule above, if not sufficient, people can write their own rule via
--remap-path-prefix
.The text was updated successfully, but these errors were encountered: