Make `ensure_sufficient_stack()` non-generic, using cargo-llvm-lines #76680
r? @estebank (rust_highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
I wish perf measured compile times for the rust compiler itself :(
Awaiting bors try build completion
⌛ Trying commit d744dc857ba132dfc9016717864aec7f5d2f2a24 with merge f242334c1e2e719cf1cba923923ad8ec62affb71...
/// Measuring with cargo-llvm-lines revealed that `psm::on_stack::with_on_stack` was
/// monomorphized 1552 times and was responsible for 1.5% of rustc's total llvm-lines.
/// Making this wrapper without a generic bound removes all of that duplication.
#[inline(never)]
Inlining is done by llvm, right? Is `inline(never)` necessary here?
I don't know. It has no effect on llvm-lines, but I was afraid that it would all be inlined again, negating the benefit of this change.
Now that I think about it, that probably won't happen though.
I removed the `#[inline(never)]`.
☀️ Try build successful - checks-actions, checks-azure
Queued f242334c1e2e719cf1cba923923ad8ec62affb71 with parent 7402a39, future comparison URL.
Finished benchmarking try commit (f242334c1e2e719cf1cba923923ad8ec62affb71): comparison url.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.
@bors rollup=never
Oh wow! -2.5% on hello world, -1.5-2% on several other benchmarks.
I'm curious to see if it would be improved by removing the
It might be better to apply this change in stacker itself. It's internal
@Marwes won't that prevent using
No, because it would be the exact same trick as you do here, wrap the
I'll try that too and we can compare it. And wow, I'm surprised this is a perf improvement ^^
I've now moved the conversion to a dyn closure into stacker. If this version turns out to be better, I will open a PR to stacker, and probably change this PR to be just a version bump. With that, can I have another perf run?
Relevant commit: Julian-Wollersberger/stacker@71993c7
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 2b1badd692a5f92acd49ac01bdccb0a92b0020ae with merge 3a9349835672afaadc58a1327d44e3453b34a415...
No, Tidy doesn't. But bors is still going? I guess it doesn't run tidy?
No, bors doesn't run tidy (otherwise you'd have to wait to run perf runs until the author ran x.py fmt, which is annoying).
Some more places to look at after this:
Let me know if any of those sound interesting, I don't know how to help with all of them but I know who does ;)
Hi! This PR showed up in the weekly perf triage report. It resulted in a moderate improvement in instruction counts (up to -3.2% on A really nice result @Julian-Wollersberger, especially since the main goal of this PR seems to have been to speed up
Also, thank you for iterating on this, since the first version was indeed a small regression on most "real-world" benchmarks.
@jyn514, here's an example of a perf run for a PR that doesn't affect perf (i.e. it's just noise). It's important to look past the first few numbers and see if there's a larger trend, especially if those results aren't for real-world benchmarks.
Clean up some of the pkgsrc Makefile, there's still lots in here that should just be deleted though. Switch SunOS to the illumos bootstrap by default. Version 1.48.0 (2020-11-19) ========================== Language -------- - [The `unsafe` keyword is now syntactically permitted on modules.][75857] This is still rejected *semantically*, but can now be parsed by procedural macros. Compiler -------- - [Stabilised the `-C link-self-contained=<yes|no>` compiler flag.][76158] This tells `rustc` whether to link its own C runtime and libraries or to rely on a external linker to find them. (Supported only on `windows-gnu`, `linux-musl`, and `wasi` platforms.) - [You can now use `-C target-feature=+crt-static` on `linux-gnu` targets.][77386] Note: If you're using cargo you must explicitly pass the `--target` flag. - [Added tier 2\* support for `aarch64-unknown-linux-musl`.][76420] \* Refer to Rust's [platform support page][forge-platform-support] for more information on Rust's tiered platform support. Libraries --------- - [`io::Write` is now implemented for `&ChildStdin` `&Sink`, `&Stdout`, and `&Stderr`.][76275] - [All arrays of any length now implement `TryFrom<Vec<T>>`.][76310] - [The `matches!` macro now supports having a trailing comma.][74880] - [`Vec<A>` now implements `PartialEq<[B]>` where `A: PartialEq<B>`.][74194] - [The `RefCell::{replace, replace_with, clone}` methods now all use `#[track_caller]`.][77055] Stabilized APIs --------------- - [`slice::as_ptr_range`] - [`slice::as_mut_ptr_range`] - [`VecDeque::make_contiguous`] - [`future::pending`] - [`future::ready`] The following previously stable methods are now `const fn`'s: - [`Option::is_some`] - [`Option::is_none`] - [`Option::as_ref`] - [`Result::is_ok`] - [`Result::is_err`] - [`Result::as_ref`] - [`Ordering::reverse`] - [`Ordering::then`] Cargo ----- Rustdoc ------- - [You can now link to items in `rustdoc` using the intra-doc link syntax.][74430] E.g. 
``/// Uses [`std::future`]`` will automatically generate a link to `std::future`'s documentation. See ["Linking to items by name"][intradoc-links] for more information. - [You can now specify `#[doc(alias = "<alias>")]` on items to add search aliases when searching through `rustdoc`'s UI.][75740] Compatibility Notes ------------------- - [Promotion of references to `'static` lifetime inside `const fn` now follows the same rules as inside a `fn` body.][75502] In particular, `&foo()` will not be promoted to `'static` lifetime any more inside `const fn`s. - [Associated type bindings on trait objects are now verified to meet the bounds declared on the trait when checking that they implement the trait.][27675] - [When trait bounds on associated types or opaque types are ambiguous, the compiler no longer makes an arbitrary choice on which bound to use.][54121] - [Fixed recursive nonterminals not being expanded in macros during pretty-print/reparse check.][77153] This may cause errors if your macro wasn't correctly handling recursive nonterminal tokens. - [`&mut` references to non zero-sized types are no longer promoted.][75585] - [`rustc` will now warn if you use attributes like `#[link_name]` or `#[cold]` in places where they have no effect.][73461] - [Updated `_mm256_extract_epi8` and `_mm256_extract_epi16` signatures in `arch::{x86, x86_64}` to return `i32` to match the vendor signatures.][73166] - [`mem::uninitialized` will now panic if any inner types inside a struct or enum disallow zero-initialization.][71274] - [`#[target_feature]` will now error if used in a place where it has no effect.][78143] - [Foreign exceptions are now caught by `catch_unwind` and will cause an abort.][70212] Note: This behaviour is not guaranteed and is still considered undefined behaviour, see the [`catch_unwind`] documentation for further information. 
Internal Only ------------- These changes provide no direct user facing benefits, but represent significant improvements to the internals and overall performance of rustc and related tools. - [Building `rustc` from source now uses `ninja` by default over `make`.][74922] You can continue building with `make` by setting `ninja=false` in your `config.toml`. - [cg_llvm: `fewer_names` in `uncached_llvm_type`][76030] - [Made `ensure_sufficient_stack()` non-generic][76680] [78143]: rust-lang/rust#78143 [76680]: rust-lang/rust#76680 [76030]: rust-lang/rust#76030 [70212]: rust-lang/rust#70212 [27675]: rust-lang/rust#27675 [54121]: rust-lang/rust#54121 [71274]: rust-lang/rust#71274 [77386]: rust-lang/rust#77386 [77153]: rust-lang/rust#77153 [77055]: rust-lang/rust#77055 [76275]: rust-lang/rust#76275 [76310]: rust-lang/rust#76310 [76420]: rust-lang/rust#76420 [76158]: rust-lang/rust#76158 [75857]: rust-lang/rust#75857 [75585]: rust-lang/rust#75585 [75740]: rust-lang/rust#75740 [75502]: rust-lang/rust#75502 [74880]: rust-lang/rust#74880 [74922]: rust-lang/rust#74922 [74430]: rust-lang/rust#74430 [74194]: rust-lang/rust#74194 [73461]: rust-lang/rust#73461 [73166]: rust-lang/rust#73166 [intradoc-links]: https://doc.rust-lang.org/rustdoc/linking-to-items-by-name.html [`catch_unwind`]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html [`Option::is_some`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.is_some [`Option::is_none`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.is_none [`Option::as_ref`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.as_ref [`Result::is_ok`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.is_ok [`Result::is_err`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.is_err [`Result::as_ref`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.as_ref [`Ordering::reverse`]: https://doc.rust-lang.org/std/cmp/enum.Ordering.html#method.reverse [`Ordering::then`]: 
https://doc.rust-lang.org/std/cmp/enum.Ordering.html#method.then [`slice::as_ptr_range`]: https://doc.rust-lang.org/std/primitive.slice.html#method.as_ptr_range [`slice::as_mut_ptr_range`]: https://doc.rust-lang.org/std/primitive.slice.html#method.as_mut_ptr_range [`VecDeque::make_contiguous`]: https://doc.rust-lang.org/std/collections/struct.VecDeque.html#method.make_contiguous [`future::pending`]: https://doc.rust-lang.org/std/future/fn.pending.html [`future::ready`]: https://doc.rust-lang.org/std/future/fn.ready.html
Pkgsrc changes:
* Compensate for files being moved around upstream.
* Introduce optional, on-by-default semi-static building of cargo, using the internal curl and openssl sources. This reduces the dynamic dependencies of cargo and therefore the rust package itself. Ref. options.mk.
* The 1.47.0 bootstrap kits have been re-built with the above option turned on, so no longer depends on curl or openssl from pkgsrc and/or from earlier OS or pkgsrc versions. This should hopefully fix installation of rust with non-default PREFIX, ref. PR#54453.

Upstream changes: Version 1.48.0 (2020-11-19)
feat: Update hashbrown to instantiate less llvm IR
Includes rust-lang/hashbrown#204 and rust-lang/hashbrown#205 (not yet merged) which both serve to reduce the amount of IR generated for hashmaps. Inspired by the llvm-lines data gathered in rust-lang#76680 (cc @Julian-Wollersberger)
Inspired by this blog post from @nnethercote, I used cargo-llvm-lines on the rust compiler itself, to improve its compile time. This PR contains only one low-hanging fruit, but I also want to share some measurements.
The function `ensure_sufficient_stack()` was monomorphized 1500 times, and with it the `stacker` and `psm` crates, for a total of 1.5% of all llvm IR lines. With some trickery I convert the generic closure into a dynamic one, and thus all that code is only monomorphized once.

Measurements
Getting these numbers took some fiddling with CLI flags and I modified cargo-llvm-lines to read from a folder instead of invoking cargo. Commands I used:
The result is this list (see first 500 lines), before the change:
All `.ll` files together had 4.4GB. After my change they had 4.2GB. So a few percent less code LLVM has to process. Hurray!
Sadly, I couldn't measure an actual wall-time improvement. Watching YouTube while compiling added too much noise...
Here is the top of the list after the change:
Note that the total was reduced by 430 000 lines and `psm::on_stack::with_on_stack` has disappeared. Instead `rustc_data_structures::stack::ensure_sufficient_stack::{{closure}}` appeared. I'm confused about that one, but it seems to consist of inlined calls to `rustc_query_system::*` stuff.
Further note the other two big culprits in this list: `rustc_query_system` and `hashbrown`. These two are monomorphized many times, the query system summing to more than 20% of all lines, not even counting code that's probably inlined elsewhere.
Assuming compile times scale linearly with llvm-lines, that means a possible 20% compile time reduction.
Reducing e.g. `get_query_impl` would probably need a major refactoring of the query system though. Everything in there is generic over multiple types, has associated types and passes generic Self arguments by value, which means you can't simply make things `dyn`.
.This PR is a small step to make rustc compile faster and thus make contributing to rustc less painful. Nonetheless I love Rust and I find the work around rustc fascinating :)
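For readers unfamiliar with the trick mentioned in the description (converting the generic closure into a dynamic one), here is a minimal sketch of the general pattern. The names `run_on_big_stack` and `ensure_sufficient_stack_sketch` are hypothetical, and the actual stack check and stack switching are stubbed out, so this is an illustration under those assumptions rather than the code from this PR or from `stacker`.

```rust
/// Hypothetical stand-in for the expensive, non-generic machinery
/// (the role played by `stacker`/`psm` in the PR). Because it takes a
/// `&mut dyn FnMut()`, it is compiled once, no matter how many
/// different closure types callers pass in.
fn run_on_big_stack(callback: &mut dyn FnMut()) {
    // Assumption: a real implementation would check the remaining
    // stack space here and possibly switch to a freshly allocated stack.
    callback();
}

/// Thin generic shim. Only this small function is monomorphized per
/// closure type and return type; the heavy part above is shared.
pub fn ensure_sufficient_stack_sketch<R>(f: impl FnOnce() -> R) -> R {
    // An `FnOnce` cannot be called through `&mut dyn FnMut()` directly,
    // so it is moved into an `Option` and taken out on the first call.
    let mut f = Some(f);
    // The return value is smuggled out through a local slot, because
    // the dynamic callback itself returns `()`.
    let mut result = None;
    run_on_big_stack(&mut || {
        let f = f.take().expect("callback invoked more than once");
        result = Some(f());
    });
    result.expect("callback was never invoked")
}
```

Calling `ensure_sufficient_stack_sketch(|| 6 * 7)` and `ensure_sufficient_stack_sketch(|| String::from("hello"))` instantiates the small shim twice, while `run_on_big_stack` exists only once in the generated IR. The cost is one dynamic call per invocation.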