-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some DCE issues with ThinLTO leaving in construction of unused parameters #47409
Comments
Further testing shows that building with
Edit: Disregard above statement -B |
Thanks for the report @gamozolabs! This sounds like it does indeed get blamed to ThinLTO! The previous form of LTO threw everything into one giant module which allows LLVM to see as much as possible and optimize as aggressively as possible. Typically this isn't necessary for applications, but as you're seeing there's still some that require the old form of LTO! I've tagged this as a regression from stable to beta for now to ensure we can deal with this before the next release. There's a few things going on here of note.
So given that I think there's two primary questions to consider when evaluating this:
Curious what others from @rust-lang/compiler might think. |
We could make |
I don't feel like the distinction should matter in general, but rather this is more of a LLVM limitation. |
Aside: It may be possible to improve ThinLTO here. My theory is that panic_fmt is not internalized because there are multiple modules, and thus deadargelim doesn't fire. Other interprocedural optimizations have similar problem and in those cases adding data to the module summary and teaching those optimizations about that has helped. For deadargelim, you probably can't eliminate the arguments (because you can't update all the callers unilaterally), but you could store the fact that an argument is unused in the module summary and callers who are aware of this can pass undef instead. |
I think we should provide a way to get full LTO on stable/beta as it still provides better runtime performance overall. |
@Mark-Simulacrum your suggestion sounds great to me! We could do what we did with debuginfo and accept |
I like @Mark-Simulacrum's suggestion. I guess an open question is whether the default should be LTO or not -- but I think we can probably change the default to LTO if we think that's better. I'd be curious to hear from @gamozolabs if they agree. |
Ok I've opened #47521 to implement @Mark-Simulacrum's suggestion, and if we decide to merge I can follow up with a I'm sort of ambivalent about which form of LTO should be the default when specifying only |
I'm fine with defaulting to ThinLTO as long as full LTO is available on stable. |
triage: P-high This should get into beta, but there is a pending PR #47521 |
This commit reverts a small portion of the switch to ThinLTO by default which changed the meaning of `-C lto` from "put the whole crate graph into one codegen unit" to "perform ThinLTO over the whole crate graph". This backport has no corresponding commit on master as rust-lang#47521 is making the same change but in a slightly different manner. This commit is intended to be a surgical change with low impact on beta. Closes rust-lang#47409
[beta] Turn back on "fat" LTO by default This commit reverts a small portion of the switch to ThinLTO by default which changed the meaning of `-C lto` from "put the whole crate graph into one codegen unit" to "perform ThinLTO over the whole crate graph". This backport has no corresponding commit on master as #47521 is making the same change but in a slightly different manner. This commit is intended to be a surgical change with low impact on beta. Closes #47409
I think this is closed now with #47548 merged |
This commit primarily adds the ability to control what kind of LTO happens when rustc performs LTO, namely allowing values to be specified to the `-C lto` option, such as `-C lto=thin` and `-C lto=fat`. (where "fat" is the previous kind of LTO, throw everything in one giant module) Along the way this also refactors a number of fields which store information about whether LTO/ThinLTO are enabled to unify them all into one field through which everything is dispatched, hopefully removing a number of special cases throughout. This is intended to help mitigate rust-lang#47409 but will require a backport as well, and this would unfortunately need to be an otherwise insta-stable option.
Since ThinLTO became default in Rust my bootloader went from 17 KiB to about 42 KiB in size, which puts it over the 32 KiB limit. To get my bootloader to fit in this limit I've been able to strip out all of
core::fmt
by not having any parameters to mypanic_fmt
routine and this allows LTO to effectively delete all construction of thecore::fmt
parameters topanic_fmt
and thus save huge amounts of code.Since ThinLTO became default it seems that certain functions using volatile memory or inline assembly struggle to be optimized fully, however this is all I've found that replicates the issue, perhaps more common use of Rust could also trigger it.
For the tiny test case (included at the end of this issue) historically (from at least 2016 to November 2017) produced a .text section of ~100 bytes, and a total virtual program size of ~500 bytes. Since ThinLTO became default this tiny program grows to ~2400 bytes of .text and ~3200 bytes overall.
While this 2 KiB increase seems small, keep in mind it is for an application that is trivial and only has one thing using
core::fmt
(the bounds check panic on the array index). When a program grows and more panic come about these changes to be a much larger increase, in my case over 20 KiB.While I'm writing an RFC to address this issue more permanently by proposing an optional (developer opt-in) panic routine which does not rely on
core::fmt
(probably just a&'static str
for the filename andu32
for the line for bare bones debugging where nothing more can be afforded), I wanted to open this issue just in case it's a more fundamental optimization issue that could be fixed.The
core::intrinsics::abort()
is critical in thepanic_fmt
for the code size increase to occur. Other things that work here as well are statements likeasm!("nop")
or really any assembly statement. Further, assembly statements that are included in functions that are called frompanic_fmt
also cause the optimization problem to occur. This makes workarounds impossible in my case as my panic dumps to a serial port which ultimately requires inline assembly to access. So for my bootloader I have resorted to using an old version of Rust.What I find particularly strange is that while I've defined my
panic_fmt
to not take any parameters this doesn't seem to be a big enough hint to Rust/LLVM that they are not used and thus all construction of these parameters can be DCE'd.For a size listing of this test program using all nightly builds of Rust in the past 1.5 years see: https://gist.github.com/gamozolabs/d72c05e2b200f02397f1495cf33c2674
We can see from this that the size increase occured between
rustc 1.24.0-nightly (f9b0897c5 2017-12-02)
andrustc 1.24.0-nightly (1956d5535 2017-12-03)
which happens to be when ThinLTO went default.I've tried a few different tests using various
panic_fmt
declarations, with parameters, without, as extern, as pub, etc, and they seem to all be affected by this issue.Here's an example basic piece of code to be built with
nightly-x86_64-pc-windows-msvc
using args-C lto -C panic=abort -O
:Here is the disassembly difference between this same code but one with ThinLTO and one with full LTO: https://gist.github.com/gamozolabs/41b2059a55d5e0b60d7251dd8a31c055
The text was updated successfully, but these errors were encountered: