-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved c-tiny benchmark #5948
Conversation
The current numbers are:
An important wrinkle: LTO bloats binaries unless paired with stripping + gc-sections. |
f34fcbd
to
e98f0ce
Compare
e98f0ce
to
990c948
Compare
Praise: this is a good change to make. I'm glad there's someone else who has gone deep into this test. Question: I don't know if I fully understand the difference between "one step" LTO and "two step" LTO. What I think it means: "one step" runs LTO on the C code and Rust code separately, whereas "two step" runs LTO on both at the same time. Question: even in "one step", you have Rust emitting bitcode. Is this necessary? And I think this is one of the things that causes LLVM version mismatches to be problematic? |
Observation: you found that "panic-abort-linker-plugin-lto-clang-twostep.elf" and "panic-abort-linker-plugin-lto-clang-twostep-stripped.elf" are the same size, but not one-step. My expectation is that this is because in "two-step", LTO is already running on the whole binary, so there is nothing left to strip, whereas in "one-step", there is no dead code elimination until you turn on braindead gc-sections. |
In other words, my understanding has been that the thing you're calling "two-step LTO", aka "linker-plugin-lto", aka the thing I've always just called simply "LTO", applies -Os optimizations even at link time, so it can do things like inline function calls between Rust and C and as a result juice a bit extra code size and performance. In my mental model, the thing you're calling "one-step LTO" to me is "LTO separately in C and Rust, with standard non-LTO cross-language linking". I agree that this does not depend on LLVM version between C and Rust since there is no LLVM bitcode or metadata that needs to be shared. |
No, pure rust LTO still requires that flag (it refuses to even initiate compilation without it, try it!) Remember, Rust is still linking together a giant pile of rlibs! Each rlib does contain some machine code, so if Rust wishes to do proper LTO it needs to include LLVM bitcode in its rlibs alongside the machine code. Why that's not implied I don't know. Cargo's LTO setting implies the other flag. I suppose they wanted explicitness in the lower level CLI. 🤷 I considered using Cargo profiles to do the LTO toggle but didn't want to make too many changes. That would have the nice benefit of not needing
I don't think the "two step" and "linker-plugin-lto" are the same thing (though I suspect linker-plugin-lto needs two-step to work?), and I have some theories as to why the numbers are different (largely to do with ThinLTO or LLD) but I haven't yet investigated. Please let's not make our terminology "two step aka linker-plugin-lto", these test results clearly show them as having different results. two-step has an impact even without linker-plugin-lto. I mostly retained this distinction from the previous test but I haven't dug deeper into it. It's possible it's just a matter of the precise combinations of flags rather than the phasing (in which case we should benchmark those flags independently), or it's deficiencies in the ThinLTO model (in which case we should just try with regular LTO). From a quick glance, it could literally just be the distinction between using LLD and the default system linker. I didn't poke too deeply, my focus was on linker-plugin-lto whilst trying not to lose any innovations in the old benchmark. |
Confirmed: only the two linker-plugin-lto targets are the ones which need linked LLVM versions. Proven by switching makefile to clang and LLD 15 and rebuilding the elf/o files. |
Confirmed: two-step vs one-step is actually just "LLD vs no LLD" |
That's d21a723 I'm going to keep that commit around but rename the tests, now that we know that the phasing is irrelevant. I may still test "LLD vs not LLD", but really if we're testing LTO with clang we should always be using LLD so I don't think we need to. |
Some results of the discussion with @sffc:
|
Segmenter numbers, hot off the presses
I couldn't get thin vs fat LTO in either clang or rustc to produce interesting differences. It's worth investigating more, but this is enough to convince me to ignore thin vs fat LTO for the benchmark for now, and maybe look into it later. |
I added a basic README that explains what's going on with numbers |
Would still like to see performance numbers due to cross-language inlining, which I expect to be at least a few percentage points if measuring tight loops in segmenter or normalizer, but I'm reasonably convinced that binary size impact is negligible enough that we don't need to hold up a 2.0 MSRV upgrade on this. |
Suggestion: I think the ones I would include would be
There's a lot in that table, so I labeled the ones I feel are the "most important" with A and the rest with B. Tests I dropped: 3 -- not much reason to test panic-abort without Rust lto? If you're going through the trouble of panic-abort, you should be using lto in Rust. There are 4 Rust configurations required here, all building on the previous:
Thoughts? |
I included it mostly as a "here's the general style of impact you can get with this change" test, but yeah perhaps not useful. The A set definitely seems good. We should play around with gc-sections more at some point. B isn't a bad idea to include , too, especially if we can write the makefile more robustly using makefile functions and such. I don't plan to do that here, though: I suggest we land benchmarks 1-7 and keep the other issue open for getting benchmark set A/B in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't have the targets recommended in my table but this is a really good step forward
Progress on #5945
This does not fully enact the vision in #5945, but it does start testing various combinations of tactics. I didn't want to test all combinations and I didn't want to spend time figuring out which combinations we really want, so I went ahead and tested the ones most relevant to the current investigation (#5935).
This shows the effects of LTO, linker-plugin-lto, and stripping+gc-sections on panic-abort (with panic-immediate-abort std) release Rust builds.
Benchmarks with just panic=abort but no panic-immediate-abort/panic-abort std would probably be useful to clients but I haven't added them now. It may be worth using makefile magic to truly build a matrix of targets.