Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved c-tiny benchmark #5948

Merged
merged 6 commits into from
Dec 31, 2024
Merged

Conversation

Manishearth
Copy link
Member

@Manishearth Manishearth commented Dec 25, 2024

Progress on #5945

This does not fully enact the vision in #5945, but it does start testing various combinations of tactics. I didn't want to test all combinations and I didn't want to spend time figuring out which combinations we really want, so I went ahead and tested the ones most relevant to the current investigation (#5935).

This shows the effects of LTO, linker-plugin-lto, and stripping+gc-sections on panic-abort (with panic-immediate-abort std) release Rust builds.

Benchmarks with just panic=abort but no panic-immediate-abort/panic-abort std would probably be useful to clients but I haven't added them now. It may be worth using makefile magic to truly build a matrix of targets.

@Manishearth
Copy link
Member Author

Manishearth commented Dec 25, 2024

The current numbers are:

-rwxr-x--- 1 manishearth primarygroup   47416 Dec 25 12:03 panic-abort-clang.elf
-rwxr-x--- 1 manishearth primarygroup   47416 Dec 25 12:03 panic-abort-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 25 12:09 panic-abort-linker-plugin-lto-clang-twostep.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 25 12:08 panic-abort-linker-plugin-lto-clang-twostep-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  140400 Dec 25 12:03 panic-abort-lto-clang.elf
-rwxr-x--- 1 manishearth primarygroup   43320 Dec 25 12:03 panic-abort-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  133520 Dec 25 12:03 panic-abort-lto-clang-twostep.elf
-rwxr-x--- 1 manishearth primarygroup   34040 Dec 25 12:03 panic-abort-lto-clang-twostep-stripped.elf
-rwxr-x--- 1 manishearth primarygroup 5273536 Dec 25 11:53 release.elf
-rwxr-x--- 1 manishearth primarygroup  387664 Dec 25 11:46 release-gcc-stripped.elf

An important wrinkle: LTO bloats binaries unless paired with stripping + gc-sections.

@sffc
Copy link
Member

sffc commented Dec 26, 2024

Praise: this is a good change to make. I'm glad there's someone else who has gone deep into this test.

Question: I don't know if I fully understand the difference between "one step" LTO and "two step" LTO. What I think it means: "one step" runs LTO on the C code and Rust code separately, whereas "two step" runs LTO on both at the same time.

Question: even in "one step", you have Rust emitting bitcode. Is this necessary? And I think this is one of the things that causes LLVM version mismatches to be problematic?

@sffc
Copy link
Member

sffc commented Dec 26, 2024

Observation: you found that "panic-abort-linker-plugin-lto-clang-twostep.elf" and "panic-abort-linker-plugin-lto-clang-twostep-stripped.elf" are the same size, but not one-step. My expectation is that this is because in "two-step", LTO is already running on the whole binary, so there is nothing left to strip, whereas in "one-step", there is no dead code elimination until you turn on braindead gc-sections.

@sffc
Copy link
Member

sffc commented Dec 26, 2024

In other words, my understanding has been that the thing you're calling "two-step LTO", aka "linker-plugin-lto", aka the thing I've always just called simply "LTO", applies -Os optimizations even at link time, so it can do things like inline function calls between Rust and C and as a result juice a bit extra code size and performance.

In my mental model, the thing you're calling "one-step LTO" to me is "LTO separately in C and Rust, with standard non-LTO cross-language linking". I agree that this does not depend on LLVM version between C and Rust since there is no LLVM bitcode or metadata that needs to be shared.

@Manishearth
Copy link
Member Author

Manishearth commented Dec 26, 2024

Question: even in "one step", you have Rust emitting bitcode. Is this necessary? And I think this is one of the things that causes LLVM version mismatches to be problematic?

No, pure rust LTO still requires that flag (it refuses to even initiate compilation without it, try it!)

Remember, Rust is still linking together a giant pile of rlibs! Each rlib does contain some machine code, so if Rust wishes to do proper LTO it needs to include LLVM bitcode in its rlibs alongside the machine code.

Why that's not implied I don't know. Cargo's LTO setting implies the other flag. I suppose they wanted explicitness in the lower level CLI. 🤷

I considered using Cargo profiles to do the LTO toggle but didn't want to make too many changes. That would have the nice benefit of not needing --target-dir hacks since I can use --profile.


linker-plugin-lto is documented as the thing needing LLVM concordance. No other mention of LTO in Rust has that requirement, and it matches my previous testing. I can do some more testing with the new makefile to verify.

"two-step LTO", aka "linker-plugin-lto", aka the thing I've always just called simply "LTO"

I don't think the "two step" and "linker-plugin-lto" are the same thing (though I suspect linker-plugin-lto needs two-step to work?), and I have some theories as to why the numbers are different (largely to do with ThinLTO or LLD) but I haven't yet investigated.

Please let's not make our terminology "two step aka linker-plugin-lto", these test results clearly show them as having different results. two-step has an impact even without linker-plugin-lto.

I mostly retained this distinction from the previous test but I haven't dug deeper into it. It's possible it's just a matter of the precise combinations of flags rather than the phasing (in which case we should benchmark those flags independently), or it's deficiencies in the ThinLTO model (in which case we should just try with regular LTO).

From a quick glance, it could literally just be the distinction between using LLD and the default system linker. I didn't poke too deeply, my focus was on linker-plugin-lto whilst trying not to lose any innovations in the old benchmark.

@Manishearth
Copy link
Member Author

Confirmed: only the two linker-plugin-lto targets are the ones which need linked LLVM versions. Proven by switching makefile to clang and LLD 15 and rebuilding the elf/o files.

@Manishearth
Copy link
Member Author

Confirmed: two-step vs one-step is actually just "LLD vs no LLD"

@Manishearth
Copy link
Member Author

-rwxr-x--- 1 manishearth primarygroup 1178184 Dec 26 10:02 panic-abort-clang.elf
-rwxr-x--- 1 manishearth primarygroup   47416 Dec 25 12:23 panic-abort-clang-gc-sections.elf
-rwxr-x--- 1 manishearth primarygroup   47416 Dec 26 10:02 panic-abort-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 26 10:06 panic-abort-linker-plugin-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 26 10:02 panic-abort-linker-plugin-lto-clang-twostep.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 26 10:02 panic-abort-linker-plugin-lto-clang-twostep-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  133520 Dec 26 10:02 panic-abort-lto-clang.elf
-rwxr-x--- 1 manishearth primarygroup   34040 Dec 26 10:02 panic-abort-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  133520 Dec 26 10:02 panic-abort-lto-clang-twostep.elf
-rwxr-x--- 1 manishearth primarygroup   34040 Dec 26 10:02 panic-abort-lto-clang-twostep-stripped.elf
-rwxr-x--- 1 manishearth primarygroup 5273536 Dec 26 10:02 release.elf
-rwxr-x--- 1 manishearth primarygroup  387664 Dec 26 10:02 release-gcc-stripped.elf

That's d21a723

I'm going to keep that commit around but rename the tests, now that we know that the phasing is irrelevant. I may still test "LLD vs not LLD", but really if we're testing LTO with clang we should always be using LLD so I don't think we need to.

@Manishearth
Copy link
Member Author

Some results of the discussion with @sffc:

  • I still need to improve the benchmark: remove twophase/singlephase stuff, potentially benchmark "full" LTO
  • This benchmark should also be copied over to the segmenter folder. Datetime may be an interesting additional benchmark to add
  • c-tiny should get a README / tutorial file that explains to users how to build a size-optimized ICU4X, with a breakdown of the performance numbers and of the various options available, including their tradeoffs (needs nightly Rust, needs build-std, needs concordant LLVM versions, needs clang-not-GCC, kills debug symbols, etc)

@Manishearth
Copy link
Member Author

Segmenter numbers, hot off the presses

-rwxr-x--- 1 manishearth primarygroup 5681200 Dec 27 14:03 panic-abort-clang.elf
-rwxr-x--- 1 manishearth primarygroup  383280 Dec 27 14:03 panic-abort-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  372136 Dec 27 14:09 panic-abort-linker-plugin-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup 4318128 Dec 27 14:04 panic-abort-lto-clang.elf
-rwxr-x--- 1 manishearth primarygroup  372744 Dec 27 14:04 panic-abort-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup 9601072 Dec 27 14:02 release.elf
-rwxr-x--- 1 manishearth primarygroup  719432 Dec 27 14:02 release-gcc-stripped.elf

linker-plugin-lto brings us from 372744 to 372136, an extremely small win. My expectation is that the wins will be diminishing as the programs get larger.

I couldn't get thin vs fat LTO in either clang or rustc to produce interesting differences. It's worth investigating more, but this is enough to convince me to ignore thin vs fat LTO for the benchmark for now, and maybe look into it later.

@Manishearth Manishearth marked this pull request as ready for review December 27, 2024 22:13
@Manishearth Manishearth requested a review from a team as a code owner December 27, 2024 22:13
@Manishearth
Copy link
Member Author

I added a basic README that explains what's going on with numbers

@sffc
Copy link
Member

sffc commented Dec 28, 2024

Would still like to see performance numbers due to cross-language inlining, which I expect to be at least a few percentage points if measuring tight loops in segmenter or normalizer, but I'm reasonably convinced that binary size impact is negligible enough that we don't need to hold up a 2.0 MSRV upgrade on this.

@sffc
Copy link
Member

sffc commented Dec 28, 2024

Suggestion: I think the ones I would include would be

In PR? A/B C settings Rust settings Comment
1 A gcc release
2 A gcc, gc-sections, strip-all release
A gcc, no special flags panic-abort-lto
B gcc, gc-sections panic-abort-lto
B gcc, strip-all panic-abort-lto
A gcc, gc-sections, strip-all panic-abort-lto Best with GCC?
B clang release
B clang, lto, strip-all release
A clang lto
A clang, lto, strip-all lto Best with stable Rust + decoupled LLVM?
B clang panic-abort-lto
4 B clang, lto panic-abort-lto
B clang, strip-all panic-abort-lto
B clang, gc-sections panic-abort-lto
~6 A clang, lto, strip-all panic-abort-lto gc-sections not needed with LTO?
5 B clang, gc-sections, strip-all panic-abort-lto
7 A clang, lto, strip-all panic-abort-lto-plugin Best possible result?

There's a lot in that table, so I labeled the ones I feel are the "most important" with A and the rest with B.

Tests I dropped:

3 -- not much reason to test panic-abort without Rust lto? If you're going through the trouble of panic-abort, you should be using lto in Rust.

There are 4 Rust configurations required here, all building on the previous:

  1. release is the standard release mode
  2. lto is release mode but with LTO, -Os, stripping, codegen-units, and all other stable Rust binary size features. Maybe also include panic = "abort" (non-std version) in this?
  3. panic-abort-lto is all of the above plus the Nightly Rust binary size features including rebuilding the Rust standard library with panic_immediate_abort.
  4. panic-abort-lto-plugin is all of the above plus -Clinker-plugin-lto.

Thoughts?

@Manishearth Manishearth requested a review from sffc December 29, 2024 23:48
@Manishearth
Copy link
Member Author

3 -- not much reason to test panic-abort without Rust lto? If you're going through the trouble of panic-abort, you should be using lto in Rust.

I included it mostly as a "here's the general style of impact you can get with this change" test, but yeah perhaps not useful.

The A set definitely seems good. We should play around with gc-sections more at some point. B isn't a bad idea to include , too, especially if we can write the makefile more robustly using makefile functions and such. I don't plan to do that here, though: I suggest we land benchmarks 1-7 and keep the other issue open for getting benchmark set A/B in.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't have the targets recommended in my table but this is a really good step forward

@Manishearth Manishearth merged commit 9486659 into unicode-org:main Dec 31, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants