Improved c-tiny benchmark #5948

Manishearth · 2024-12-25T20:11:13Z

Progress on #5945

This does not fully enact the vision in #5945, but it does start testing various combinations of tactics. I didn't want to test all combinations and I didn't want to spend time figuring out which combinations we really want, so I went ahead and tested the ones most relevant to the current investigation (#5935).

This shows the effects of LTO, linker-plugin-lto, and stripping+gc-sections on panic-abort (with panic-immediate-abort std) release Rust builds.

Benchmarks with just panic=abort but no panic-immediate-abort/panic-abort std would probably be useful to clients but I haven't added them now. It may be worth using makefile magic to truly build a matrix of targets.

Manishearth · 2024-12-25T20:11:32Z

The current numbers are:

-rwxr-x--- 1 manishearth primarygroup   47416 Dec 25 12:03 panic-abort-clang.elf
-rwxr-x--- 1 manishearth primarygroup   47416 Dec 25 12:03 panic-abort-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 25 12:09 panic-abort-linker-plugin-lto-clang-twostep.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 25 12:08 panic-abort-linker-plugin-lto-clang-twostep-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  140400 Dec 25 12:03 panic-abort-lto-clang.elf
-rwxr-x--- 1 manishearth primarygroup   43320 Dec 25 12:03 panic-abort-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  133520 Dec 25 12:03 panic-abort-lto-clang-twostep.elf
-rwxr-x--- 1 manishearth primarygroup   34040 Dec 25 12:03 panic-abort-lto-clang-twostep-stripped.elf
-rwxr-x--- 1 manishearth primarygroup 5273536 Dec 25 11:53 release.elf
-rwxr-x--- 1 manishearth primarygroup  387664 Dec 25 11:46 release-gcc-stripped.elf

An important wrinkle: LTO bloats binaries unless paired with stripping + gc-sections.

sffc · 2024-12-26T05:11:14Z

Praise: this is a good change to make. I'm glad there's someone else who has gone deep into this test.

Question: I don't know if I fully understand the difference between "one step" LTO and "two step" LTO. What I think it means: "one step" runs LTO on the C code and Rust code separately, whereas "two step" runs LTO on both at the same time.

Question: even in "one step", you have Rust emitting bitcode. Is this necessary? And I think this is one of the things that causes LLVM version mismatches to be problematic?

sffc · 2024-12-26T05:15:19Z

Observation: you found that "panic-abort-linker-plugin-lto-clang-twostep.elf" and "panic-abort-linker-plugin-lto-clang-twostep-stripped.elf" are the same size, but not one-step. My expectation is that this is because in "two-step", LTO is already running on the whole binary, so there is nothing left to strip, whereas in "one-step", there is no dead code elimination until you turn on braindead gc-sections.

sffc · 2024-12-26T05:24:14Z

In other words, my understanding has been that the thing you're calling "two-step LTO", aka "linker-plugin-lto", aka the thing I've always just called simply "LTO", applies -Os optimizations even at link time, so it can do things like inline function calls between Rust and C and as a result juice a bit extra code size and performance.

In my mental model, the thing you're calling "one-step LTO" to me is "LTO separately in C and Rust, with standard non-LTO cross-language linking". I agree that this does not depend on LLVM version between C and Rust since there is no LLVM bitcode or metadata that needs to be shared.

Manishearth · 2024-12-26T09:20:40Z

Question: even in "one step", you have Rust emitting bitcode. Is this necessary? And I think this is one of the things that causes LLVM version mismatches to be problematic?

No, pure rust LTO still requires that flag (it refuses to even initiate compilation without it, try it!)

Remember, Rust is still linking together a giant pile of rlibs! Each rlib does contain some machine code, so if Rust wishes to do proper LTO it needs to include LLVM bitcode in its rlibs alongside the machine code.

Why that's not implied I don't know. Cargo's LTO setting implies the other flag. I suppose they wanted explicitness in the lower level CLI. 🤷

I considered using Cargo profiles to do the LTO toggle but didn't want to make too many changes. That would have the nice benefit of not needing --target-dir hacks since I can use --profile.

linker-plugin-lto is documented as the thing needing LLVM concordance. No other mention of LTO in Rust has that requirement, and it matches my previous testing. I can do some more testing with the new makefile to verify.

"two-step LTO", aka "linker-plugin-lto", aka the thing I've always just called simply "LTO"

I don't think the "two step" and "linker-plugin-lto" are the same thing (though I suspect linker-plugin-lto needs two-step to work?), and I have some theories as to why the numbers are different (largely to do with ThinLTO or LLD) but I haven't yet investigated.

Please let's not make our terminology "two step aka linker-plugin-lto", these test results clearly show them as having different results. two-step has an impact even without linker-plugin-lto.

I mostly retained this distinction from the previous test but I haven't dug deeper into it. It's possible it's just a matter of the precise combinations of flags rather than the phasing (in which case we should benchmark those flags independently), or it's deficiencies in the ThinLTO model (in which case we should just try with regular LTO).

From a quick glance, it could literally just be the distinction between using LLD and the default system linker. I didn't poke too deeply, my focus was on linker-plugin-lto whilst trying not to lose any innovations in the old benchmark.

Manishearth · 2024-12-26T18:02:06Z

Confirmed: only the two linker-plugin-lto targets are the ones which need linked LLVM versions. Proven by switching makefile to clang and LLD 15 and rebuilding the elf/o files.

Manishearth · 2024-12-26T18:03:35Z

Confirmed: two-step vs one-step is actually just "LLD vs no LLD"

Manishearth · 2024-12-26T18:08:25Z

-rwxr-x--- 1 manishearth primarygroup 1178184 Dec 26 10:02 panic-abort-clang.elf
-rwxr-x--- 1 manishearth primarygroup   47416 Dec 25 12:23 panic-abort-clang-gc-sections.elf
-rwxr-x--- 1 manishearth primarygroup   47416 Dec 26 10:02 panic-abort-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 26 10:06 panic-abort-linker-plugin-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 26 10:02 panic-abort-linker-plugin-lto-clang-twostep.elf
-rwxr-x--- 1 manishearth primarygroup   33296 Dec 26 10:02 panic-abort-linker-plugin-lto-clang-twostep-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  133520 Dec 26 10:02 panic-abort-lto-clang.elf
-rwxr-x--- 1 manishearth primarygroup   34040 Dec 26 10:02 panic-abort-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  133520 Dec 26 10:02 panic-abort-lto-clang-twostep.elf
-rwxr-x--- 1 manishearth primarygroup   34040 Dec 26 10:02 panic-abort-lto-clang-twostep-stripped.elf
-rwxr-x--- 1 manishearth primarygroup 5273536 Dec 26 10:02 release.elf
-rwxr-x--- 1 manishearth primarygroup  387664 Dec 26 10:02 release-gcc-stripped.elf

That's d21a723

I'm going to keep that commit around but rename the tests, now that we know that the phasing is irrelevant. I may still test "LLD vs not LLD", but really if we're testing LTO with clang we should always be using LLD so I don't think we need to.

Manishearth · 2024-12-26T20:21:47Z

Some results of the discussion with @sffc:

I still need to improve the benchmark: remove twophase/singlephase stuff, potentially benchmark "full" LTO
This benchmark should also be copied over to the segmenter folder. Datetime may be an interesting additional benchmark to add
c-tiny should get a README / tutorial file that explains to users how to build a size-optimized ICU4X, with a breakdown of the performance numbers and of the various options available, including their tradeoffs (needs nightly Rust, needs build-std, needs concordant LLVM versions, needs clang-not-GCC, kills debug symbols, etc)

Manishearth · 2024-12-27T22:13:48Z

Segmenter numbers, hot off the presses

-rwxr-x--- 1 manishearth primarygroup 5681200 Dec 27 14:03 panic-abort-clang.elf
-rwxr-x--- 1 manishearth primarygroup  383280 Dec 27 14:03 panic-abort-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup  372136 Dec 27 14:09 panic-abort-linker-plugin-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup 4318128 Dec 27 14:04 panic-abort-lto-clang.elf
-rwxr-x--- 1 manishearth primarygroup  372744 Dec 27 14:04 panic-abort-lto-clang-stripped.elf
-rwxr-x--- 1 manishearth primarygroup 9601072 Dec 27 14:02 release.elf
-rwxr-x--- 1 manishearth primarygroup  719432 Dec 27 14:02 release-gcc-stripped.elf

linker-plugin-lto brings us from 372744 to 372136, an extremely small win. My expectation is that the wins will be diminishing as the programs get larger.

I couldn't get thin vs fat LTO in either clang or rustc to produce interesting differences. It's worth investigating more, but this is enough to convince me to ignore thin vs fat LTO for the benchmark for now, and maybe look into it later.

Manishearth · 2024-12-27T22:56:41Z

I added a basic README that explains what's going on with numbers

sffc · 2024-12-28T05:46:23Z

Would still like to see performance numbers due to cross-language inlining, which I expect to be at least a few percentage points if measuring tight loops in segmenter or normalizer, but I'm reasonably convinced that binary size impact is negligible enough that we don't need to hold up a 2.0 MSRV upgrade on this.

sffc · 2024-12-28T06:21:41Z

Suggestion: I think the ones I would include would be

In PR?	A/B	C settings	Rust settings	Comment
1	A	gcc	release
2	A	gcc, gc-sections, strip-all	release
	A	gcc, no special flags	panic-abort-lto
	B	gcc, gc-sections	panic-abort-lto
	B	gcc, strip-all	panic-abort-lto
	A	gcc, gc-sections, strip-all	panic-abort-lto	Best with GCC?
	B	clang	release
	B	clang, lto, strip-all	release
	A	clang	lto
	A	clang, lto, strip-all	lto	Best with stable Rust + decoupled LLVM?
	B	clang	panic-abort-lto
4	B	clang, lto	panic-abort-lto
	B	clang, strip-all	panic-abort-lto
	B	clang, gc-sections	panic-abort-lto
~6	A	clang, lto, strip-all	panic-abort-lto	gc-sections not needed with LTO?
5	B	clang, gc-sections, strip-all	panic-abort-lto
7	A	clang, lto, strip-all	panic-abort-lto-plugin	Best possible result?

There's a lot in that table, so I labeled the ones I feel are the "most important" with A and the rest with B.

Tests I dropped:

3 -- not much reason to test panic-abort without Rust lto? If you're going through the trouble of panic-abort, you should be using lto in Rust.

There are 4 Rust configurations required here, all building on the previous:

release is the standard release mode
lto is release mode but with LTO, -Os, stripping, codegen-units, and all other stable Rust binary size features. Maybe also include panic = "abort" (non-std version) in this?
panic-abort-lto is all of the above plus the Nightly Rust binary size features including rebuilding the Rust standard library with panic_immediate_abort.
panic-abort-lto-plugin is all of the above plus -Clinker-plugin-lto.

Thoughts?

tutorials/c-tiny/fixeddecimal/Makefile

Manishearth · 2024-12-29T23:51:24Z

3 -- not much reason to test panic-abort without Rust lto? If you're going through the trouble of panic-abort, you should be using lto in Rust.

I included it mostly as a "here's the general style of impact you can get with this change" test, but yeah perhaps not useful.

The A set definitely seems good. We should play around with gc-sections more at some point. B isn't a bad idea to include , too, especially if we can write the makefile more robustly using makefile functions and such. I don't plan to do that here, though: I suggest we land benchmarks 1-7 and keep the other issue open for getting benchmark set A/B in.

sffc

Doesn't have the targets recommended in my table but this is a really good step forward

Manishearth force-pushed the c-tiny-benchmark branch from f34fcbd to e98f0ce Compare December 25, 2024 20:26

Manishearth mentioned this pull request Dec 25, 2024

Decide upon and document policy for what LLVM versions are acceptable for CI LLVM tooling #5935

Open

Remove FORCE

990c948

Manishearth force-pushed the c-tiny-benchmark branch from e98f0ce to 990c948 Compare December 25, 2024 20:39

use lld everywhere, more twostep vs onestep

d21a723

Remove twostep, copy to segmenter

b3d68f8

Manishearth marked this pull request as ready for review December 27, 2024 22:13

Manishearth requested a review from a team as a code owner December 27, 2024 22:13

Manishearth added 2 commits December 27, 2024 14:27

ordering

e9391fe

readme

5449458

sffc reviewed Dec 28, 2024

View reviewed changes

tutorials/c-tiny/fixeddecimal/Makefile Outdated Show resolved Hide resolved

remove -g

b114401

Manishearth requested a review from sffc December 29, 2024 23:48

sffc approved these changes Dec 31, 2024

View reviewed changes

Manishearth merged commit 9486659 into unicode-org:main Dec 31, 2024
28 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved c-tiny benchmark #5948

Improved c-tiny benchmark #5948

Manishearth commented Dec 25, 2024 •

edited

Loading

Manishearth commented Dec 25, 2024 •

edited

Loading

sffc commented Dec 26, 2024

sffc commented Dec 26, 2024

sffc commented Dec 26, 2024

Manishearth commented Dec 26, 2024 •

edited

Loading

Manishearth commented Dec 26, 2024

Manishearth commented Dec 26, 2024

Manishearth commented Dec 26, 2024

Manishearth commented Dec 26, 2024

Manishearth commented Dec 27, 2024

Manishearth commented Dec 27, 2024

sffc commented Dec 28, 2024

sffc commented Dec 28, 2024

Manishearth commented Dec 29, 2024

sffc left a comment

Improved c-tiny benchmark #5948

Improved c-tiny benchmark #5948

Conversation

Manishearth commented Dec 25, 2024 • edited Loading

Manishearth commented Dec 25, 2024 • edited Loading

sffc commented Dec 26, 2024

sffc commented Dec 26, 2024

sffc commented Dec 26, 2024

Manishearth commented Dec 26, 2024 • edited Loading

Manishearth commented Dec 26, 2024

Manishearth commented Dec 26, 2024

Manishearth commented Dec 26, 2024

Manishearth commented Dec 26, 2024

Manishearth commented Dec 27, 2024

Manishearth commented Dec 27, 2024

sffc commented Dec 28, 2024

sffc commented Dec 28, 2024

Manishearth commented Dec 29, 2024

sffc left a comment

Choose a reason for hiding this comment

Manishearth commented Dec 25, 2024 •

edited

Loading

Manishearth commented Dec 25, 2024 •

edited

Loading

Manishearth commented Dec 26, 2024 •

edited

Loading