Inline overlap based CGU merging #113777

nnethercote · 2023-07-17T05:44:13Z

Introduce a new CGU merging algorithm that aims to minimize the number of duplicated inlined items.

nnethercote · 2023-07-17T05:44:27Z

bors · 2023-07-17T05:44:35Z

⌛ Trying commit 78be95ec10df27dbd635a3a5a420b29b8272b4b5 with merge 53bc7211881d8549388de313642b05ad9f9f4f56...

bors · 2023-07-17T07:00:22Z

☀️ Try build successful - checks-actions
Build commit: 53bc7211881d8549388de313642b05ad9f9f4f56 (53bc7211881d8549388de313642b05ad9f9f4f56)

rust-timer · 2023-07-17T19:00:15Z

Finished benchmarking commit (53bc7211881d8549388de313642b05ad9f9f4f56): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.4%	[1.4%, 1.4%]	1
Regressions ❌ (secondary)	0.7%	[0.7%, 0.7%]	4
Improvements ✅ (primary)	-2.2%	[-4.4%, -0.6%]	8
Improvements ✅ (secondary)	-1.0%	[-1.7%, -0.7%]	5
All ❌✅ (primary)	-1.8%	[-4.4%, 1.4%]	9

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.7%	[3.7%, 3.7%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-5.0%	[-10.6%, -1.1%]	10
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-4.2%	[-10.6%, 3.7%]	11

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.9%	[1.3%, 3.0%]	5
Regressions ❌ (secondary)	3.6%	[2.1%, 6.0%]	9
Improvements ✅ (primary)	-2.7%	[-3.7%, -1.1%]	6
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.6%	[-3.7%, 3.0%]	11

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 1.2%]	22
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.0%	[-3.5%, -0.2%]	21
Improvements ✅ (secondary)	-3.6%	[-3.6%, -3.6%]	1
All ❌✅ (primary)	-0.4%	[-3.5%, 1.2%]	43

Bootstrap: 657.644s -> 648.768s (-1.35%)

nnethercote · 2023-07-17T22:25:37Z

Some good results:

Instructions counts are improved, though that doesn't really matter much for this case.
Cycle counts are slightly worse, but that also doesn't matter much.
Walltimes are roughly neutral, with wins and losses kinda balancing (esp. if you select "Show non-relevant results" to see all the small changes).
max-rss has a few big wins for large real-world crates, yay.
Binary size has some medium wins for large real-world crates, yay.
Bootstrap is 8.9s better, yay!

nnethercote · 2023-07-18T01:33:47Z

@wesleywiser: this is ready for review!

Meanwhile, let's do another perf run, just to see if anything changes.

@bors try @rust-timer queue

bors · 2023-07-18T01:33:55Z

⌛ Trying commit ea66bf9e6e3f29911168b53ff37e2a36c344ed06 with merge 8fe5a9539a402e69d0fed84df87f3645f271468a...

bors · 2023-07-18T02:51:17Z

☀️ Try build successful - checks-actions
Build commit: 8fe5a9539a402e69d0fed84df87f3645f271468a (8fe5a9539a402e69d0fed84df87f3645f271468a)

rust-timer · 2023-07-18T09:02:17Z

Finished benchmarking commit (8fe5a9539a402e69d0fed84df87f3645f271468a): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.6%	[1.6%, 1.6%]	1
Regressions ❌ (secondary)	0.7%	[0.6%, 0.8%]	3
Improvements ✅ (primary)	-1.7%	[-4.3%, -0.6%]	12
Improvements ✅ (secondary)	-1.1%	[-1.1%, -1.1%]	1
All ❌✅ (primary)	-1.4%	[-4.3%, 1.6%]	13

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	4.4%	[4.4%, 4.4%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-4.4%	[-8.4%, -2.2%]	10
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-3.6%	[-8.4%, 4.4%]	11

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.1%	[1.5%, 4.2%]	7
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.5%	[-3.7%, -1.4%]	6
Improvements ✅ (secondary)	-2.3%	[-2.3%, -2.3%]	1
All ❌✅ (primary)	0.5%	[-3.7%, 4.2%]	13

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 1.2%]	19
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.9%	[-4.2%, -0.2%]	24
Improvements ✅ (secondary)	-3.6%	[-3.6%, -3.6%]	1
All ❌✅ (primary)	-0.4%	[-4.2%, 1.2%]	43

Bootstrap: 658.015s -> 646.273s (-1.78%)

nnethercote · 2023-07-18T09:19:55Z

The second perf run gives results similar to the first.

pnkfelix · 2023-07-18T15:15:31Z

@bors r+

bors · 2023-07-18T15:15:33Z

📌 Commit ea66bf9e6e3f29911168b53ff37e2a36c344ed06 has been approved by pnkfelix

It is now in the queue for this repository.

compiler/rustc_middle/src/mir/mono.rs

Instead of repeatedly merging the two smallest CGUs, we now use a merging algorithm that aims to minimize the duplication of inlined functions. `exa-0.10.1` was one benchmark that saw particularly good results. The old CGU stats: ``` INTERNALIZE - unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined) - placed items: 3834 (1216 root + 2618 inlined), placed size: 154552 (77219 root + 77333 inlined) - placed/unique items ratio: 1.38, placed/unique size ratio: 1.27 - CGUs: 16, mean size: 9659.5, sizes: [11791, 11634, 11173, 10987, 10939, 10507, 9992, 9813, 9593, 9580, 9030, 8447, 7975, 7961, 7876, 7254] ``` The new CGU stats: ``` INTERNALIZE - unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined) - placed items: 3626 (1216 root + 2410 inlined), placed size: 147201 (77219 root + 69982 inlined) - placed/unique items ratio: 1.31, placed/unique size ratio: 1.21 - CGUs: 16, mean size: 9200.1, sizes: [11634, 10939, 10227, 9555, 9178, 9167, 8879, 8804, 8604, 8603 (x3), 8602 (x2), 8601, 8600] ``` The difference is in the number of inlined items. There are 1558 unique inlined items. With the old algorithm these were placed 2618 times, resulting in 1060 duplicates. With the new algorithm these were placed 2410 times, resulting in 852 duplicates. Also, the mean CGU size dropped from 9659.5 to 9200.1, and the CGU size distribution tightened, with the biggest one a little smaller and the smallest ones a little bigger.

nnethercote · 2023-07-18T21:24:07Z

I fixed the typo.

@bors r=pnkfelix

bors · 2023-07-18T21:24:09Z

📌 Commit 05de5d6 has been approved by pnkfelix

It is now in the queue for this repository.

bors · 2023-07-18T22:36:20Z

⌛ Testing commit 05de5d6 with merge 0d6a9b2...

bors · 2023-07-19T00:24:46Z

☀️ Test successful - checks-actions
Approved by: pnkfelix
Pushing 0d6a9b2 to master...

bors · 2023-07-19T00:24:46Z

☀️ Test successful - checks-actions
Approved by: pnkfelix
Pushing 0d6a9b2 to master...

rust-timer · 2023-07-19T02:56:44Z

Finished benchmarking commit (0d6a9b2): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.2%	[1.2%, 1.2%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.8%	[-4.5%, -0.3%]	11
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.6%	[-4.5%, 1.2%]	12

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.5%	[2.9%, 4.0%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-4.5%	[-8.2%, -1.9%]	11
Improvements ✅ (secondary)	-3.0%	[-3.4%, -2.6%]	3
All ❌✅ (primary)	-3.2%	[-8.2%, 4.0%]	13

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.9%	[1.1%, 2.7%]	2
Regressions ❌ (secondary)	2.6%	[2.2%, 3.1%]	3
Improvements ✅ (primary)	-2.1%	[-3.8%, -1.1%]	9
Improvements ✅ (secondary)	-2.0%	[-2.0%, -2.0%]	1
All ❌✅ (primary)	-1.4%	[-3.8%, 2.7%]	11

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 1.2%]	19
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.0%	[-4.2%, -0.2%]	25
Improvements ✅ (secondary)	-3.6%	[-3.6%, -3.6%]	1
All ❌✅ (primary)	-0.5%	[-4.2%, 1.2%]	44

Bootstrap: 656.931s -> 647.107s (-1.50%)

In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect. In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.)

…nkfelix Tweak CGU sorting in a couple of places. In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect. In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.) r? `@pnkfelix`

…nkfelix Tweak CGU sorting in a couple of places. In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect. In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.) r? ``@pnkfelix``

…nkfelix Tweak CGU sorting in a couple of places. In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect. In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.) r? ```@pnkfelix```

…nkfelix Tweak CGU sorting in a couple of places. In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect. In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.) r? `@pnkfelix`

tmandry · 2023-08-01T23:08:29Z

I imagine that repeatedly merging cgus[N-1] with its closest neighbor in cgus[N:] could make the results very sensitive to the particular choice of N (codegen units). That would be very relevant for binary size, which is already subject to the many whims of the LLVM inliner. (I think most people who really care about binary size are setting N=1 today, often in development builds even, but this seems like a promising direction that could relax the need for that in the future.)

Out of curiosity @nnethercote, did you see anything while working on this that would suggest the results are, or are not, very stable with respect to small changes in the number of codegen units?

nnethercote · 2023-08-01T23:15:24Z

I have no data one way or the other. I never tried changing the default N away from 16.

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 17, 2023

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 17, 2023

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 17, 2023

nnethercote force-pushed the overlap-based-cgu-merging branch from 78be95e to ea66bf9 Compare July 18, 2023 01:32

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 18, 2023

rustbot assigned wesleywiser Jul 18, 2023

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 18, 2023

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 18, 2023

wesleywiser approved these changes Jul 18, 2023

View reviewed changes

compiler/rustc_middle/src/mir/mono.rs Outdated Show resolved Hide resolved

nnethercote added 2 commits July 19, 2023 07:23

Add a useful comment.

017c0b5

nnethercote force-pushed the overlap-based-cgu-merging branch from ea66bf9 to 05de5d6 Compare July 18, 2023 21:23

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 18, 2023

bors mentioned this pull request Jul 18, 2023

[beta-1.72] Update cargo #113836

Merged

bors added merged-by-bors This PR was explicitly merged by bors. labels Jul 19, 2023

bors merged commit 0d6a9b2 into rust-lang:master Jul 19, 2023

rustbot added this to the 1.73.0 milestone Jul 19, 2023

This was referenced Jul 19, 2023

Implement RFC 3323: restrictions #106074

Open

Recommend the home crate in env::home_dir()'s deprecation #110665

Closed

nnethercote deleted the overlap-based-cgu-merging branch July 19, 2023 02:51

rustbot removed the perf-regression Performance regression. label Jul 19, 2023

nnethercote mentioned this pull request Jul 20, 2023

Tweak CGU sorting in a couple of places. #113872

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inline overlap based CGU merging #113777

Inline overlap based CGU merging #113777

nnethercote commented Jul 17, 2023 •

edited

Loading

nnethercote commented Jul 17, 2023

This comment has been minimized.

bors commented Jul 17, 2023

bors commented Jul 17, 2023

This comment has been minimized.

rust-timer commented Jul 17, 2023

nnethercote commented Jul 17, 2023

nnethercote commented Jul 18, 2023

This comment has been minimized.

bors commented Jul 18, 2023

bors commented Jul 18, 2023

This comment has been minimized.

rust-timer commented Jul 18, 2023

nnethercote commented Jul 18, 2023

pnkfelix commented Jul 18, 2023

bors commented Jul 18, 2023

nnethercote commented Jul 18, 2023

bors commented Jul 18, 2023

bors commented Jul 18, 2023

bors commented Jul 19, 2023

bors commented Jul 19, 2023

rust-timer commented Jul 19, 2023

tmandry commented Aug 1, 2023

nnethercote commented Aug 1, 2023

Inline overlap based CGU merging #113777

Inline overlap based CGU merging #113777

Conversation

nnethercote commented Jul 17, 2023 • edited Loading

nnethercote commented Jul 17, 2023

This comment has been minimized.

bors commented Jul 17, 2023

bors commented Jul 17, 2023

This comment has been minimized.

rust-timer commented Jul 17, 2023

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

nnethercote commented Jul 17, 2023

nnethercote commented Jul 18, 2023

This comment has been minimized.

bors commented Jul 18, 2023

bors commented Jul 18, 2023

This comment has been minimized.

rust-timer commented Jul 18, 2023

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

nnethercote commented Jul 18, 2023

pnkfelix commented Jul 18, 2023

bors commented Jul 18, 2023

nnethercote commented Jul 18, 2023

bors commented Jul 18, 2023

bors commented Jul 18, 2023

bors commented Jul 19, 2023

bors commented Jul 19, 2023

rust-timer commented Jul 19, 2023

Overall result: ✅ improvements - no action needed

tmandry commented Aug 1, 2023

nnethercote commented Aug 1, 2023

nnethercote commented Jul 17, 2023 •

edited

Loading