Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline overlap based CGU merging #113777

Merged
merged 4 commits into from
Jul 19, 2023

Conversation

nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Jul 17, 2023

Introduce a new CGU merging algorithm that aims to minimize the number of duplicated inlined items.

r? @wesleywiser

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 17, 2023
@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 17, 2023
@bors
Copy link
Contributor

bors commented Jul 17, 2023

⌛ Trying commit 78be95ec10df27dbd635a3a5a420b29b8272b4b5 with merge 53bc7211881d8549388de313642b05ad9f9f4f56...

@bors
Copy link
Contributor

bors commented Jul 17, 2023

☀️ Try build successful - checks-actions
Build commit: 53bc7211881d8549388de313642b05ad9f9f4f56 (53bc7211881d8549388de313642b05ad9f9f4f56)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (53bc7211881d8549388de313642b05ad9f9f4f56): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.4% [1.4%, 1.4%] 1
Regressions ❌
(secondary)
0.7% [0.7%, 0.7%] 4
Improvements ✅
(primary)
-2.2% [-4.4%, -0.6%] 8
Improvements ✅
(secondary)
-1.0% [-1.7%, -0.7%] 5
All ❌✅ (primary) -1.8% [-4.4%, 1.4%] 9

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.7% [3.7%, 3.7%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-5.0% [-10.6%, -1.1%] 10
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -4.2% [-10.6%, 3.7%] 11

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.9% [1.3%, 3.0%] 5
Regressions ❌
(secondary)
3.6% [2.1%, 6.0%] 9
Improvements ✅
(primary)
-2.7% [-3.7%, -1.1%] 6
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.6% [-3.7%, 3.0%] 11

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.0%, 1.2%] 22
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.0% [-3.5%, -0.2%] 21
Improvements ✅
(secondary)
-3.6% [-3.6%, -3.6%] 1
All ❌✅ (primary) -0.4% [-3.5%, 1.2%] 43

Bootstrap: 657.644s -> 648.768s (-1.35%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 17, 2023
@nnethercote
Copy link
Contributor Author

Some good results:

  • Instructions counts are improved, though that doesn't really matter much for this case.
  • Cycle counts are slightly worse, but that also doesn't matter much.
  • Walltimes are roughly neutral, with wins and losses kinda balancing (esp. if you select "Show non-relevant results" to see all the small changes).
  • max-rss has a few big wins for large real-world crates, yay.
  • Binary size has some medium wins for large real-world crates, yay.
  • Bootstrap is 8.9s better, yay!

@nnethercote nnethercote force-pushed the overlap-based-cgu-merging branch from 78be95e to ea66bf9 Compare July 18, 2023 01:32
@nnethercote
Copy link
Contributor Author

@wesleywiser: this is ready for review!

Meanwhile, let's do another perf run, just to see if anything changes.

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 18, 2023
@bors
Copy link
Contributor

bors commented Jul 18, 2023

⌛ Trying commit ea66bf9e6e3f29911168b53ff37e2a36c344ed06 with merge 8fe5a9539a402e69d0fed84df87f3645f271468a...

@bors
Copy link
Contributor

bors commented Jul 18, 2023

☀️ Try build successful - checks-actions
Build commit: 8fe5a9539a402e69d0fed84df87f3645f271468a (8fe5a9539a402e69d0fed84df87f3645f271468a)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8fe5a9539a402e69d0fed84df87f3645f271468a): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.6% [1.6%, 1.6%] 1
Regressions ❌
(secondary)
0.7% [0.6%, 0.8%] 3
Improvements ✅
(primary)
-1.7% [-4.3%, -0.6%] 12
Improvements ✅
(secondary)
-1.1% [-1.1%, -1.1%] 1
All ❌✅ (primary) -1.4% [-4.3%, 1.6%] 13

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
4.4% [4.4%, 4.4%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-4.4% [-8.4%, -2.2%] 10
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -3.6% [-8.4%, 4.4%] 11

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.1% [1.5%, 4.2%] 7
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.5% [-3.7%, -1.4%] 6
Improvements ✅
(secondary)
-2.3% [-2.3%, -2.3%] 1
All ❌✅ (primary) 0.5% [-3.7%, 4.2%] 13

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.0%, 1.2%] 19
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.9% [-4.2%, -0.2%] 24
Improvements ✅
(secondary)
-3.6% [-3.6%, -3.6%] 1
All ❌✅ (primary) -0.4% [-4.2%, 1.2%] 43

Bootstrap: 658.015s -> 646.273s (-1.78%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 18, 2023
@nnethercote
Copy link
Contributor Author

The second perf run gives results similar to the first.

@pnkfelix
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jul 18, 2023

📌 Commit ea66bf9e6e3f29911168b53ff37e2a36c344ed06 has been approved by pnkfelix

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 18, 2023
compiler/rustc_middle/src/mir/mono.rs Outdated Show resolved Hide resolved
Instead of repeatedly merging the two smallest CGUs, we now use a
merging algorithm that aims to minimize the duplication of inlined
functions.

`exa-0.10.1` was one benchmark that saw particularly good results. The
old CGU stats:
```
INTERNALIZE
- unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined)
- placed items: 3834 (1216 root + 2618 inlined), placed size: 154552 (77219 root + 77333 inlined)
- placed/unique items ratio: 1.38, placed/unique size ratio: 1.27
- CGUs: 16, mean size: 9659.5, sizes: [11791, 11634, 11173, 10987, 10939, 10507, 9992, 9813, 9593, 9580, 9030, 8447, 7975, 7961, 7876, 7254]
```
The new CGU stats:
```
INTERNALIZE
- unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined)
- placed items: 3626 (1216 root + 2410 inlined), placed size: 147201 (77219 root + 69982 inlined)
- placed/unique items ratio: 1.31, placed/unique size ratio: 1.21
- CGUs: 16, mean size: 9200.1, sizes: [11634, 10939, 10227, 9555, 9178, 9167, 8879, 8804, 8604, 8603 (x3), 8602 (x2), 8601, 8600]
```
The difference is in the number of inlined items. There are 1558 unique
inlined items. With the old algorithm these were placed 2618 times,
resulting in 1060 duplicates. With the new algorithm these were placed
2410 times, resulting in 852 duplicates. Also, the mean CGU size dropped
from 9659.5 to 9200.1, and the CGU size distribution tightened, with the
biggest one a little smaller and the smallest ones a little bigger.
@nnethercote nnethercote force-pushed the overlap-based-cgu-merging branch from ea66bf9 to 05de5d6 Compare July 18, 2023 21:23
@nnethercote
Copy link
Contributor Author

I fixed the typo.

@bors r=pnkfelix

@bors
Copy link
Contributor

bors commented Jul 18, 2023

📌 Commit 05de5d6 has been approved by pnkfelix

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 18, 2023
@bors
Copy link
Contributor

bors commented Jul 18, 2023

⌛ Testing commit 05de5d6 with merge 0d6a9b2...

@bors bors mentioned this pull request Jul 18, 2023
@bors
Copy link
Contributor

bors commented Jul 19, 2023

☀️ Test successful - checks-actions
Approved by: pnkfelix
Pushing 0d6a9b2 to master...

1 similar comment
@bors
Copy link
Contributor

bors commented Jul 19, 2023

☀️ Test successful - checks-actions
Approved by: pnkfelix
Pushing 0d6a9b2 to master...

@bors bors added merged-by-bors This PR was explicitly merged by bors. labels Jul 19, 2023
@bors bors merged commit 0d6a9b2 into rust-lang:master Jul 19, 2023
@rustbot rustbot added this to the 1.73.0 milestone Jul 19, 2023
@nnethercote nnethercote deleted the overlap-based-cgu-merging branch July 19, 2023 02:51
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (0d6a9b2): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.2% [1.2%, 1.2%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.8% [-4.5%, -0.3%] 11
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.6% [-4.5%, 1.2%] 12

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.5% [2.9%, 4.0%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-4.5% [-8.2%, -1.9%] 11
Improvements ✅
(secondary)
-3.0% [-3.4%, -2.6%] 3
All ❌✅ (primary) -3.2% [-8.2%, 4.0%] 13

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.9% [1.1%, 2.7%] 2
Regressions ❌
(secondary)
2.6% [2.2%, 3.1%] 3
Improvements ✅
(primary)
-2.1% [-3.8%, -1.1%] 9
Improvements ✅
(secondary)
-2.0% [-2.0%, -2.0%] 1
All ❌✅ (primary) -1.4% [-3.8%, 2.7%] 11

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.0%, 1.2%] 19
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.0% [-4.2%, -0.2%] 25
Improvements ✅
(secondary)
-3.6% [-3.6%, -3.6%] 1
All ❌✅ (primary) -0.5% [-4.2%, 1.2%] 44

Bootstrap: 656.931s -> 647.107s (-1.50%)

@rustbot rustbot removed the perf-regression Performance regression. label Jul 19, 2023
nnethercote added a commit to nnethercote/rust that referenced this pull request Jul 20, 2023
In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777,
it's much more common to have multiple CGUs with identical sizes. With
the existing code these same-sized items ended up in the
opposite-to-desired order due to the stable sorting. The code now starts
with a reverse sort (like is done in `partitioning.rs`) which gives the
behaviour we want. This doesn't matter much for perf, but makes profiles
in `samply` look more like what we expect.

In `partitioning.rs`, we can use `sort_by_key` instead of
`sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is
an identical CGU sort earlier in that function that already uses
`sort_by_key`.)
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jul 26, 2023
…nkfelix

Tweak CGU sorting in a couple of places.

In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect.

In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.)

r? `@pnkfelix`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jul 26, 2023
…nkfelix

Tweak CGU sorting in a couple of places.

In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect.

In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.)

r? ``@pnkfelix``
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jul 26, 2023
…nkfelix

Tweak CGU sorting in a couple of places.

In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect.

In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.)

r? ```@pnkfelix```
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jul 27, 2023
…nkfelix

Tweak CGU sorting in a couple of places.

In `base.rs`, tweak how the CGU size interleaving works. Since rust-lang#113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect.

In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.)

r? `@pnkfelix`
@tmandry
Copy link
Member

tmandry commented Aug 1, 2023

I imagine that repeatedly merging cgus[N-1] with its closest neighbor in cgus[N:] could make the results very sensitive to the particular choice of N (codegen units). That would be very relevant for binary size, which is already subject to the many whims of the LLVM inliner. (I think most people who really care about binary size are setting N=1 today, often in development builds even, but this seems like a promising direction that could relax the need for that in the future.)

Out of curiosity @nnethercote, did you see anything while working on this that would suggest the results are, or are not, very stable with respect to small changes in the number of codegen units?

@nnethercote
Copy link
Contributor Author

I have no data one way or the other. I never tried changing the default N away from 16.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants