Merge basic blocks where possible when generating LLVM IR. #103138

nnethercote · 2022-10-17T07:15:26Z

r? @ghost

nnethercote · 2022-10-17T07:15:45Z

@bors try @rust-timer queue

rust-timer · 2022-10-17T07:15:46Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-10-17T07:15:53Z

⌛ Trying commit a48b4cc08278a9a37892cc745a38b1cbfbf29340 with merge 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5...

bors · 2022-10-17T09:42:48Z

☀️ Try build successful - checks-actions
Build commit: 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5 (0a35b2797788a7dd1063c4b0155bc4ade8ec24f5)

rust-timer · 2022-10-17T09:42:50Z

Queued 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5 with parent 1536ab1, future comparison URL.

rust-timer · 2022-10-17T10:59:49Z

Finished benchmarking commit (0a35b2797788a7dd1063c4b0155bc4ade8ec24f5): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	1.0%	[0.8%, 1.2%]	6
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.3%	[-0.3%, -0.3%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.8%	[-0.3%, 1.2%]	7

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	9.2%	[9.2%, 9.2%]	1
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.1%]	1
Improvements ✅ (secondary)	-2.5%	[-3.2%, -2.1%]	4
All ❌✅ (primary)	-0.1%	[-0.1%, -0.1%]	1

Cycles

This benchmark run did not return any relevant results for this metric.

¹ the arithmetic mean of the percent change
² number of relevant changes

nnethercote · 2022-10-17T11:08:32Z

The instruction count results aren't a win, but there are hints of goodness in the results for cycles, wall-time, max-rss, and especially binary size. The current version only merges the simplest cases, and there are quite a few more cases that can be handled, so I will continue working on them.

nnethercote · 2022-10-18T06:34:57Z

@bors try @rust-timer queue

rust-timer · 2022-10-18T06:34:59Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-10-18T06:35:06Z

⌛ Trying commit 165b498be31961a522cedd64bb9bbe33c072d0f4 with merge 61e75799adaa22db3b3d115e5c1d921210da60ad...

bors · 2022-10-18T09:02:17Z

☀️ Try build successful - checks-actions
Build commit: 61e75799adaa22db3b3d115e5c1d921210da60ad (61e75799adaa22db3b3d115e5c1d921210da60ad)

rust-timer · 2022-10-18T09:02:19Z

Queued 61e75799adaa22db3b3d115e5c1d921210da60ad with parent 98a5ac2, future comparison URL.

rust-timer · 2022-10-18T10:23:30Z

Finished benchmarking commit (61e75799adaa22db3b3d115e5c1d921210da60ad): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	0.9%	[0.4%, 1.3%]	7
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.3%	[-0.3%, -0.3%]	2
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	1
All ❌✅ (primary)	0.6%	[-0.3%, 1.3%]	9

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	4.0%	[2.2%, 6.3%]	5
Improvements ✅ (primary)	-1.5%	[-2.7%, -0.3%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.5%	[-2.7%, -0.3%]	2

Cycles

This benchmark run did not return any relevant results for this metric.

¹ the arithmetic mean of the percent change
² number of relevant changes

nnethercote · 2022-10-19T02:53:16Z

Disappointing results here. The code is working as intended, and is merging lots of basic blocks. Here are some measurements for three metrics:

- wc: size of LLVM IR as measured by running `wc -l` on the `.ll` output.
- llvm-lines: size of LLVM IR as measured by `cargo llvm-lines`.
- br label: number of `br label %bbN` instructions in the LLVM IR.

All measurements are for debug builds.

-----------------------------------------------------------------------------
                wc                       llvm-lines               br label
-----------------------------------------------------------------------------
                before  after            before after             before after
-----------------------------------------------------------------------------
clap-3.1.6      657,418 629,719 (-4.3%)  296,511 287,343 (-3.1%)  22,001 12,848 (-42%)
regex-1.5.5     464,556 450,134 (-3.1%)  142,199 137,092 (-3.6%)  11,471  6,720 (-41%)
ripgrep-13.0.0  608,307 577,649 (-5.1%)  257,134 246,471 (-4.1%)  23,942 13,783 (-42%)
syn-1.0.89      410,964 393,340 (-4.3%)  171,194 165,376 (-3.4%)  13,361  7,598 (-43%)
-----------------------------------------------------------------------------

Plenty of shrinkage, but the effect on compile times is negligible, or even a slight regression (for instruction counts) in some cases. The only good news is that the binary size of debug builds shrank by a small amount in many cases, which makes sense, but it doesn't feel like enough of a benefit to continue pushing on this.
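The `br label` column could be approximated with a simple line-based scan of the `.ll` file. The comment doesn't say which tool produced the count, so the sketch below is an assumption, and `count_uncond_bb_branches` is a hypothetical helper. It is purely textual: it counts unconditional branches whose target label is `bb` followed by a digit, tolerates trailing `!dbg` metadata (common in debug builds), and ignores conditional `br i1 ...` branches.

```rust
use std::env;
use std::fs;

/// Counts unconditional branches of the form `br label %bbN` in LLVM IR text.
/// Conditional branches (`br i1 %c, label ..., label ...`) are not counted,
/// and neither are branches to labels such as `%start`.
fn count_uncond_bb_branches(ir: &str) -> usize {
    ir.lines()
        .map(str::trim_start)
        .filter(|line| {
            line.strip_prefix("br label %bb")
                // Require a digit so that e.g. `%bbfoo` is not counted.
                .map_or(false, |rest| rest.starts_with(|c: char| c.is_ascii_digit()))
        })
        .count()
}

fn main() {
    // Usage: pass the path of a `.ll` file, e.g. one produced by `--emit=llvm-ir`.
    let Some(path) = env::args().nth(1) else {
        eprintln!("usage: count-br <file.ll>");
        return;
    };
    match fs::read_to_string(&path) {
        Ok(ir) => println!("{}", count_uncond_bb_branches(&ir)),
        Err(e) => eprintln!("failed to read {path}: {e}"),
    }
}
```

A textual scan is crude compared to parsing the IR, but for comparing before/after counts on the same crate it is usually enough.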

nnethercote · 2022-10-20T03:45:49Z

To summarize:

- MIR uses one definition of BBs, and LLVM IR uses another. Most notably, function calls end a MIR BB but don't end an LLVM IR BB.
- rustc generates reasonable MIR code.
- rustc does a 1-to-1 translation of MIR BBs to LLVM IR BBs, which is reasonable.
- The resulting LLVM IR looks a bit silly and quite sub-optimal, with many unconditional BB-to-BB jumps, because of the different BB definition.
- The sub-optimality doesn't end up mattering much in terms of compiler perf.
- The sub-optimality also doesn't matter for the output of opt builds, because LLVM can optimize away the extra jumps and the output ends up the same.
- The sub-optimality matters slightly for the output of debug builds, because it causes binaries to be about 0.5% bigger. It may also make them slightly slower, though I haven't measured that and I suspect the effect would be very small, probably less than 0.5%.
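To make the MIR/LLVM mismatch concrete, here is a trivial function whose body contains two calls. The comments describe only roughly what a debug build looks like under a 1-to-1 block translation; exact block names and counts vary between compiler versions and can be inspected with `rustc --emit=llvm-ir -C opt-level=0`.

```rust
fn f() -> u32 {
    1
}

fn g() -> u32 {
    2
}

// In MIR, a call is a block terminator: the call to `f` ends one block and
// the call to `g` ends the next. In debug builds the overflow-checked `+`
// adds another terminator (an `assert`) and therefore another block.
//
// With a 1-to-1 MIR-to-LLVM translation, the LLVM block for each call then
// ends in an unconditional `br label %bbN` to the block that immediately
// follows, even though an LLVM `call` does not need to end a block.
pub fn sum() -> u32 {
    f() + g()
}

fn main() {
    println!("{}", sum());
}
```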

nnethercote · 2022-11-16T01:05:36Z

@ehuss suggested that the test failure is due to the ordering of the debugger output. I have tweaked that and added a temporary commit to do more Windows testing on CI.

For the next commit, the `FunctionCx::codegen_*_terminator` methods need to take a `&mut Bx` instead of consuming a `Bx`. This triggers a cascade of similar changes across multiple functions. The resulting code is more concise and replaces many `&mut bx` expressions with `bx`.
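A minimal sketch of why the signature change matters, using a hypothetical `Builder` trait and `TextBuilder` type rather than rustc's real builder API. A terminator that consumes its builder leaves the caller with nothing to emit into afterwards; one that takes `&mut Bx` lets the caller keep emitting into the same block, which is what merging a successor requires. Inside such a function `bx` is already a `&mut Bx`, so it can be passed on as plain `bx` instead of `&mut bx`.

```rust
/// Hypothetical, heavily simplified stand-in for a codegen builder
/// (not rustc's real builder trait).
trait Builder {
    fn emit(&mut self, inst: &str);
}

/// Records emitted instructions as text.
struct TextBuilder {
    insts: Vec<String>,
}

impl Builder for TextBuilder {
    fn emit(&mut self, inst: &str) {
        self.insts.push(inst.to_string());
    }
}

fn emit_call<Bx: Builder>(bx: &mut Bx, callee: &str) {
    bx.emit(&format!("call void @{callee}()"));
}

/// Old shape: the terminator takes the builder by value, so the caller
/// cannot emit anything more into this block afterwards.
fn codegen_call_terminator_by_value<Bx: Builder>(mut bx: Bx, callee: &str) {
    emit_call(&mut bx, callee); // an explicit `&mut bx` is needed here
    bx.emit("br label %bb1");
}

/// New shape: the terminator borrows the builder. When the successor is
/// mergeable it skips the `br`, and the caller keeps using the same builder.
fn codegen_call_terminator<Bx: Builder>(bx: &mut Bx, callee: &str, mergeable_succ: bool) {
    emit_call(bx, callee); // `bx` is already `&mut Bx`; it is implicitly reborrowed
    if !mergeable_succ {
        bx.emit("br label %bb1");
    }
}

fn main() {
    // By value: the builder is moved in and is gone afterwards.
    codegen_call_terminator_by_value(TextBuilder { insts: Vec::new() }, "f");

    // By `&mut`: the caller continues emitting the successor's code.
    let mut bx = TextBuilder { insts: Vec::new() };
    codegen_call_terminator(&mut bx, "f", true);
    bx.emit("ret void");
    println!("{}", bx.insts.join("\n"));
}
```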

In `codegen_terminator` we decide if a BB's successor is a candidate for merging, which requires that it be the only successor, and that it only have one predecessor. That result then gets passed down, and if it reaches `funclet_br` with the appropriate BB characteristics, then no `br` instruction is issued, a `MergingSucc::True` result is passed back, and the merging proceeds in `codegen_block`. The commit also adds `CachedLlbb`, a new type to help keep track of each BB that has been merged into its predecessor.
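The decision and hand-back described above can be modelled on a toy CFG. Everything here is a hypothetical simplification: `Terminator`, `LlbbState` (loosely inspired by the description of `CachedLlbb`), `codegen`, and the textual output are invented for illustration; only the names `MergingSucc` and `funclet_br` and the merging criterion come from the description. A successor is merged when it is the block's only successor and the block is its only predecessor; then no `br` is emitted and lowering continues in the same LLVM block.

```rust
/// Hypothetical, simplified MIR terminators (not rustc's real types).
#[derive(Clone, Copy)]
enum Terminator {
    Call { callee: &'static str, target: usize },
    Goto { target: usize },
    Return,
}

impl Terminator {
    fn successors(&self) -> Vec<usize> {
        match *self {
            Terminator::Call { target, .. } | Terminator::Goto { target } => vec![target],
            Terminator::Return => vec![],
        }
    }
}

/// Did lowering skip the `br` so the successor can be merged?
enum MergingSucc {
    False,
    True,
}

/// Per-block bookkeeping: own LLVM block, or merged into its predecessor.
#[derive(Clone, Copy, PartialEq)]
enum LlbbState {
    NotYet,
    Own,
    Merged,
}

/// Emits a `br` unless the successor can be merged into the current block.
fn funclet_br(out: &mut Vec<String>, target: usize, mergeable_succ: bool) -> MergingSucc {
    if mergeable_succ {
        MergingSucc::True
    } else {
        out.push(format!("  br label %bb{target}"));
        MergingSucc::False
    }
}

fn codegen_terminator(out: &mut Vec<String>, term: Terminator, mergeable_succ: bool) -> MergingSucc {
    match term {
        Terminator::Call { callee, target } => {
            out.push(format!("  call void @{callee}()"));
            funclet_br(out, target, mergeable_succ)
        }
        Terminator::Goto { target } => funclet_br(out, target, mergeable_succ),
        Terminator::Return => {
            out.push("  ret void".to_string());
            MergingSucc::False
        }
    }
}

/// Lowers all blocks to textual pseudo-IR, merging where allowed.
fn codegen(blocks: &[Terminator]) -> Vec<String> {
    let mut preds = vec![0usize; blocks.len()];
    for term in blocks {
        for succ in term.successors() {
            preds[succ] += 1;
        }
    }
    let mut state = vec![LlbbState::NotYet; blocks.len()];
    let mut out = Vec::new();
    for start in 0..blocks.len() {
        if state[start] != LlbbState::NotYet {
            continue;
        }
        state[start] = LlbbState::Own;
        out.push(format!("bb{start}:"));
        let mut bb = start;
        loop {
            // Only successor, sole predecessor, and not already emitted.
            let succ = match blocks[bb].successors().as_slice() {
                &[s] if preds[s] == 1 && state[s] == LlbbState::NotYet => Some(s),
                _ => None,
            };
            match (codegen_terminator(&mut out, blocks[bb], succ.is_some()), succ) {
                (MergingSucc::True, Some(s)) => {
                    // No label and no `br`: keep emitting into this LLVM block.
                    state[s] = LlbbState::Merged;
                    bb = s;
                }
                _ => break,
            }
        }
    }
    out
}

fn main() {
    let blocks = [
        Terminator::Call { callee: "f", target: 1 },
        Terminator::Call { callee: "g", target: 2 },
        Terminator::Return,
    ];
    for line in codegen(&blocks) {
        println!("{line}");
    }
}
```

A join point (two predecessors) keeps its own label, and a successor that has already been emitted is never merged, so cycles still get explicit `br` instructions.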

nnethercote · 2022-11-16T04:47:20Z

Slightly reordering the debuginfo output fixed the test failure.

@bors r=bjorn3

bors · 2022-11-16T04:47:22Z

📌 Commit 54082dd has been approved by bjorn3

It is now in the queue for this repository.

Manishearth · 2022-11-16T17:45:05Z

@bors p=1

going to close the tree for non-nevers for a while so they can drain out

bors · 2022-11-17T01:56:28Z

⌛ Testing commit 54082dd with merge 251831e...

bors · 2022-11-17T04:47:05Z

☀️ Test successful - checks-actions
Approved by: bjorn3
Pushing 251831e to master...

rust-timer · 2022-11-17T06:04:58Z

Finished benchmarking commit (251831e): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.8%	[-1.4%, -0.4%]	6
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.2%]	2
All ❌✅ (primary)	-0.8%	[-1.4%, -0.4%]	6

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	4.6%	[3.5%, 5.5%]	3
Improvements ✅ (primary)	-1.0%	[-1.9%, -0.1%]	2
Improvements ✅ (secondary)	-2.1%	[-2.1%, -2.1%]	1
All ❌✅ (primary)	-1.0%	[-1.9%, -0.1%]	2

Cycles

This benchmark run did not return any relevant results for this metric.

Merge basic blocks where possible when generating LLVM IR. r? `@ghost`

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 17, 2022

nnethercote marked this pull request as draft October 17, 2022 07:15

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 17, 2022

rustbot added perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 17, 2022

nnethercote force-pushed the merge-BBs branch from a48b4cc to 165b498 October 18, 2022 06:34

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 18, 2022

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 18, 2022

nnethercote closed this Oct 21, 2022

nnethercote reopened this Nov 9, 2022

nnethercote force-pushed the merge-BBs branch from 165b498 to 9ca699a November 9, 2022 06:38

nnethercote force-pushed the merge-BBs branch from 1a945ae to 22db2f6 November 16, 2022 01:04

rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Nov 16, 2022

nnethercote force-pushed the merge-BBs branch from 22db2f6 to f3f0c03 November 16, 2022 02:17

nnethercote added 2 commits November 16, 2022 15:46

Use &mut Bx more.

68194aa

nnethercote force-pushed the merge-BBs branch from f3f0c03 to 54082dd November 16, 2022 04:46

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 16, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 17, 2022

bors merged commit 251831e into rust-lang:master Nov 17, 2022

rustbot added this to the 1.67.0 milestone Nov 17, 2022

nnethercote deleted the merge-BBs branch November 17, 2022 06:38

Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023

Auto merge of rust-lang#103138 - nnethercote:merge-BBs, r=bjorn3

0d2bdef

Merge basic blocks where possible when generating LLVM IR. r? `@ghost`

antoyo pushed a commit to antoyo/rust that referenced this pull request Jun 19, 2023

Auto merge of rust-lang#103138 - nnethercote:merge-BBs, r=bjorn3

ee50714

Merge basic blocks where possible when generating LLVM IR. r? `@ghost`
