Merge basic blocks where possible when generating LLVM IR. #103138

Merged: 2 commits, Nov 17, 2022

nnethercote
Contributor

r? @ghost

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 17, 2022
@nnethercote nnethercote marked this pull request as draft October 17, 2022 07:15
@nnethercote
Contributor Author

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 17, 2022
@bors
Contributor

bors commented Oct 17, 2022

⌛ Trying commit a48b4cc08278a9a37892cc745a38b1cbfbf29340 with merge 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5...

@bors
Contributor

bors commented Oct 17, 2022

☀️ Try build successful - checks-actions
Build commit: 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5 (0a35b2797788a7dd1063c4b0155bc4ade8ec24f5)

@rust-timer
Collaborator

Queued 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5 with parent 1536ab1, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (0a35b2797788a7dd1063c4b0155bc4ade8ec24f5): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

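------------------------------------------------------------------
                             mean [1]   range            count [2]
------------------------------------------------------------------
Regressions ❌ (primary)     1.0%       [0.8%, 1.2%]     6
Regressions ❌ (secondary)   -          -                0
Improvements ✅ (primary)    -0.3%      [-0.3%, -0.3%]   1
Improvements ✅ (secondary)  -          -                0
All ❌✅ (primary)           0.8%       [-0.3%, 1.2%]    7
------------------------------------------------------------------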

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

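------------------------------------------------------------------
                             mean [1]   range            count [2]
------------------------------------------------------------------
Regressions ❌ (primary)     -          -                0
Regressions ❌ (secondary)   9.2%       [9.2%, 9.2%]     1
Improvements ✅ (primary)    -0.1%      [-0.1%, -0.1%]   1
Improvements ✅ (secondary)  -2.5%      [-3.2%, -2.1%]   4
All ❌✅ (primary)           -0.1%      [-0.1%, -0.1%]   1
------------------------------------------------------------------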

Cycles

This benchmark run did not return any relevant results for this metric.

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot added perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 17, 2022
@nnethercote
Contributor Author

The instruction count results aren't a win, but there are hints of goodness in the results for cycles, wall-time, max-rss, and especially binary size. The current version only merges the simplest cases, and there are quite a few more cases that can be handled, so I will continue working on them.

@nnethercote
Contributor Author

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 18, 2022
@bors
Contributor

bors commented Oct 18, 2022

⌛ Trying commit 165b498be31961a522cedd64bb9bbe33c072d0f4 with merge 61e75799adaa22db3b3d115e5c1d921210da60ad...

@bors
Contributor

bors commented Oct 18, 2022

☀️ Try build successful - checks-actions
Build commit: 61e75799adaa22db3b3d115e5c1d921210da60ad (61e75799adaa22db3b3d115e5c1d921210da60ad)

@rust-timer
Collaborator

Queued 61e75799adaa22db3b3d115e5c1d921210da60ad with parent 98a5ac2, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (61e75799adaa22db3b3d115e5c1d921210da60ad): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

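------------------------------------------------------------------
                             mean [1]   range            count [2]
------------------------------------------------------------------
Regressions ❌ (primary)     0.9%       [0.4%, 1.3%]     7
Regressions ❌ (secondary)   -          -                0
Improvements ✅ (primary)    -0.3%      [-0.3%, -0.3%]   2
Improvements ✅ (secondary)  -0.3%      [-0.3%, -0.3%]   1
All ❌✅ (primary)           0.6%       [-0.3%, 1.3%]    9
------------------------------------------------------------------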

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

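------------------------------------------------------------------
                             mean [1]   range            count [2]
------------------------------------------------------------------
Regressions ❌ (primary)     -          -                0
Regressions ❌ (secondary)   4.0%       [2.2%, 6.3%]     5
Improvements ✅ (primary)    -1.5%      [-2.7%, -0.3%]   2
Improvements ✅ (secondary)  -          -                0
All ❌✅ (primary)           -1.5%      [-2.7%, -0.3%]   2
------------------------------------------------------------------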

Cycles

This benchmark run did not return any relevant results for this metric.

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 18, 2022
@nnethercote
Contributor Author

Disappointing results here. The code is working as intended, and is merging lots of basic blocks. Here are some measurements for three metrics:

  • wc: size of LLVM IR as measured by running `wc -l` on the `.ll` output.
  • llvm-lines: size of LLVM IR as measured by `cargo llvm-lines`.
  • br label: number of `br label %bbN` instructions in the LLVM IR.

All measurements are for debug builds.

-----------------------------------------------------------------------------
                wc                       llvm-lines               br label
-----------------------------------------------------------------------------
                before  after            before after             before after
-----------------------------------------------------------------------------
clap-3.1.6      657,418 629,719 (-4.3%)  296,511 287,343 (-3.1%)  22,001 12,848 (-42%)
regex-1.5.5     464,556 450,134 (-4.1%)  142,199 137,092 (-3.6%)  11,471  6,720 (-41%)
ripgrep-13.0.0  608,307 577,649 (-5.1%)  257,134 246,471 (-4.1%)  23,942 13,783 (-42%)
syn-1.0.89      410,964 393,340 (-4.3%)  171,194 165,376 (-3.4%)  13,361  7,598 (-43%)
-----------------------------------------------------------------------------

Plenty of shrinkage, but the effect on compile times is negligible, or even a slight regression (for instruction counts) in some cases. The only good news is that the binary size of debug builds shrank by a small amount in many cases, which makes sense, but it doesn't feel like enough of a benefit to continue pushing on this.
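
As a rough illustration of how the `br label` column and the percentages in parentheses could be reproduced, here is a minimal Rust sketch. Both function names are hypothetical, and matching on the `br label %` line prefix is an assumption about how the count was taken; the exact measurement commands aren't given above.

```rust
/// Counts unconditional branches of the form `br label %...` in LLVM IR text.
/// Conditional branches (`br i1 %c, label %a, label %b`) are not counted.
/// Assumption: one instruction per line, as in typical `.ll` output.
pub fn count_unconditional_branches(ir: &str) -> usize {
    ir.lines()
        .map(str::trim_start)
        .filter(|line| line.starts_with("br label %"))
        .count()
}

/// Relative change from `before` to `after`, in percent.
/// Negative values mean the metric shrank.
pub fn percent_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}
```

For example, `percent_change(22_001, 12_848)` is about -41.6, which rounds to the -42% shown for clap-3.1.6 in the `br label` column.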

@nnethercote
Contributor Author

nnethercote commented Oct 20, 2022

To summarize:

  • MIR uses one definition of BBs, and LLVM IR uses another. Most notably, function calls end a MIR BB but don't end an LLVM IR BB.
  • rustc generates reasonable MIR code.
  • rustc does a 1-to-1 translation of MIR BBs to LLVM IR BBs, which is reasonable.
  • The resulting LLVM IR looks a bit silly and quite sub-optimal, with many unconditional BB-to-BB jumps, because of the different BB definition.
  • The sub-optimality doesn't end up mattering much in terms of compiler perf.
  • The sub-optimality also doesn't matter for the output of opt builds, because LLVM can optimize away the extra jumps and the output ends up the same.
  • The sub-optimality matters slightly for the output of debug builds, because it causes binaries to be about 0.5% bigger. It may also make them slightly slower, though I haven't measured that and I suspect the effect would be very small, probably less than 0.5%.
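
To make the first few points concrete, here is a toy sketch in Rust of a 1-to-1 lowering. Everything in it is hypothetical: the `Terminator` enum is a drastically simplified stand-in for MIR terminators, `lower_one_to_one` is not a rustc function, and the emitted text only loosely resembles LLVM IR. It just shows why a call-terminated block produces an unconditional jump when each MIR BB gets its own LLVM IR BB.

```rust
/// Toy stand-in for a MIR terminator (hypothetical, far simpler than rustc's).
#[derive(Clone, Copy)]
pub enum Terminator {
    /// A function call that continues at block `target` when it returns.
    Call { callee: &'static str, target: usize },
    /// Return from the function.
    Return,
}

/// One-to-one lowering: every toy MIR block becomes its own labelled
/// LLVM-style block, so each call is followed by an unconditional `br`.
pub fn lower_one_to_one(blocks: &[Terminator]) -> String {
    let mut out = String::new();
    for (i, term) in blocks.iter().enumerate() {
        out.push_str(&format!("bb{i}:\n"));
        match *term {
            Terminator::Call { callee, target } => {
                out.push_str(&format!("  call void @{callee}()\n"));
                // The call ended the MIR block, so we must jump to the next one.
                out.push_str(&format!("  br label %bb{target}\n"));
            }
            Terminator::Return => out.push_str("  ret void\n"),
        }
    }
    out
}
```

A straight-line body with two calls followed by a return lowers to three labelled blocks joined by two `br label` instructions, even though LLVM IR could express the whole thing as a single block.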

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Nov 16, 2022
@nnethercote
Contributor Author

@ehuss suggested the problem is the ordering of the debugger output. I have tweaked that and added a temporary commit to do more Windows testing on CI.

For the next commit, the `FunctionCx::codegen_*_terminator` methods need to
take a `&mut Bx` instead of consuming a `Bx`. This triggers a cascade of
similar changes across multiple functions. The resulting code is more
concise and replaces many `&mut bx` expressions with `bx`.

In `codegen_assert_terminator` we decide if a BB's successor is a
candidate for merging, which requires that it be the only successor, and
that it only have one predecessor. That result then gets passed down,
and if it reaches `funclet_br` with the appropriate BB characteristics,
then no `br` instruction is issued, a `MergingSucc::True` result is
passed back, and the merging proceeds in `codegen_block`.

The commit also adds `CachedLlbb`, a new type to help keep track of
each BB that has been merged into its predecessor.
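
The merge condition described above (the successor is the only successor, and it has only one predecessor) can be sketched on a toy CFG as follows. This is an illustration under stated assumptions, not the code from the commit: the `MergingSucc` name is borrowed from the commit message, but its definition here, the `mergeable_succ` function, and the adjacency-list representation are all hypothetical. The self-loop exclusion is also an assumption added for safety.

```rust
/// Whether a block's single successor can be folded into it.
/// The name mirrors `MergingSucc` from the commit message; this
/// definition is a guess made for illustration.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum MergingSucc {
    False,
    True,
}

/// `succs[b]` lists the successors of block `b` in a toy CFG.
pub fn mergeable_succ(succs: &[Vec<usize>], block: usize) -> MergingSucc {
    // Count how many edges enter each block.
    let mut pred_count = vec![0usize; succs.len()];
    for targets in succs {
        for &t in targets {
            pred_count[t] += 1;
        }
    }
    match succs[block].as_slice() {
        // Exactly one successor, that successor has exactly one
        // predecessor, and it is not the block itself.
        [succ] if pred_count[*succ] == 1 && *succ != block => MergingSucc::True,
        _ => MergingSucc::False,
    }
}
```

With the CFG `0 -> 1`, `1 -> {2, 3}`, `2 -> 4`, `3 -> 4`, only block 0 qualifies: block 1 has two successors, and block 4 has two predecessors, so neither 2 nor 3 can absorb it.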
@nnethercote
Contributor Author

Slightly reordering the debuginfo output fixed the test failure.

@bors r=bjorn3

@bors
Contributor

bors commented Nov 16, 2022

📌 Commit 54082dd has been approved by bjorn3

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 16, 2022
@Manishearth
Member

Manishearth commented Nov 16, 2022

@bors p=1

going to close the tree for non-nevers for a while so they can drain out

@bors
Contributor

bors commented Nov 17, 2022

⌛ Testing commit 54082dd with merge 251831e...

@bors
Contributor

bors commented Nov 17, 2022

☀️ Test successful - checks-actions
Approved by: bjorn3
Pushing 251831e to master...

@rust-timer
Collaborator

Finished benchmarking commit (251831e): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

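------------------------------------------------------------------
                             mean       range            count
------------------------------------------------------------------
Regressions ❌ (primary)     -          -                0
Regressions ❌ (secondary)   -          -                0
Improvements ✅ (primary)    -0.8%      [-1.4%, -0.4%]   6
Improvements ✅ (secondary)  -0.3%      [-0.3%, -0.2%]   2
All ❌✅ (primary)           -0.8%      [-1.4%, -0.4%]   6
------------------------------------------------------------------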

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

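------------------------------------------------------------------
                             mean       range            count
------------------------------------------------------------------
Regressions ❌ (primary)     -          -                0
Regressions ❌ (secondary)   4.6%       [3.5%, 5.5%]     3
Improvements ✅ (primary)    -1.0%      [-1.9%, -0.1%]   2
Improvements ✅ (secondary)  -2.1%      [-2.1%, -2.1%]   1
All ❌✅ (primary)           -1.0%      [-1.9%, -0.1%]   2
------------------------------------------------------------------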

Cycles

This benchmark run did not return any relevant results for this metric.

@nnethercote nnethercote deleted the merge-BBs branch November 17, 2022 06:38
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
Merge basic blocks where possible when generating LLVM IR.

r? `@ghost`
antoyo pushed a commit to antoyo/rust that referenced this pull request Jun 19, 2023
Merge basic blocks where possible when generating LLVM IR.

r? `@ghost`
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

8 participants