-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge basic blocks where possible when generating LLVM IR. #103138
Conversation
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit a48b4cc08278a9a37892cc745a38b1cbfbf29340 with merge 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5... |
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
Queued 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5 with parent 1536ab1, future comparison URL. |
Finished benchmarking commit (0a35b2797788a7dd1063c4b0155bc4ade8ec24f5): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Footnotes |
The instruction count results aren't a win, but there are hints of goodness in the results for cycles, wall-time, max-rss, and especially binary size. The current version only merges the simplest cases, and there are quite a few more cases that can be handled, so I will continue working on them. |
a48b4cc
to
165b498
Compare
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 165b498be31961a522cedd64bb9bbe33c072d0f4 with merge 61e75799adaa22db3b3d115e5c1d921210da60ad... |
☀️ Try build successful - checks-actions |
Queued 61e75799adaa22db3b3d115e5c1d921210da60ad with parent 98a5ac2, future comparison URL. |
Finished benchmarking commit (61e75799adaa22db3b3d115e5c1d921210da60ad): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Footnotes |
Disappointing results here. The code is working as intended, and is merging lots of basic blocks. Here are some measurements for three metrics:
All measurements are for debug builds.
Plenty of shrinkage but the effect on compile times is negligible, or even a slight regression (for instruction counts) in some cases. The only good news is that the binary size of debug builds shrunk by a small amount in many cases, which makes sense, but it doesn't feel like enough of a benefit to continue pushing on this. |
To summarize:
|
165b498
to
9ca699a
Compare
1a945ae
to
22db2f6
Compare
@ehuss suggested the problem is the ordering of the debugger output. I have tweaked that and added a temporary commit to do more Windows testing on CI. |
22db2f6
to
f3f0c03
Compare
For the next commit, `FunctionCx::codegen_*_terminator` need to take a `&mut Bx` instead of consuming a `Bx`. This triggers a cascade of similar changes across multiple functions. The resulting code is more concise and replaces many `&mut bx` expressions with `bx`.
In `codegen_assert_terminator` we decide if a BB's successor is a candidate for merging, which requires that it be the only successor, and that it only have one predecessor. That result then gets passed down, and if it reaches `funclet_br` with the appropriate BB characteristics, then no `br` instruction is issued, a `MergingSucc::True` result is passed back, and the merging proceeds in `codegen_block`. The commit also adds `CachedLlbb`, a new type to help keep track of each BB that has been merged into its predecessor.
f3f0c03
to
54082dd
Compare
Slightly reordering the debuginfo output fixed the test failure. @bors r=bjorn3 |
@bors p=1 going to close the tree for non-nevers for a while so they can drain out |
☀️ Test successful - checks-actions |
Finished benchmarking commit (251831e): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
Merge basic blocks where possible when generating LLVM IR. r? `@ghost`
Merge basic blocks where possible when generating LLVM IR. r? `@ghost`
r? @ghost