More aggressive inlining in the `fast_reject` code. #137760

nnethercote · 2025-02-27T23:26:18Z

This code is very hot in a couple of benchmarks.

r? @lcnr


nnethercote · 2025-02-28T00:04:24Z

@bors try @rust-timer queue

bors · 2025-02-28T00:05:35Z

⌛ Trying commit 0be0df1 with merge ecb5e9e...

More aggressive inlining in the `fast_reject` code. This code is very hot in a couple of benchmarks. r? `@ghost`

bors · 2025-02-28T02:05:43Z

☀️ Try build successful - checks-actions
Build commit: ecb5e9e (ecb5e9e7f16d3dc7266d638880dd6c4a51f2f4ca)

rust-timer · 2025-02-28T04:13:08Z

Finished benchmarking commit (ecb5e9e): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.6%	[-0.9%, -0.5%]	13
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.6%	[-0.9%, -0.5%]	13

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary -2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.2%	[-2.2%, -2.2%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-2.2%	[-2.2%, -2.2%]	1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 770.531s -> 769.678s (-0.11%)
Artifact size: 361.97 MiB -> 361.95 MiB (-0.01%)

nnethercote · 2025-03-04T22:23:01Z

Hmm, locally I got ~5% icount improvements on bitmaps and typenum, not the sub-1% improvements we see here. PGO again, I guess.

nnethercote · 2025-03-04T22:26:06Z

@lcnr: not sure this is worth it. What do you think? It does recover the perf lost in #133566, but it's still less of an improvement than I hoped (and saw locally on non-PGO builds).

lcnr · 2025-03-05T08:11:03Z

compiler/rustc_type_ir/src/fast_reject.rs

                ty::Adt(rhs_def, rhs_args) => {
-                    lhs_def == rhs_def && self.args_may_unify_inner(lhs_args, rhs_args, depth)
+                    // This call site can be hot.
+                    lhs_def == rhs_def
+                        && self.inlined_args_may_unify_inner(lhs_args, rhs_args, depth)
                }
                _ => false,


what's the perf impact of using

    ty::Adt(lhs_def, lhs_args) => match rhs.kind() {
        ty::Adt(rhs_def, rhs_args) if lhs_def == rhs_def => if lhs_args.len() == 1 {
            // This code is very hot and a lot of generic ADTs have
            // just a single generic argument.
            self.inlined_arg_may_unify(lhs_args.as_slice()[0], rhs_args.as_slice()[0], depth)
        } else {
            self.args_may_unify_inner(lhs_args, rhs_args, depth)
        },
        _ => false,
    },

and not adding inlined_args_may_unify_inner?

I don't like call-sites using inlined_args_may_unify if we can avoid it/it doesn't have a meaningful perf impact to not do so 🤔

nnethercote · 2025-03-06

This call is hot, but the two calls to inlined_args_may_unify are also hot. So only having the `len == 1` optimization at this call site would reduce its impact.
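For readers unfamiliar with the pattern being debated, here is a minimal sketch of it, not the actual `fast_reject.rs` code. It assumes a toy `Ty` enum and a hypothetical `DeepReject` struct in place of the real `rustc_type_ir` types. The method names only loosely mirror the ones in the thread, and the depth handling is a simplification. It shows an out-of-line entry point for cold callers, a force-inlined body for hot callers, and a fast path for the single-argument case.

```rust
// Toy stand-in for the compiler's type representation (an assumption for
// illustration; the real code works on `rustc_type_ir` types).
#[derive(Debug, PartialEq)]
enum Ty {
    Int,
    Bool,
    Param,             // in this toy model, a parameter may unify with anything
    Adt(u32, Vec<Ty>), // (definition id, generic arguments)
}

// Hypothetical helper type; not the compiler's real struct.
struct DeepReject;

impl DeepReject {
    // Out-of-line entry point for call sites that are not hot.
    #[inline(never)]
    fn args_may_unify(&self, lhs: &[Ty], rhs: &[Ty], depth: usize) -> bool {
        self.inlined_args_may_unify(lhs, rhs, depth)
    }

    // Force-inlined body, intended only for the few hot call sites.
    #[inline(always)]
    fn inlined_args_may_unify(&self, lhs: &[Ty], rhs: &[Ty], depth: usize) -> bool {
        if lhs.len() != rhs.len() {
            return false;
        }
        // Fast path: many generic ADTs have exactly one generic argument.
        if let ([l], [r]) = (lhs, rhs) {
            return self.ty_may_unify(l, r, depth);
        }
        lhs.iter().zip(rhs).all(|(l, r)| self.ty_may_unify(l, r, depth))
    }

    fn ty_may_unify(&self, lhs: &Ty, rhs: &Ty, depth: usize) -> bool {
        // Out of budget: answer conservatively that the types may unify.
        if depth == 0 {
            return true;
        }
        match (lhs, rhs) {
            (Ty::Param, _) | (_, Ty::Param) => true,
            (Ty::Int, Ty::Int) | (Ty::Bool, Ty::Bool) => true,
            (Ty::Adt(lhs_def, lhs_args), Ty::Adt(rhs_def, rhs_args)) => {
                // This is the kind of call site the thread describes as hot.
                lhs_def == rhs_def
                    && self.inlined_args_may_unify(lhs_args, rhs_args, depth - 1)
            }
            _ => false,
        }
    }
}

fn main() {
    let reject = DeepReject;
    let vec_int = Ty::Adt(1, vec![Ty::Int]);
    let vec_bool = Ty::Adt(1, vec![Ty::Bool]);
    let vec_param = Ty::Adt(1, vec![Ty::Param]);

    assert!(reject.ty_may_unify(&vec_int, &vec_param, 8));
    assert!(!reject.ty_may_unify(&vec_int, &vec_bool, 8));
    assert!(!reject.args_may_unify(&[Ty::Int], &[Ty::Int, Ty::Int], 8));
}
```

The trade-off in the thread corresponds to which callers use `inlined_args_may_unify` directly. Each such call site gets its own copy of the loop and fast path, which can help hot paths but grows code size, and the measured gain may differ between PGO and non-PGO builds.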

nnethercote · 2025-03-06T04:37:28Z

I think this isn't worth persisting with. The improvements are smaller than I'd hoped, and they only affect a small fraction of real programs.

More aggressive inlining in the fast_reject code.

0be0df1

This code is very hot in a couple of benchmarks.

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) labels Feb 27, 2025


rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Feb 28, 2025

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 28, 2025

Auto merge of rust-lang#137760 - nnethercote:inline-fast_reject, r=<try>

ecb5e9e


rustbot removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Feb 28, 2025

nnethercote marked this pull request as ready for review March 4, 2025 22:23

rustbot assigned lcnr Mar 4, 2025

lcnr reviewed Mar 5, 2025


nnethercote closed this Mar 6, 2025

nnethercote deleted the inline-fast_reject branch May 22, 2025 00:15
