More aggressive inlining in the fast_reject code. #137760

Closed
wants to merge 1 commit into from

Conversation

nnethercote
Contributor

@nnethercote nnethercote commented Feb 27, 2025

This code is very hot in a couple of benchmarks.

r? @lcnr

@rustbot added the S-waiting-on-review and T-compiler labels Feb 27, 2025
@nnethercote
Contributor Author

@bors try @rust-timer queue


@rustbot added the S-waiting-on-perf label Feb 28, 2025
@bors
Contributor

bors commented Feb 28, 2025

⌛ Trying commit 0be0df1 with merge ecb5e9e...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 28, 2025
More aggressive inlining in the `fast_reject` code.

This code is very hot in a couple of benchmarks.

r? `@ghost`
@bors
Contributor

bors commented Feb 28, 2025

☀️ Try build successful - checks-actions
Build commit: ecb5e9e (ecb5e9e7f16d3dc7266d638880dd6c4a51f2f4ca)


@rust-timer
Collaborator

Finished benchmarking commit (ecb5e9e): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.6% | [-0.9%, -0.5%] | 13 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.6% | [-0.9%, -0.5%] | 13 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary -2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.2% | [-2.2%, -2.2%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.2% | [-2.2%, -2.2%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 770.531s -> 769.678s (-0.11%)
Artifact size: 361.97 MiB -> 361.95 MiB (-0.01%)
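
For reference, the percentages in the two lines above are plain relative changes. A minimal sketch of the arithmetic follows; `percent_change` is a hypothetical helper written for illustration, not part of rust-timer or rustc.

```rust
/// Relative change from `before` to `after`, expressed in percent.
/// Hypothetical helper for illustration; negative means a reduction.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

Applied to the bootstrap times (770.531 s to 769.678 s) this gives roughly -0.11%. Applied to the rounded artifact sizes shown (361.97 MiB to 361.95 MiB) it gives roughly -0.006%, which is consistent with the reported -0.01% at two decimal places.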

@rustbot removed the S-waiting-on-perf label Feb 28, 2025
@nnethercote
Contributor Author

Hmm, locally I got ~5% icount improvements on bitmaps and typenum, not the sub-1% improvements we see here. PGO again, I guess.

@nnethercote nnethercote marked this pull request as ready for review March 4, 2025 22:23
@nnethercote
Contributor Author

@lcnr: not sure this is worth it. What do you think? It does recover the perf lost in #133566, but it's still less of an improvement than I hoped (and saw locally on non-PGO builds).

Comment on lines 358 to 363
      ty::Adt(rhs_def, rhs_args) => {
    -     lhs_def == rhs_def && self.args_may_unify_inner(lhs_args, rhs_args, depth)
    +     // This call site can be hot.
    +     lhs_def == rhs_def
    +         && self.inlined_args_may_unify_inner(lhs_args, rhs_args, depth)
      }
      _ => false,
Contributor

@lcnr lcnr Mar 5, 2025

what's the perf impact of using

            ty::Adt(lhs_def, lhs_args) => match rhs.kind() {
                ty::Adt(rhs_def, rhs_args) if lhs_def == rhs_def => if lhs_args.len() == 1 {
                    // This code is very hot and a lot of generic ADTs have
                    // just a single generic argument.
                    self.inlined_arg_may_unify(lhs_args.as_slice()[0], rhs_args.as_slice()[0], depth)
                } else {
                    self.args_may_unify_inner(lhs_args, rhs_args, depth)
                },
                _ => false,
            },

and not adding inlined_args_may_unify_inner?

I'd rather not have call sites use inlined_args_may_unify if we can avoid it, i.e. if leaving it out has no meaningful perf impact 🤔

Contributor Author

This call is hot, but the two calls to inlined_args_may_unify are also hot. So only having the len==1 optimization at this call site would reduce its impact.
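
For readers following the thread, the pattern being debated can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the rustc code: `Ty`, `may_unify`, `inlined_args_may_unify`, and `args_may_unify` are simplified, hypothetical stand-ins, and the "return true when the depth budget runs out" rule is an assumption made for the sketch. It shows a force-inlined worker that hot call sites call directly, an out-of-line wrapper for cold call sites, and the single-argument fast path from the suggestion above.

```rust
// Hypothetical, simplified stand-in for the types handled by fast_reject.
// None of this is rustc's real API.
#[derive(Debug, PartialEq, Eq)]
enum Ty {
    Int,
    Bool,
    // May unify with anything (stands in for params / inference variables).
    Placeholder,
    // An ADT identified by a numeric id, plus its generic arguments.
    Adt(u32, Vec<Ty>),
}

// Force-inlined worker. Hot call sites call this directly, so the body is
// copied into them instead of paying for a call.
#[inline(always)]
fn inlined_args_may_unify(lhs: &[Ty], rhs: &[Ty], depth: usize) -> bool {
    if lhs.len() != rhs.len() {
        return false;
    }
    // Fast path: many generic ADTs have exactly one generic argument.
    if lhs.len() == 1 {
        return may_unify(&lhs[0], &rhs[0], depth);
    }
    lhs.iter().zip(rhs).all(|(l, r)| may_unify(l, r, depth))
}

// Out-of-line wrapper for cold call sites, so the worker's body is not
// duplicated everywhere it is used.
#[inline(never)]
fn args_may_unify(lhs: &[Ty], rhs: &[Ty], depth: usize) -> bool {
    inlined_args_may_unify(lhs, rhs, depth)
}

fn may_unify(lhs: &Ty, rhs: &Ty, depth: usize) -> bool {
    // Assumption for this sketch: answer conservatively once the depth
    // budget is exhausted.
    if depth == 0 {
        return true;
    }
    match (lhs, rhs) {
        (Ty::Placeholder, _) | (_, Ty::Placeholder) => true,
        (Ty::Adt(l_def, l_args), Ty::Adt(r_def, r_args)) => {
            // The hot call site: uses the force-inlined worker directly.
            l_def == r_def && inlined_args_may_unify(l_args, r_args, depth - 1)
        }
        _ => lhs == rhs,
    }
}
```

The trade-off in the sketch mirrors the one in the thread: every call site that uses the `inlined_` variant gets its own copy of the body, which can help hot paths but grows code size, while restricting the fast path to a single call site leaves the other hot callers going through the ordinary function.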

@nnethercote
Contributor Author

I think this isn't worth persisting with. The improvements are smaller than I'd hoped, and they only affect a small fraction of real programs.

@nnethercote nnethercote closed this Mar 6, 2025