Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a deep fast_reject routine #97345

Merged
merged 2 commits into from
May 25, 2022
Merged

add a deep fast_reject routine #97345

merged 2 commits into from
May 25, 2022

Conversation

lcnr
Copy link
Contributor

@lcnr lcnr commented May 24, 2022

continues the work on #97136.

r? @nnethercote

Actually agree with you on the match structure 😆 let's see how that impacted perf 😅

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label May 24, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 24, 2022
@lcnr lcnr marked this pull request as ready for review May 24, 2022 07:23
@lcnr
Copy link
Contributor Author

lcnr commented May 24, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 24, 2022
@bors
Copy link
Contributor

bors commented May 24, 2022

⌛ Trying commit 9fe410feb297c11ac09a5185e1fc80e544b6cc97 with merge 5dd29bd39b3743aaf3c0c2669be7c188fe375735...

@lcnr
Copy link
Contributor Author

lcnr commented May 24, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Copy link
Contributor

bors commented May 24, 2022

⌛ Trying commit c2a5d73b7644a936194dc63896be12ca64d1894c with merge b1f8b07304a489ac1719a510512c8f14ac90de0f...

@lcnr
Copy link
Contributor Author

lcnr commented May 24, 2022

@bors abort

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented May 24, 2022

☀️ Try build successful - checks-actions
Build commit: b1f8b07304a489ac1719a510512c8f14ac90de0f (b1f8b07304a489ac1719a510512c8f14ac90de0f)

@rust-timer
Copy link
Collaborator

Queued b1f8b07304a489ac1719a510512c8f14ac90de0f with parent acb5c16, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (b1f8b07304a489ac1719a510512c8f14ac90de0f): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: mixed results
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 8 45 8 45
mean2 N/A 0.5% -7.5% -0.6% -7.5%
max N/A 0.7% -42.4% -0.8% -42.4%

Max RSS (memory usage)

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 1 2 7 4 8
mean2 2.1% 1.2% -2.1% -1.9% -1.6%
max 2.1% 1.3% -2.9% -4.8% -2.9%

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 😿 relevant regressions found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 4 14 0 14
mean2 N/A 3.5% -17.4% N/A -17.4%
max N/A 4.3% -38.7% N/A -38.7%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. number of relevant changes 2 3

  2. the arithmetic mean of the percent change 2 3

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 24, 2022
@lcnr
Copy link
Contributor Author

lcnr commented May 24, 2022

ideally we extend that deep fast reject to also use it for coherence. That should probably give us some further speedups

// Before doing expensive operations like entering an inference context, do
// a quick check via fast_reject to tell if the impl headers could possibly
// unify.
let impl1_ref = tcx.impl_trait_ref(impl1_def_id);
let impl2_ref = tcx.impl_trait_ref(impl2_def_id);
// Check if any of the input types definitely do not unify.
if iter::zip(
impl1_ref.iter().flat_map(|tref| tref.substs.types()),
impl2_ref.iter().flat_map(|tref| tref.substs.types()),
)
.any(|(ty1, ty2)| {
let t1 = fast_reject::simplify_type(tcx, ty1, TreatParams::AsInfer);
let t2 = fast_reject::simplify_type(tcx, ty2, TreatParams::AsInfer);
if let (Some(t1), Some(t2)) = (t1, t2) {
// Simplified successfully
t1 != t2
} else {
// Types might unify
false
}
}) {
// Some types involved are definitely different, so the impls couldn't possibly overlap.
debug!("overlapping_impls: fast_reject early-exit");
return no_overlap();
}

For that we need a deep fast reject where both sides are inference vars, implementing that now

@lcnr
Copy link
Contributor Author

lcnr commented May 24, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 24, 2022
@bors
Copy link
Contributor

bors commented May 24, 2022

⌛ Trying commit df22932377f13307f936f9bc81ed70bef76bf5e5 with merge 3cd8dbec5da846625b1ccec2294b2656ad747c1a...

@lcnr
Copy link
Contributor Author

lcnr commented May 24, 2022

@rust-timer queue df22932

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@lcnr
Copy link
Contributor Author

lcnr commented May 24, 2022

@rust-timer queue 3cd8dbec5da846625b1ccec2294b2656ad747c1a

@nnethercote
Copy link
Contributor

So, why does bitmaps get crazy wins here where other benchmarks are just "modest"?

It has a type struct BitsImpl<const N: usize>, a trait Bits, and then (via macros) 1024(!) impls of Bits for BitsImpl<1> through BitsImpl<1024>. The compiler ends up comparing all those impls for overlap.

This Zulip thread has more details.

ty::Adt(obl_def, obl_substs) => match k {
&ty::Adt(impl_def, impl_substs) => {
obl_def == impl_def
&& iter::zip(obl_substs, impl_substs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if obl_def == impl_def then the substs are guaranteed to have the same length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

}
_ => false,
},
ty::Slice(obl_ty) => matches!(k, &ty::Slice(impl_ty) if types_may_unify(obl_ty, impl_ty)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use matches!-with-an-if in more of these cases... probably good to have consistency one way or the other. Since this is the only one at the moment, maybe a match would be better here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my issue with matches is that it somewhat breaks formatting which is annoying.

Making this consistent is a bit annoying 😆 going to keep it this way for now '^^

// Ideally we would walk the existential predicates here or at least
// compare their length. But considering that the relevant `Relate` impl
// actually sorts and deduplicates these, that doesn't work.
matches!(k, ty::Dynamic(impl_preds, ..) if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this one is a matches! too.

@nnethercote
Copy link
Contributor

Great additional improvements on bitmaps.

Even better, I measured with a trunk version of nalgebra. My changes in #97136 got compile time reductions of up to 10%, but with your additional coherence changes it's now more like 50%! (cc @lqd)

The changes mostly look good. I have a few nits above, I'll let you decide how to deal with them. But please squash together commits 1, 3, and 4 before merging.

@nnethercote
Copy link
Contributor

@bors delegate=lcnr

@bors
Copy link
Contributor

bors commented May 24, 2022

✌️ @lcnr can now approve this pull request

@nnethercote
Copy link
Contributor

I measured some benchmarks that we know from this analysis have some similarities to bitmaps:

  • aes-gcm-0.9.4
  • bigdecimal-0.3.0
  • bitmaps-3.1.0
  • bytemuck-1.7.3
  • bytestring-1.0.0
  • hex-0.4.3
  • lzw-0.10.0
  • nalgebra-0.30.1
  • num-complex-0.4.0
  • ordered-float-2.10.0
  • pbkdf2-0.10.0
  • quickcheck-1.0.3
  • rustc-rayon-0.3.2
  • scroll-0.11.0
  • secrecy-0.8.0
  • sha3-0.10.0
  • strsim-0.10.0

Instruction counts measured locally for check builds:

Benchmark Profile Scenario % Change Significance Factor?
bitmaps-3.1.0 check full -68.61% 343.05x
bitmaps-3.1.0 check incr-full -62.70% 313.49x
nalgebra-0.30.1 check incr-full -59.27% 296.36x
nalgebra-0.30.1 check full -57.36% 286.78x
secrecy-0.8.0 check full -38.76% 193.81x
secrecy-0.8.0 check incr-full -31.52% 157.60x
hex-0.4.3 check full -22.94% 114.71x
hex-0.4.3 check incr-full -20.01% 100.05x
bytemuck-1.7.3 check full -15.27% 76.34x
ordered-float-2.10.0 check full -13.14% 65.68x
bytemuck-1.7.3 check incr-full -12.62% 63.12x
num-complex-0.4.0 check full -12.47% 62.35x
ordered-float-2.10.0 check incr-full -10.83% 54.13x
num-complex-0.4.0 check incr-full -10.59% 52.97x
bitmaps-3.1.0 check incr-patched: println -10.05% 50.25x
bitmaps-3.1.0 check incr-unchanged -9.75% 48.75x
bytestring-1.0.0 check full -8.42% 42.11x
rustc-rayon-0.3.2 check full -8.02% 40.12x
rustc-rayon-0.3.2 check incr-full -6.86% 34.31x
bytestring-1.0.0 check incr-full -6.54% 32.69x
pbkdf2-0.10.0 check full -5.76% 28.81x
bigdecimal-0.3.0 check full -5.63% 28.17x
bigdecimal-0.3.0 check incr-full -4.92% 24.59x
strsim-0.10.0 check full -4.92% 24.59x
pbkdf2-0.10.0 check incr-full -4.56% 22.80x
strsim-0.10.0 check incr-full -3.80% 19.02x
rustc-rayon-0.3.2 check incr-unchanged -1.83% 9.17x
aes-gcm-0.9.4 check full -1.24% 6.18x
bigdecimal-0.3.0 check incr-unchanged -1.22% 6.10x
aes-gcm-0.9.4 check incr-full -0.95% 4.74x
hex-0.4.3 check incr-unchanged -0.94% 4.68x
ordered-float-2.10.0 check incr-unchanged -0.86% 4.30x
num-complex-0.4.0 check incr-unchanged -0.82% 4.09x
secrecy-0.8.0 check incr-unchanged -0.77% 3.87x
sha3-0.10.0 check full -0.77% 3.86x
scroll-0.11.0 check full -0.70% 3.50x
sha3-0.10.0 check incr-full -0.58% 2.90x
lzw-0.10.0 check full -0.52% 2.59x
scroll-0.11.0 check incr-full -0.51% 2.54x
nalgebra-0.30.1 check incr-unchanged -0.44% 2.20x
lzw-0.10.0 check incr-full -0.31% 1.54x

Woo!

@rust-log-analyzer

This comment has been minimized.

lcnr and others added 2 commits May 25, 2022 07:40
`match_impl` has two call sites. For one of them (within `rematch_impl`)
the fast reject test isn't necessary, because any rejection would
represent a compiler bug.

This commit moves the fast reject test to the other `match_impl` call
site, in `assemble_candidates_from_impls`. This lets us move the fast
reject test outside the `probe` call in that function. This avoids the
taking of useless snapshots when the fast reject test succeeds, which
gives a performance win when compiling the `bitmaps` and `nalgebra`
crates.

Co-authored-by: name <n.nethercote@gmail.com>
@lcnr
Copy link
Contributor Author

lcnr commented May 25, 2022

@bors r=nnethercote

@bors
Copy link
Contributor

bors commented May 25, 2022

📌 Commit bff7b51 has been approved by nnethercote

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 25, 2022
@bors
Copy link
Contributor

bors commented May 25, 2022

⌛ Testing commit bff7b51 with merge 4a99c5f...

@bors
Copy link
Contributor

bors commented May 25, 2022

☀️ Test successful - checks-actions
Approved by: nnethercote
Pushing 4a99c5f to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 25, 2022
@bors bors merged commit 4a99c5f into rust-lang:master May 25, 2022
@rustbot rustbot added this to the 1.63.0 milestone May 25, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4a99c5f): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
0.9% 1.6% 10
Improvements 🎉
(primary)
-11.0% -65.8% 45
Improvements 🎉
(secondary)
-0.5% -0.9% 12
All 😿🎉 (primary) -11.0% -65.8% 45

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
2.7% 3.2% 2
Improvements 🎉
(primary)
-1.9% -2.7% 8
Improvements 🎉
(secondary)
-1.7% -2.4% 2
All 😿🎉 (primary) -1.9% -2.7% 8

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvement found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-22.6% -60.8% 17
Improvements 🎉
(secondary)
-2.6% -2.6% 1
All 😿🎉 (primary) -22.6% -60.8% 17

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants