add a deep fast_reject routine #97345

lcnr · 2022-05-24T07:23:34Z

continues the work on #97136.

r? @nnethercote

Actually agree with you on the match structure 😆 let's see how that impacted perf 😅

lcnr · 2022-05-24T07:23:56Z

@bors try @rust-timer queue

rust-timer · 2022-05-24T07:23:57Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-05-24T07:24:03Z

⌛ Trying commit 9fe410feb297c11ac09a5185e1fc80e544b6cc97 with merge 5dd29bd39b3743aaf3c0c2669be7c188fe375735...

lcnr · 2022-05-24T07:28:24Z

@bors try @rust-timer queue

rust-timer · 2022-05-24T07:28:25Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-05-24T07:28:34Z

⌛ Trying commit c2a5d73b7644a936194dc63896be12ca64d1894c with merge b1f8b07304a489ac1719a510512c8f14ac90de0f...

lcnr · 2022-05-24T07:28:40Z

@bors abort

bors · 2022-05-24T09:00:55Z

☀️ Try build successful - checks-actions
Build commit: b1f8b07304a489ac1719a510512c8f14ac90de0f (b1f8b07304a489ac1719a510512c8f14ac90de0f)

rust-timer · 2022-05-24T09:00:57Z

Queued b1f8b07304a489ac1719a510512c8f14ac90de0f with parent acb5c16, future comparison URL.

rust-timer · 2022-05-24T10:18:28Z

Finished benchmarking commit (b1f8b07304a489ac1719a510512c8f14ac90de0f): comparison url.

Instruction count

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: mixed results

	Regressions 😿 (primary)	Regressions 😿 (secondary)	Improvements 🎉 (primary)	Improvements 🎉 (secondary)	All 😿 🎉 (primary)
count¹	0	8	45	8	45
mean²	N/A	0.5%	-7.5%	-0.6%	-7.5%
max	N/A	0.7%	-42.4%	-0.8%	-42.4%

Max RSS (memory usage)

Results

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

	Regressions 😿 (primary)	Regressions 😿 (secondary)	Improvements 🎉 (primary)	Improvements 🎉 (secondary)	All 😿 🎉 (primary)
count¹	1	2	7	4	8
mean²	2.1%	1.2%	-2.1%	-1.9%	-1.6%
max	2.1%	1.3%	-2.9%	-4.8%	-2.9%

Cycles

Results

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: 😿 relevant regressions found

	Regressions 😿 (primary)	Regressions 😿 (secondary)	Improvements 🎉 (primary)	Improvements 🎉 (secondary)	All 😿 🎉 (primary)
count¹	0	4	14	0	14
mean²	N/A	3.5%	-17.4%	N/A	-17.4%
max	N/A	4.3%	-38.7%	N/A	-38.7%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

¹ number of relevant changes
² the arithmetic mean of the percent change

lcnr · 2022-05-24T14:42:34Z

Ideally we extend that deep fast reject so it can also be used for coherence. That should probably give us some further speedups.

rust/compiler/rustc_trait_selection/src/traits/coherence.rs

Lines 79 to 105 in b2eba05

    
    // Before doing expensive operations like entering an inference context, do
    // a quick check via fast_reject to tell if the impl headers could possibly
    // unify.
    let impl1_ref = tcx.impl_trait_ref(impl1_def_id);
    let impl2_ref = tcx.impl_trait_ref(impl2_def_id);
    // Check if any of the input types definitely do not unify.
    if iter::zip(
        impl1_ref.iter().flat_map(|tref| tref.substs.types()),
        impl2_ref.iter().flat_map(|tref| tref.substs.types()),
    )
    .any(|(ty1, ty2)| {
        let t1 = fast_reject::simplify_type(tcx, ty1, TreatParams::AsInfer);
        let t2 = fast_reject::simplify_type(tcx, ty2, TreatParams::AsInfer);
        if let (Some(t1), Some(t2)) = (t1, t2) {
            // Simplified successfully
            t1 != t2
        } else {
            // Types might unify
            false
        }
    }) {
        // Some types involved are definitely different, so the impls couldn't possibly overlap.
        debug!("overlapping_impls: fast_reject early-exit");
        return no_overlap();
    }

For that we need a deep fast reject where both sides are inference vars. Implementing that now.
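
To illustrate the difference between the shallow check in the snippet above and a deep one, here is a minimal toy sketch. It is not the rustc implementation: the `Ty` enum and the functions `shallow_may_unify` and `deep_may_unify` are hypothetical names invented for this example, and `Ty::Infer` stands in for anything treated as an inference variable on either side.

```rust
// Toy model only: `Ty` is a hypothetical stand-in, far simpler than rustc's types.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Bool,
    /// A nominal type with generic arguments, e.g. `Vec<T>`.
    Adt(&'static str, Vec<Ty>),
    /// Placeholder that may unify with anything (inference var or impl param).
    Infer,
}

/// Shallow check: only the outermost constructor is compared.
fn shallow_may_unify(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        (Ty::Infer, _) | (_, Ty::Infer) => true,
        (Ty::Int, Ty::Int) | (Ty::Bool, Ty::Bool) => true,
        (Ty::Adt(n1, _), Ty::Adt(n2, _)) => n1 == n2,
        _ => false,
    }
}

/// Deep check: also recurses into generic arguments, with `Infer`
/// allowed on both sides.
fn deep_may_unify(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        (Ty::Infer, _) | (_, Ty::Infer) => true,
        (Ty::Int, Ty::Int) | (Ty::Bool, Ty::Bool) => true,
        (Ty::Adt(n1, args1), Ty::Adt(n2, args2)) => {
            n1 == n2
                && args1.len() == args2.len()
                && args1.iter().zip(args2).all(|(x, y)| deep_may_unify(x, y))
        }
        _ => false,
    }
}

fn main() {
    let vec_int = Ty::Adt("Vec", vec![Ty::Int]);
    let vec_bool = Ty::Adt("Vec", vec![Ty::Bool]);
    let vec_infer = Ty::Adt("Vec", vec![Ty::Infer]);

    // The shallow check cannot tell `Vec<Int>` and `Vec<Bool>` apart.
    assert!(shallow_may_unify(&vec_int, &vec_bool));
    // The deep check rejects them without any real unification.
    assert!(!deep_may_unify(&vec_int, &vec_bool));
    // A placeholder on either side still counts as "might unify".
    assert!(deep_may_unify(&vec_int, &vec_infer));
    assert!(deep_may_unify(&vec_infer, &vec_bool));
}
```

Both functions only ever answer "definitely different" or "might unify", so a `true` result would still need the full (expensive) overlap check afterwards.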

lcnr · 2022-05-24T15:06:12Z

@bors try @rust-timer queue

rust-timer · 2022-05-24T15:06:14Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-05-24T15:06:20Z

⌛ Trying commit df22932377f13307f936f9bc81ed70bef76bf5e5 with merge 3cd8dbec5da846625b1ccec2294b2656ad747c1a...

lcnr · 2022-05-24T17:06:59Z

@rust-timer queue df22932

rust-timer · 2022-05-24T17:07:01Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

lcnr · 2022-05-24T17:07:29Z

@rust-timer queue 3cd8dbec5da846625b1ccec2294b2656ad747c1a

nnethercote · 2022-05-24T22:40:05Z

So, why does bitmaps get crazy wins here where other benchmarks are just "modest"?

It has a type struct BitsImpl<const N: usize>, a trait Bits, and then (via macros) 1024(!) impls of Bits for BitsImpl<1> through BitsImpl<1024>. The compiler ends up comparing all those impls for overlap.

This Zulip thread has more details.
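
A rough sketch of the pattern described above, reduced to four impls instead of 1024. This is not the `bitmaps` crate's actual source: the `impl_bits!` macro, the `BITS` associated constant, and the `impl_pairs` helper are hypothetical and exist only for illustration. The pair count assumes every unordered pair of impls gets compared once.

```rust
// Hypothetical reduction of the pattern; not the real `bitmaps` code.
#[allow(dead_code)]
struct BitsImpl<const N: usize>;

trait Bits {
    const BITS: usize;
}

// One impl per const value, generated by a macro.
macro_rules! impl_bits {
    ($($n:literal),* $(,)?) => {
        $(
            impl Bits for BitsImpl<$n> {
                const BITS: usize = $n;
            }
        )*
    };
}

impl_bits!(1, 2, 3, 4);

/// Number of unordered pairs among `n` impls, assuming each pair is
/// checked for overlap exactly once.
fn impl_pairs(n: u64) -> u64 {
    n * n.saturating_sub(1) / 2
}

fn main() {
    assert_eq!(<BitsImpl<3> as Bits>::BITS, 3);
    // With 1024 impls of the same trait, that is over half a million pairs.
    assert_eq!(impl_pairs(1024), 523_776);
    println!("{} impl pairs to compare", impl_pairs(1024));
}
```

All of these impls share the same outermost type, `BitsImpl<_>`, so a check that looks only at the outer constructor cannot separate them, whereas one that also looks at the const argument can.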