Regression of compilation times on the latest nightly (2018-03-20) #49237

newpavlov · 2018-03-21T14:29:05Z

For keccak crate I use simple loop unrolling macros:

macro_rules! unroll5 {
    ($var:ident, $body:block) => {
        { const $var: usize = 0; $body; }
        { const $var: usize = 1; $body; }
        { const $var: usize = 2; $body; }
        { const $var: usize = 3; $body; }
        { const $var: usize = 4; $body; }
    };
}

Which combined generate a lot of unrolled code as a final result. On the latest Nightly it takes a lot more time to compile the crate than previously:

$ cargo clean; rustup run stable cargo build
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 3.9 secs
cargo clean; rustup run stable cargo check
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 2.30 secs
$ cargo clean; rustup run nightly cargo build
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 18.65 secs
$ cargo clean; rustup run nightly cargo check
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 2.32 secs

Here I've used only 10 iterations instead of 24 in the unroll24 macro, full version takes more than several minutes to compile. Judging by cargo check warnings expansion takes approximately the same time and the drastic difference comes from later stages.

EDIT: RUSTFLAGS="-Z time-passes" rustup run nightly cargo build produces the following result:

Click to expand

  time: 0.001; rss: 48MB	parsing
  time: 0.000; rss: 50MB	garbage collect incremental cache directory
  time: 0.000; rss: 50MB	recursion limit
  time: 0.000; rss: 50MB	crate injection
  time: 0.000; rss: 50MB	plugin loading
  time: 0.000; rss: 50MB	plugin registration
  time: 0.000; rss: 50MB	background load prev dep-graph
  time: 0.059; rss: 71MB	expansion
  time: 0.000; rss: 71MB	maybe building test harness
  time: 0.001; rss: 71MB	maybe creating a macro crate
  time: 0.003; rss: 71MB	creating allocators
  time: 0.002; rss: 71MB	AST validation
  time: 0.013; rss: 74MB	name resolution
  time: 0.002; rss: 74MB	complete gated feature checking
  time: 0.000; rss: 74MB	blocked while dep-graph loading finishes
  time: 0.016; rss: 80MB	lowering ast -> hir
  time: 0.008; rss: 80MB	early lint checks
  time: 0.021; rss: 83MB	indexing hir
  time: 0.000; rss: 79MB	load query result cache
  time: 0.000; rss: 79MB	looking for entry point
  time: 0.000; rss: 79MB	looking for plugin registrar
  time: 0.001; rss: 79MB	loop checking
  time: 0.000; rss: 81MB	attribute checking
  time: 0.006; rss: 84MB	stability checking
  time: 0.012; rss: 88MB	type collecting
  time: 0.000; rss: 88MB	outlives testing
  time: 0.000; rss: 88MB	impl wf inference
  time: 0.000; rss: 88MB	coherence checking
  time: 0.000; rss: 88MB	variance testing
  time: 0.046; rss: 106MB	wf checking
  time: 0.023; rss: 109MB	item-types checking
  time: 1.518; rss: 121MB	item-bodies checking
  time: 0.031; rss: 122MB	rvalue promotion
  time: 0.014; rss: 122MB	privacy checking
  time: 0.002; rss: 122MB	intrinsic checking
  time: 0.006; rss: 122MB	match checking
  time: 0.084; rss: 118MB	liveness checking
  time: 0.324; rss: 134MB	borrow checking
  time: 0.002; rss: 135MB	MIR borrow checking
  time: 0.000; rss: 135MB	MIR effect checking
  time: 0.003; rss: 135MB	death checking
  time: 0.000; rss: 135MB	unused lib feature checking
  time: 0.039; rss: 138MB	lint checking
  time: 0.000; rss: 138MB	dumping chalk-like clauses
  time: 0.000; rss: 138MB	resolving dependency formats
    time: 0.037; rss: 140MB	write metadata
    time: 14.945; rss: 143MB	translation item collection
    time: 0.000; rss: 143MB	codegen unit partitioning
    time: 0.142; rss: 151MB	translate to LLVM IR
    time: 0.000; rss: 151MB	assert dep graph
    time: 0.000; rss: 152MB	llvm function passes [2lyh15q6cjwzy18c]
    time: 0.000; rss: 152MB	llvm module passes [2lyh15q6cjwzy18c]
    time: 0.002; rss: 159MB	codegen passes [2lyh15q6cjwzy18c]
      time: 0.024; rss: 160MB	persist query result cache
    time: 0.030; rss: 162MB	llvm function passes [30rksvufw6ddw8se]
    time: 0.004; rss: 163MB	llvm module passes [30rksvufw6ddw8se]
      time: 0.011; rss: 161MB	persist dep-graph
    time: 0.034; rss: 161MB	serialize dep graph
  time: 15.160; rss: 161MB	translation
    time: 0.526; rss: 117MB	codegen passes [30rksvufw6ddw8se]
  time: 0.645; rss: 108MB	LLVM passes
  time: 0.000; rss: 108MB	serialize work products
  time: 0.001; rss: 108MB	linking

The text was updated successfully, but these errors were encountered:

Mark-Simulacrum · 2018-03-21T20:41:26Z

cc @rust-lang/compiler -- can we assign someone to investigate?

Zoxc · 2018-03-21T20:52:51Z

@oli-obk Could this be Miri?

oli-obk · 2018-03-22T06:36:38Z

Yes this is very likely to be miri related.

It has a lot of constants if I understand it correctly. It also looks like the collector and trans take up most of the time. This makes sense as we now collect all constants, which we didn't before. These private constants shouldn't be collected though.

michaelwoerister · 2018-03-22T09:48:26Z

cc @eddyb

nikomatsakis · 2018-03-22T14:18:16Z

triage: P-high

nikomatsakis · 2018-03-22T14:18:26Z

Looks like the fearless @oli-obk is on it

oli-obk · 2018-03-22T14:18:32Z

The keccak crate is a great benchmark for ridiculous numbers of locals and basic blocks. We should add it to the list of perf tests.

Anyways, the fix works locally

newpavlov · 2018-03-22T22:50:55Z

On nightly replacing { const $var: usize = N; $body; } with $var = N; $body; reduces compilation time a bit, but not that much, but on the stable it's a vice versa.

@michaelwoerister

…er,eddyb Remove slow HashSet during miri stack frame creation fixes rust-lang#49237 probably has a major impact on rust-lang#48846 r? @michaelwoerister cc @eddyb I know you kept telling me to use vectors instead of hash containers... Now I know why.

@michaelwoerister

…er,eddyb Remove slow HashSet during miri stack frame creation fixes rust-lang#49237 probably has a major impact on rust-lang#48846 r? @michaelwoerister cc @eddyb I know you kept telling me to use vectors instead of hash containers... Now I know why.

@michaelwoerister

…er,eddyb Remove slow HashSet during miri stack frame creation fixes rust-lang#49237 probably has a major impact on rust-lang#48846 r? @michaelwoerister cc @eddyb I know you kept telling me to use vectors instead of hash containers... Now I know why.

newpavlov · 2018-03-27T11:10:27Z

On 2018-03-26 nightly compilation times become better, but still ~4 times slower compared to 1.24 stable:

$ cargo clean ; rustup run stable cargo build
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 12.34 secs
$ cargo clean ; rustup run nightly cargo build
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 46.81 secs

oli-obk · 2018-03-28T08:31:38Z

I'll investigate. I removed P-high, because while a 4x regression is bad, manually unrolling loops is a corner case (2k unrolled loop bodies in a single function is a lot of copies of the same code!)

newpavlov · 2018-03-28T08:36:01Z

Note that I've changed the code, now it does not unroll the main loop (thus it generates 24 times less code), to test the old version see rev a417e97.

pnkfelix · 2018-04-12T15:00:05Z

triage: P-medium

downgrading to P-medium since oli-obk is investigating and we do not feel need to revisit at the weekly compiler team meeting.

oli-obk · 2018-04-24T15:21:02Z

So... the regressions visible in -Ztime-passes are:

item-bodies checking (7 -> 24)
borrow-checking (1 -> 4)
write metadata (0.1 -> 40)
translation item collection (2 -> 40)
translate to llvm (0.5 -> 2.3)
translation (2.5 -> 44)
llvm passes (1 -> 2)

Some info:

per unrolled loop we get ~350 const_eval calls
the function (when unrolling 4x) has 12k MIR locals
const propagation takes "only" 3 seconds, so it can't be the reason for the long compile times
the logs show a few gaps at 4x unroll:

TRACE 2018-04-24T15:06:24Z: rustc_mir::interpret::memory: mark_static_initialized AllocId(8), mutability: Immutable
DEBUG 2018-04-24T15:06:41Z: rustc_mir::transform::check_unsafety: unsafety_violations(DefId(0/0:5 ~ keccak[1670]::RHO[0]))

DEBUG 2018-04-24T15:06:42Z: rustc_mir::transform::simplify: simplifying bb3754
DEBUG 2018-04-24T15:06:50Z: rustc_mir::transform::check_unsafety: check_unsafety(DefId(0/0:4 ~ keccak[1670]::PLEN[0]))

DEBUG 2018-04-24T15:06:50Z: rustc_mir::transform::elaborate_drops: elaborate_drops(MirSource { def_id: DefId(0/0:10 ~ keccak[1670]::f1600[0]), promoted: None } @ src/lib.rs:128:1: 182:2)
DEBUG 2018-04-24T15:06:50Z: rustc_mir::transform::elaborate_drops: find_dead_unwinds(src/lib.rs:128:1: 182:2)
DEBUG 2018-04-24T15:07:09Z: rustc_mir::transform::elaborate_drops: drop_flags_for_locs(BasicBlockData { statements: [StorageLive(_2), nop, StorageLive(_3), _3 = [const 0u64; 5], _2 = [move _3; 24], StorageDead(_3), StorageLive(_5), StorageLive(_6), StorageLive(_7), _7 = core::ops::Range { start: const 0usize, end: const 6usize }], terminator: Some(Terminator { source_info: SourceInfo { span: src/lib.rs:104:21: 104:25, scope: scope[1] }, kind: _6 = const core::iter::IntoIterator::into_iter(move _7) -> bb1 }), is_cleanup: false })

Note that const_prop does not change the MIR (it just emits lints), so there's no way it can affect anything.

pietroalbini · 2018-05-01T08:09:27Z

Ping @oli-obk! We're approaching the release of 1.26, do we want to get this regression fixed?

oli-obk · 2018-05-01T09:13:50Z

I'll see if I find some time. no promises

oli-obk · 2018-07-30T08:03:47Z

cc @wesleywiser you expressed interest in looking at const eval perf issues

oli-obk · 2018-10-26T09:23:04Z

We've done major perf improvements in this area. I don't really know if this issue is actionable anymore after all this time.

newpavlov · 2018-10-29T03:43:21Z

There is still a small ~20% regression:

$ cargo clean; cargo +1.21.0 build
   Compiling keccak v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 11.15 secs
$ cargo clean; cargo +stable build
   Compiling keccak v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 14.44s 
$ cargo clean; cargo +nightly build
   Compiling keccak v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 14.55s

But I guess it's small enough for this issue to be closed.

ishitatsuyuki added I-compiletime Issue: Problems and improvements with respect to compile times. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Mar 21, 2018

Mark-Simulacrum added this to the 1.26 milestone Mar 21, 2018

oli-obk mentioned this issue Mar 22, 2018

Remove slow HashSet during miri stack frame creation #49274

Merged

rust-highfive added the P-high High priority label Mar 22, 2018

nikomatsakis assigned oli-obk Mar 22, 2018

nikomatsakis mentioned this issue Mar 22, 2018

add keccak test rust-lang/rustc-perf#196

Closed

bors closed this as completed in #49274 Mar 24, 2018

newpavlov mentioned this issue Mar 27, 2018

Use f1600 from keccak crate debris/tiny-keccak#25

Closed

oli-obk removed the P-high High priority label Mar 28, 2018

oli-obk reopened this Mar 28, 2018

oli-obk added P-medium Medium priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Apr 12, 2018

jkordish added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 25, 2018

eddyb added the I-nominated label Apr 26, 2018

pietroalbini removed the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Apr 30, 2018

nikomatsakis removed the I-nominated label May 3, 2018

alexcrichton removed this from the 1.26 milestone May 7, 2018

alexcrichton added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels May 7, 2018

oli-obk added the A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) label Jun 30, 2018

oli-obk closed this as completed Oct 29, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression of compilation times on the latest nightly (2018-03-20) #49237

Regression of compilation times on the latest nightly (2018-03-20) #49237

newpavlov commented Mar 21, 2018 •

edited

Loading

Mark-Simulacrum commented Mar 21, 2018

Zoxc commented Mar 21, 2018

oli-obk commented Mar 22, 2018

michaelwoerister commented Mar 22, 2018

nikomatsakis commented Mar 22, 2018

nikomatsakis commented Mar 22, 2018

oli-obk commented Mar 22, 2018

newpavlov commented Mar 22, 2018

newpavlov commented Mar 27, 2018 •

edited

Loading

oli-obk commented Mar 28, 2018

newpavlov commented Mar 28, 2018

pnkfelix commented Apr 12, 2018

oli-obk commented Apr 24, 2018

pietroalbini commented May 1, 2018

oli-obk commented May 1, 2018

oli-obk commented Jul 30, 2018

oli-obk commented Oct 26, 2018

newpavlov commented Oct 29, 2018 •

edited

Loading

Regression of compilation times on the latest nightly (2018-03-20) #49237

Regression of compilation times on the latest nightly (2018-03-20) #49237

Comments

newpavlov commented Mar 21, 2018 • edited Loading

Mark-Simulacrum commented Mar 21, 2018

Zoxc commented Mar 21, 2018

oli-obk commented Mar 22, 2018

michaelwoerister commented Mar 22, 2018

nikomatsakis commented Mar 22, 2018

nikomatsakis commented Mar 22, 2018

oli-obk commented Mar 22, 2018

newpavlov commented Mar 22, 2018

newpavlov commented Mar 27, 2018 • edited Loading

oli-obk commented Mar 28, 2018

newpavlov commented Mar 28, 2018

pnkfelix commented Apr 12, 2018

oli-obk commented Apr 24, 2018

pietroalbini commented May 1, 2018

oli-obk commented May 1, 2018

oli-obk commented Jul 30, 2018

oli-obk commented Oct 26, 2018

newpavlov commented Oct 29, 2018 • edited Loading

newpavlov commented Mar 21, 2018 •

edited

Loading

newpavlov commented Mar 27, 2018 •

edited

Loading

newpavlov commented Oct 29, 2018 •

edited

Loading