Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression of compilation times on the latest nightly (2018-03-20) #49237

Closed
newpavlov opened this issue Mar 21, 2018 · 18 comments · Fixed by #49274
Closed

Regression of compilation times on the latest nightly (2018-03-20) #49237

newpavlov opened this issue Mar 21, 2018 · 18 comments · Fixed by #49274
Assignees
Labels
A-const-eval Area: Constant evaluation (MIR interpretation) A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@newpavlov
Copy link
Contributor

newpavlov commented Mar 21, 2018

For keccak crate I use simple loop unrolling macros:

macro_rules! unroll5 {
    ($var:ident, $body:block) => {
        { const $var: usize = 0; $body; }
        { const $var: usize = 1; $body; }
        { const $var: usize = 2; $body; }
        { const $var: usize = 3; $body; }
        { const $var: usize = 4; $body; }
    };
}

Which combined generate a lot of unrolled code as a final result. On the latest Nightly it takes a lot more time to compile the crate than previously:

$ cargo clean; rustup run stable cargo build
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 3.9 secs
cargo clean; rustup run stable cargo check
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 2.30 secs
$ cargo clean; rustup run nightly cargo build
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 18.65 secs
$ cargo clean; rustup run nightly cargo check
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 2.32 secs

Here I've used only 10 iterations instead of 24 in the unroll24 macro, full version takes more than several minutes to compile. Judging by cargo check warnings expansion takes approximately the same time and the drastic difference comes from later stages.

EDIT: RUSTFLAGS="-Z time-passes" rustup run nightly cargo build produces the following result:

Click to expand
  time: 0.001; rss: 48MB	parsing
  time: 0.000; rss: 50MB	garbage collect incremental cache directory
  time: 0.000; rss: 50MB	recursion limit
  time: 0.000; rss: 50MB	crate injection
  time: 0.000; rss: 50MB	plugin loading
  time: 0.000; rss: 50MB	plugin registration
  time: 0.000; rss: 50MB	background load prev dep-graph
  time: 0.059; rss: 71MB	expansion
  time: 0.000; rss: 71MB	maybe building test harness
  time: 0.001; rss: 71MB	maybe creating a macro crate
  time: 0.003; rss: 71MB	creating allocators
  time: 0.002; rss: 71MB	AST validation
  time: 0.013; rss: 74MB	name resolution
  time: 0.002; rss: 74MB	complete gated feature checking
  time: 0.000; rss: 74MB	blocked while dep-graph loading finishes
  time: 0.016; rss: 80MB	lowering ast -> hir
  time: 0.008; rss: 80MB	early lint checks
  time: 0.021; rss: 83MB	indexing hir
  time: 0.000; rss: 79MB	load query result cache
  time: 0.000; rss: 79MB	looking for entry point
  time: 0.000; rss: 79MB	looking for plugin registrar
  time: 0.001; rss: 79MB	loop checking
  time: 0.000; rss: 81MB	attribute checking
  time: 0.006; rss: 84MB	stability checking
  time: 0.012; rss: 88MB	type collecting
  time: 0.000; rss: 88MB	outlives testing
  time: 0.000; rss: 88MB	impl wf inference
  time: 0.000; rss: 88MB	coherence checking
  time: 0.000; rss: 88MB	variance testing
  time: 0.046; rss: 106MB	wf checking
  time: 0.023; rss: 109MB	item-types checking
  time: 1.518; rss: 121MB	item-bodies checking
  time: 0.031; rss: 122MB	rvalue promotion
  time: 0.014; rss: 122MB	privacy checking
  time: 0.002; rss: 122MB	intrinsic checking
  time: 0.006; rss: 122MB	match checking
  time: 0.084; rss: 118MB	liveness checking
  time: 0.324; rss: 134MB	borrow checking
  time: 0.002; rss: 135MB	MIR borrow checking
  time: 0.000; rss: 135MB	MIR effect checking
  time: 0.003; rss: 135MB	death checking
  time: 0.000; rss: 135MB	unused lib feature checking
  time: 0.039; rss: 138MB	lint checking
  time: 0.000; rss: 138MB	dumping chalk-like clauses
  time: 0.000; rss: 138MB	resolving dependency formats
    time: 0.037; rss: 140MB	write metadata
    time: 14.945; rss: 143MB	translation item collection
    time: 0.000; rss: 143MB	codegen unit partitioning
    time: 0.142; rss: 151MB	translate to LLVM IR
    time: 0.000; rss: 151MB	assert dep graph
    time: 0.000; rss: 152MB	llvm function passes [2lyh15q6cjwzy18c]
    time: 0.000; rss: 152MB	llvm module passes [2lyh15q6cjwzy18c]
    time: 0.002; rss: 159MB	codegen passes [2lyh15q6cjwzy18c]
      time: 0.024; rss: 160MB	persist query result cache
    time: 0.030; rss: 162MB	llvm function passes [30rksvufw6ddw8se]
    time: 0.004; rss: 163MB	llvm module passes [30rksvufw6ddw8se]
      time: 0.011; rss: 161MB	persist dep-graph
    time: 0.034; rss: 161MB	serialize dep graph
  time: 15.160; rss: 161MB	translation
    time: 0.526; rss: 117MB	codegen passes [30rksvufw6ddw8se]
  time: 0.645; rss: 108MB	LLVM passes
  time: 0.000; rss: 108MB	serialize work products
  time: 0.001; rss: 108MB	linking
@ishitatsuyuki ishitatsuyuki added I-compiletime Issue: Problems and improvements with respect to compile times. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Mar 21, 2018
@Mark-Simulacrum Mark-Simulacrum added this to the 1.26 milestone Mar 21, 2018
@Mark-Simulacrum
Copy link
Member

cc @rust-lang/compiler -- can we assign someone to investigate?

@Zoxc
Copy link
Contributor

Zoxc commented Mar 21, 2018

@oli-obk Could this be Miri?

@oli-obk
Copy link
Contributor

oli-obk commented Mar 22, 2018

Yes this is very likely to be miri related.

It has a lot of constants if I understand it correctly. It also looks like the collector and trans take up most of the time. This makes sense as we now collect all constants, which we didn't before. These private constants shouldn't be collected though.

@michaelwoerister
Copy link
Member

cc @eddyb

@nikomatsakis
Copy link
Contributor

triage: P-high

@rust-highfive rust-highfive added the P-high High priority label Mar 22, 2018
@nikomatsakis
Copy link
Contributor

Looks like the fearless @oli-obk is on it

@oli-obk
Copy link
Contributor

oli-obk commented Mar 22, 2018

The keccak crate is a great benchmark for ridiculous numbers of locals and basic blocks. We should add it to the list of perf tests.

Anyways, the fix works locally

@newpavlov
Copy link
Contributor Author

On nightly replacing { const $var: usize = N; $body; } with $var = N; $body; reduces compilation time a bit, but not that much, but on the stable it's a vice versa.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 23, 2018
…er,eddyb

Remove slow HashSet during miri stack frame creation

fixes rust-lang#49237

probably has a major impact on rust-lang#48846

r? @michaelwoerister

cc @eddyb I know you kept telling me to use vectors instead of hash containers... Now I know why.
kennytm added a commit to kennytm/rust that referenced this issue Mar 24, 2018
…er,eddyb

Remove slow HashSet during miri stack frame creation

fixes rust-lang#49237

probably has a major impact on rust-lang#48846

r? @michaelwoerister

cc @eddyb I know you kept telling me to use vectors instead of hash containers... Now I know why.
kennytm added a commit to kennytm/rust that referenced this issue Mar 24, 2018
…er,eddyb

Remove slow HashSet during miri stack frame creation

fixes rust-lang#49237

probably has a major impact on rust-lang#48846

r? @michaelwoerister

cc @eddyb I know you kept telling me to use vectors instead of hash containers... Now I know why.
@newpavlov
Copy link
Contributor Author

newpavlov commented Mar 27, 2018

On 2018-03-26 nightly compilation times become better, but still ~4 times slower compared to 1.24 stable:

$ cargo clean ; rustup run stable cargo build
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 12.34 secs
$ cargo clean ; rustup run nightly cargo build
   Compiling keccak v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 46.81 secs

@oli-obk
Copy link
Contributor

oli-obk commented Mar 28, 2018

I'll investigate. I removed P-high, because while a 4x regression is bad, manually unrolling loops is a corner case (2k unrolled loop bodies in a single function is a lot of copies of the same code!)

@newpavlov
Copy link
Contributor Author

Note that I've changed the code, now it does not unroll the main loop (thus it generates 24 times less code), to test the old version see rev a417e97.

@pnkfelix
Copy link
Member

triage: P-medium

downgrading to P-medium since oli-obk is investigating and we do not feel need to revisit at the weekly compiler team meeting.

@oli-obk oli-obk added P-medium Medium priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Apr 12, 2018
@oli-obk
Copy link
Contributor

oli-obk commented Apr 24, 2018

So... the regressions visible in -Ztime-passes are:

  • item-bodies checking (7 -> 24)
  • borrow-checking (1 -> 4)
  • write metadata (0.1 -> 40)
  • translation item collection (2 -> 40)
  • translate to llvm (0.5 -> 2.3)
  • translation (2.5 -> 44)
  • llvm passes (1 -> 2)

Some info:

  • per unrolled loop we get ~350 const_eval calls
  • the function (when unrolling 4x) has 12k MIR locals
  • const propagation takes "only" 3 seconds, so it can't be the reason for the long compile times
  • the logs show a few gaps at 4x unroll:

TRACE 2018-04-24T15:06:24Z: rustc_mir::interpret::memory: mark_static_initialized AllocId(8), mutability: Immutable
DEBUG 2018-04-24T15:06:41Z: rustc_mir::transform::check_unsafety: unsafety_violations(DefId(0/0:5 ~ keccak[1670]::RHO[0]))

DEBUG 2018-04-24T15:06:42Z: rustc_mir::transform::simplify: simplifying bb3754
DEBUG 2018-04-24T15:06:50Z: rustc_mir::transform::check_unsafety: check_unsafety(DefId(0/0:4 ~ keccak[1670]::PLEN[0]))

DEBUG 2018-04-24T15:06:50Z: rustc_mir::transform::elaborate_drops: elaborate_drops(MirSource { def_id: DefId(0/0:10 ~ keccak[1670]::f1600[0]), promoted: None } @ src/lib.rs:128:1: 182:2)
DEBUG 2018-04-24T15:06:50Z: rustc_mir::transform::elaborate_drops: find_dead_unwinds(src/lib.rs:128:1: 182:2)
DEBUG 2018-04-24T15:07:09Z: rustc_mir::transform::elaborate_drops: drop_flags_for_locs(BasicBlockData { statements: [StorageLive(_2), nop, StorageLive(_3), _3 = [const 0u64; 5], _2 = [move _3; 24], StorageDead(_3), StorageLive(_5), StorageLive(_6), StorageLive(_7), _7 = core::ops::Range { start: const 0usize, end: const 6usize }], terminator: Some(Terminator { source_info: SourceInfo { span: src/lib.rs:104:21: 104:25, scope: scope[1] }, kind: _6 = const core::iter::IntoIterator::into_iter(move _7) -> bb1 }), is_cleanup: false })

Note that const_prop does not change the MIR (it just emits lints), so there's no way it can affect anything.

@jkordish jkordish added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 25, 2018
@pietroalbini pietroalbini removed the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Apr 30, 2018
@pietroalbini
Copy link
Member

Ping @oli-obk! We're approaching the release of 1.26, do we want to get this regression fixed?

@oli-obk
Copy link
Contributor

oli-obk commented May 1, 2018

I'll see if I find some time. no promises

@alexcrichton alexcrichton removed this from the 1.26 milestone May 7, 2018
@alexcrichton alexcrichton added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels May 7, 2018
@oli-obk oli-obk added the A-const-eval Area: Constant evaluation (MIR interpretation) label Jun 30, 2018
@oli-obk
Copy link
Contributor

oli-obk commented Jul 30, 2018

cc @wesleywiser you expressed interest in looking at const eval perf issues

@oli-obk
Copy link
Contributor

oli-obk commented Oct 26, 2018

We've done major perf improvements in this area. I don't really know if this issue is actionable anymore after all this time.

@newpavlov
Copy link
Contributor Author

newpavlov commented Oct 29, 2018

There is still a small ~20% regression:

$ cargo clean; cargo +1.21.0 build
   Compiling keccak v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 11.15 secs
$ cargo clean; cargo +stable build
   Compiling keccak v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 14.44s 
$ cargo clean; cargo +nightly build
   Compiling keccak v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 14.55s

But I guess it's small enough for this issue to be closed.

@oli-obk oli-obk closed this as completed Oct 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: Constant evaluation (MIR interpretation) A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.