
ThinLTO + O2 build time regression in rust-doom #45188

Closed · alexcrichton opened this issue Oct 10, 2017 · 7 comments
Labels: A-codegen (Area: Code generation)

@alexcrichton (Member)

Creating an issue from https://internals.rust-lang.org/t/help-test-out-thinlto/6017/3 so it can be investigated further.

cc @cristicbz

@michaelwoerister (Member)

We should definitely re-investigate this with different settings for -Z inline-in-all-cgus, since I don't think that flag existed yet when the initial benchmarks were done.
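For reference, a sketch of how such a comparison might be run (my invocation, not from this thread; both flags are nightly-only, and 16 codegen units is just an example value):

```sh
# Build twice from scratch, toggling only -Z inline-in-all-cgus.
cargo clean
time RUSTFLAGS="-C codegen-units=16 -Z inline-in-all-cgus=yes" cargo build --release
cargo clean
time RUSTFLAGS="-C codegen-units=16 -Z inline-in-all-cgus=no" cargo build --release
```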

@alexcrichton (Member, Author)

Ok, here's what I did:

Results:

Build entire project from scratch

Here I just ran cargo build from a completely clean target directory and timed how long it took (all times below are in seconds).
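The measurement amounts to something like the following (a sketch; the exact script isn't reproduced in this thread, and the ThinLTO rows add the corresponding nightly flag):

```sh
# From-scratch build time for each codegen-unit count.
for cgus in 1 2 4 8 16 32 64; do
    cargo clean
    time RUSTFLAGS="-C codegen-units=$cgus" cargo build --release
done
```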

| cgus | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|---|---|
| debug #1 | 79.414 | 68.783 | 64.883 | 65.293 | 64.567 | 64.743 | 66.243 |
| debug #2 | 80.465 | 70.181 | 65.977 | 64.997 | 65.252 | 65.869 | 66.912 |
| release thinlto=no #1 | 175.212 | 131.287 | 117.496 | 114.854 | 114.018 | 114.566 | 114.391 |
| release thinlto=no #2 | 171.829 | 131.514 | 119.942 | 115.897 | 112.881 | 113.379 | 114.711 |
| release thinlto=yes #1 | 172.188 | 174.210 | 161.224 | 159.588 | 154.327 | 154.382 | 161.319 |
| release thinlto=yes #2 | 171.179 | 172.778 | 160.880 | 156.812 | 157.029 | 156.233 | 156.742 |

My conclusion from this is that on my machine I get clear wins across the board from using more CGUs. The sweet spot appears to be around 16 CGUs, and I don't see any clear or obvious regressions in this area from using multiple CGUs.

Build project after changing the math crate

Here I changed the math crate to force a recompilation of that crate and everything that depends on it. All other dependencies were cached.
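Per configuration this looks roughly like the following (a sketch; I'm assuming the math crate lives at math/ in the rust-doom workspace):

```sh
export RUSTFLAGS="-C codegen-units=16"   # example configuration
cargo build --release                    # warm the cache for this configuration
touch math/src/lib.rs                    # invalidate only the math crate
time cargo build --release               # rebuild math and its dependents
```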

| cgus | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|---|---|
| debug #1 | 18.573 | 14.590 | 12.318 | 11.827 | 11.733 | 11.936 | 12.206 |
| debug #2 | 18.558 | 14.533 | 12.246 | 11.904 | 11.659 | 11.991 | 12.250 |
| release thinlto=no #1 | 59.602 | 35.658 | 23.906 | 22.403 | 21.920 | 21.862 | 21.932 |
| release thinlto=no #2 | 60.016 | 35.438 | 24.806 | 22.935 | 21.835 | 21.914 | 21.902 |
| release thinlto=yes #1 | 59.449 | 46.711 | 33.257 | 33.333 | 31.071 | 31.026 | 31.190 |
| release thinlto=yes #2 | 60.491 | 47.160 | 33.320 | 33.021 | 31.036 | 30.964 | 30.347 |

Here we see even more drastic wins (as expected), and again no clear regressions relative to one CGU.


@michaelwoerister I haven't done much data collection on inline-in-all-cgus in release mode, but @sfackler posted some worrisome numbers which indicate that -Z inline-in-all-cgus=no actually regresses release build time. Presumably it shifts work to the post-LTO stage, where it ends up costing more than it would have pre-LTO?

@alexcrichton (Member, Author)

In the thread about testing ThinLTO, the main conclusions I drew were:

  • Enabling multiple CGUs and ThinLTO requires the compiler to perform more work overall. In general there's more management, some synchronization, and especially the handling of #[inline] functions in release mode (see the sketch just after this list).
  • It's not a 100% win for all projects. Some got over 2x faster to compile; some got a few dozen percent slower.
  • I've personally found locally that the biggest improvements come when compiling incrementally, when there are fewer crates in play (more idle cores available).
  • I've personally never been able to reproduce any regressions on my machine; I've always seen wins across the board.
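To illustrate the #[inline] point with a hypothetical math-style helper (my example, not code from rust-doom):

```rust
// Because this function is marked #[inline], rustc makes its body available
// across codegen units and typically instantiates a copy in each CGU that
// calls it. In release mode the optimizer then processes each copy, which is
// part of the extra work that multiple CGUs introduce.
#[inline]
pub fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}
```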

With that in mind I'm sort of tempted to close this and keep pushing hard on #45320, but I'm curious if others can reproduce my own results!

@cristicbz (Contributor)

I'm doing another run with the newest nightly, with more values for the number of CGUs and for inline-in-all-cgus. I'll report back for posterity's sake.

> I've personally never been able to reproduce any regressions on my machine; I've always seen wins across the board.

I want that engraved on my tombstone :P

@alexcrichton (Member, Author)

Fair point!

@cristicbz (Contributor)

So, using your script (thanks btw!), this is what I've got with rustc 1.22.0-nightly (4e9527cf6 2017-10-16); times in seconds:

| config / cgus | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| fresh, debug #1 | 84.241 | 78.396 | 66.445 | 64.908 | 66.476 | 67.808 |
| fresh, debug #2 | 84.017 | 70.373 | 66.309 | 66.053 | 65.734 | 68.415 |
| fresh, release #1 | 164.202 | 135.773 | 129.870 | 122.565 | 114.855 | 116.007 |
| fresh, release #2 | 163.273 | 143.971 | 121.138 | 124.187 | 113.137 | 115.537 |
| fresh, release, thinlto #1 | 169.568 | 171.795 | 163.108 | 178.536 | 156.117 | 158.511 |
| fresh, release, thinlto #2 | 165.670 | 176.277 | 162.612 | 173.914 | 155.648 | 165.749 |
| math, debug #1 | 20.781 | 15.920 | 14.698 | 14.141 | 14.056 | 14.349 |
| math, debug #2 | 20.737 | 15.901 | 14.717 | 14.056 | 14.101 | 14.400 |
| math, release #1 | 64.861 | 37.807 | 25.958 | 25.829 | 23.400 | 23.247 |
| math, release #2 | 64.856 | 37.211 | 25.930 | 26.070 | 23.292 | 23.264 |
| math, release, thinlto #1 | 62.796 | 51.280 | 35.819 | 36.367 | 32.440 | 32.596 |
| math, release, thinlto #2 | 71.395 | 50.464 | 36.009 | 35.762 | 32.796 | 33.840 |

These results match yours more closely than before, with modest or no improvements in the 8-32 CGU regime for the full rebuild and significant speed-ups in the single-crate case. There was no difference in the runtime benchmarks.

So what happened before? I'll put it down to thermal throttling: all the earlier runs took much longer, and maybe with more cores in use things got hotter. Sorry for the noise; it sounds like you can close this issue!

@alexcrichton (Member, Author)

Ok awesome! Thanks so much for gathering that data! In that case I'll...
