use try_fold instead of try_for_each to reduce compile time #64885

andjo403 · 2019-09-28T22:11:13Z

As stated in #64572, the biggest gain came from generating less code, so I tried to reduce the number of functions to inline by using try_fold directly instead of calling try_for_each, which in turn calls try_fold.

As there are some gains from using try_fold directly, this may be a way forward.
When I tried compiling the clap-rs benchmark, the time gains I got were only some % off those from #64572.

There are more functions, e.g. fold, that call try_fold and could also be changed, but the question is how much "duplication" is tolerated in std in exchange for faster compile times.

can someone start a perf run?

cc @nnethercote @scottmcm @bluss
r? @ghost
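To make the description above concrete, here is a minimal sketch of the two call shapes being compared. The free functions and their names are hypothetical, and `Option<()>` stands in for the internal `LoopState` type so the example runs on stable Rust; it is not the libcore code itself.

```rust
// Hypothetical free functions mirroring the shape of `Iterator::all`.

/// `all` written on top of `try_for_each`, which itself forwards to `try_fold`.
fn all_via_try_for_each<I, F>(iter: &mut I, mut f: F) -> bool
where
    I: Iterator,
    F: FnMut(I::Item) -> bool,
{
    // `None` plays the role of "break", `Some(())` of "continue".
    iter.try_for_each(|x| if f(x) { Some(()) } else { None }).is_some()
}

/// The same logic calling `try_fold` directly, skipping one generic layer.
fn all_via_try_fold<I, F>(iter: &mut I, mut f: F) -> bool
where
    I: Iterator,
    F: FnMut(I::Item) -> bool,
{
    iter.try_fold((), |(), x| if f(x) { Some(()) } else { None }).is_some()
}

fn main() {
    let data = [2, 4, 6, 7, 8];
    let mut a = data.iter();
    let mut b = data.iter();
    assert!(!all_via_try_for_each(&mut a, |x| x % 2 == 0));
    assert!(!all_via_try_fold(&mut b, |x| x % 2 == 0));
    // Both stop right after the first failing element (7), leaving 8.
    assert_eq!(a.next(), Some(&8));
    assert_eq!(b.next(), Some(&8));
}
```

Both versions behave the same; the second simply gives the compiler one less generic function to instantiate and inline per call site.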

Centril · 2019-09-28T22:41:51Z

@bors try @rust-timer queue

rust-timer · 2019-09-28T22:41:53Z

Awaiting bors try build completion

bors · 2019-09-28T22:42:03Z

⌛ Trying commit fc87c00c5b527660779dbcea0fe4291177100616 with merge 4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e...

bors · 2019-09-29T01:38:26Z

☀️ Try build successful - checks-azure
Build commit: 4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e (4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e)

rust-timer · 2019-09-29T01:38:28Z

Queued 4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e with parent 488381c, future comparison URL.

rust-timer · 2019-09-29T09:06:53Z

Finished benchmarking try commit 4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e, comparison URL.

andjo403 · 2019-09-29T10:34:12Z

A few percent better than #64600 alone, but still some way to go to reach the diffs from #64572.
I do not know how good it is to compare the percentage changes across all the perf runs, but as the PRs have multiple bases it is not possible to compare them directly.

nnethercote · 2019-09-30T23:55:04Z

@andjo403: #64600 has now landed. Could you rebase and update the code here? I think only the first commit will be necessary now, and we can get a fair comparison. Thanks.

removes two functions to inline by combining the check functions and extra call to try_for_each

andjo403 · 2019-10-01T06:00:29Z

@nnethercote rebased and removed the second commit

nnethercote · 2019-10-01T06:36:43Z

Thanks!

@bors try @rust-timer queue

rust-timer · 2019-10-01T06:36:45Z

Awaiting bors try build completion

bors · 2019-10-01T06:36:56Z

⌛ Trying commit 8737061 with merge 40a3c41fdfde051926f256564c247e2ce94a667e...

bluss · 2019-10-01T09:24:53Z

Assuming we are using try_fold etc everywhere, we can still manually desugar to structs implementing FnMut instead of using closures.

Not the best abstraction level, but doesn't it look like we could save one generic item per iterator method then? Where we currently have the check functions.
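For readers unfamiliar with the "check functions" mentioned here, the following is a rough sketch of that pattern. The `AllExt` trait and `all_ext` method are hypothetical, and `Option<()>` again stands in for the internal `LoopState`; only the overall shape is meant to match the code under review.

```rust
// Hypothetical extension trait; only the `check` pattern is the point here.

/// Builds the predicate adapter outside the iterator method, so the returned
/// closure is generic over `T` and the predicate only, not the iterator type.
fn check<T>(mut f: impl FnMut(T) -> bool) -> impl FnMut((), T) -> Option<()> {
    move |(), x| if f(x) { Some(()) } else { None }
}

trait AllExt: Iterator + Sized {
    fn all_ext<F>(&mut self, f: F) -> bool
    where
        F: FnMut(Self::Item) -> bool,
    {
        // A closure written inline here would also be generic over `Self`.
        self.try_fold((), check(f)).is_some()
    }
}

impl<I: Iterator> AllExt for I {}

fn main() {
    assert!([1, 3, 5].iter().all_ext(|x| x % 2 == 1));
    assert!(!(1..10).all_ext(|x| x < 5));
}
```

The cost of the pattern is that each method carries a separate generic `check` item in addition to the closure inside it, which is the extra item the comment above suggests could be saved.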

bors · 2019-10-01T09:41:48Z

☀️ Try build successful - checks-azure
Build commit: 40a3c41fdfde051926f256564c247e2ce94a667e (40a3c41fdfde051926f256564c247e2ce94a667e)

bluss · 2019-10-01T19:03:33Z

src/libcore/iter/traits/iterator.rs

                if f(x) { LoopState::Continue(()) }
                else { LoopState::Break(()) }
            }
        }
-
-        self.try_for_each(check(f)) == LoopState::Continue(())
+        self.try_fold((), check(f)) == LoopState::Continue(())


Thoughts on equality check vs pattern matching here, can it have an effect or none at all?

I made a quick diff in godbolt and there is less code to inline, so that is something I can do.
ZN72$LT$example..LoopState$LT$C$C$B$GT$$u20$as$u20$core..cmp..PartialEq$GT$2eq17h37dbcaf2df999e09E is a lot to inline.

I would hope it has no effect, since LoopState<(),()> is an i1 in LLVM...

...and it is in -O, but very different in debug: https://rust.godbolt.org/z/LKOpZ7

Looks like the PartialEq::eq that gets generated is pretty bad, and it's still bad removing the generics: https://rust.godbolt.org/z/o6Nuaw Could there be a "this is a field-less enum so just compare the discriminants" path in the derive? It looks, unfortunately, like as u8 == 1 is the shortest-emitted-IR way to do these checks. And we're avoiding the derives in other places too, like

rust/src/libcore/cmp.rs

Lines 632 to 638 in 702b45e

    #[stable(feature = "rust1", since = "1.0.0")]
    impl Ord for Ordering {
        #[inline]
        fn cmp(&self, other: &Ordering) -> Ordering {
            (*self as i32).cmp(&(*other as i32))
        }
    }

Oh interesting. So we could improve here just by implementing PartialEq manually, or even adding a separate method for just discriminant comparison. But then pattern matching works well too. Like, just a method for ".is_continue()"

But the pattern match avoids having a function to inline completely; as long as a function is used, the LLVM IR will contain a call and a function.
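The options discussed in this thread can be sketched side by side. The `LoopState` below is a local stand-in for the internal libcore type, and `Flag` and `is_continue` are hypothetical names used only for illustration.

```rust
// A local stand-in for the internal `LoopState`; this definition is an assumption.
#[derive(PartialEq)]
enum LoopState<C, B> {
    Continue(C),
    Break(B),
}

impl<C, B> LoopState<C, B> {
    /// Pattern match: needs no `PartialEq` impl and no bounds on `C` or `B`.
    fn is_continue(&self) -> bool {
        match self {
            LoopState::Continue(_) => true,
            LoopState::Break(_) => false,
        }
    }
}

/// A field-less enum compared through its discriminant, in the spirit of the
/// manual `Ord for Ordering` impl quoted above.
#[derive(Clone, Copy)]
enum Flag {
    Continue = 0,
    Break = 1,
}

impl PartialEq for Flag {
    #[inline]
    fn eq(&self, other: &Flag) -> bool {
        (*self as u8) == (*other as u8)
    }
}

fn main() {
    let state: LoopState<(), ()> = LoopState::Continue(());
    // Equality check: goes through the derived `PartialEq::eq`.
    assert!(state == LoopState::Continue(()));
    // Pattern match: same answer without calling `eq`.
    assert!(state.is_continue());
    assert!(!LoopState::<(), ()>::Break(()).is_continue());
    assert!(Flag::Break == Flag::Break);
    assert!(Flag::Continue != Flag::Break);
}
```

Note that `is_continue` is still a function the compiler has to inline; matching directly at the call site, as suggested in the last comment, avoids even that.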

nnethercote · 2019-10-01T19:24:01Z

My rust-timer command above didn't work. Let's try doing it a different way:

@rust-timer build 40a3c41fdfde051926f256564c247e2ce94a667e

rust-timer · 2019-10-01T19:24:02Z

Queued 40a3c41fdfde051926f256564c247e2ce94a667e with parent 42ec683, future comparison URL.

andjo403 · 2019-10-01T19:33:49Z

Assuming we are using try_fold etc everywhere, we can still manually desugar to structs implementing FnMut instead of using closures.
Not the best abstraction level, but doesn't it look like we could save one generic item per iterator method then? Where we currently have the check functions.

I do not understand; can you show an example?

bluss · 2019-10-01T20:18:01Z

@andjo403 I have an example of a before-after change like that, that I made as PoC. Rust -Zprint-mono-items=lazy tells me this uses 1 less generic function (Before we use check<T> and the closure in the check body, after we use only Fun::call_mut (call_once is never used).) Regrettably it's from a smaller similar iterator, not the exact code in libcore.

Code here https://gist.github.com/b94c565bc5ba37206112c150b8b1cc20

It doesn't look great - maybe a macro could improve that? In fact the code looks so bad, I'm unsure we'd want to do that. 🙂
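As a rough illustration of the idea (not the code from the gist), a hand-desugared closure might look like the sketch below. Stable Rust cannot implement `FnMut` for a user type, so a hypothetical local trait `FoldFn` stands in for it, and `try_fold_with`, `AllCheck`, and `all_with_struct` are likewise made-up names.

```rust
// All names here are hypothetical; `FoldFn` stands in for `FnMut`.

trait FoldFn<Acc, T> {
    fn call_mut(&mut self, acc: Acc, item: T) -> Option<Acc>;
}

/// Hand-written "closure" for `all`: its only type parameter is the predicate `F`.
struct AllCheck<F>(F);

impl<T, F: FnMut(T) -> bool> FoldFn<(), T> for AllCheck<F> {
    fn call_mut(&mut self, (): (), item: T) -> Option<()> {
        if (self.0)(item) { Some(()) } else { None }
    }
}

/// A small `try_fold` look-alike that accepts a `FoldFn` instead of a closure.
fn try_fold_with<I, Acc, G>(iter: &mut I, init: Acc, mut g: G) -> Option<Acc>
where
    I: Iterator,
    G: FoldFn<Acc, I::Item>,
{
    let mut acc = init;
    for item in iter {
        // `?` returns `None` as soon as the callback asks to stop.
        acc = g.call_mut(acc, item)?;
    }
    Some(acc)
}

fn all_with_struct<I, F>(iter: &mut I, f: F) -> bool
where
    I: Iterator,
    F: FnMut(I::Item) -> bool,
{
    try_fold_with(iter, (), AllCheck(f)).is_some()
}

fn main() {
    let mut it = [1, 2, 3, 4].iter();
    assert!(!all_with_struct(&mut it, |x| *x < 3));
    // Iteration stopped after the failing element (3).
    assert_eq!(it.next(), Some(&4));
}
```

Here the struct replaces both the helper function and the closure it returns, at the price of noticeably more boilerplate per method.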

andjo403 · 2019-10-01T20:30:40Z

thanks @bluss for the example and yes that code was hard to understand

bluss · 2019-10-01T20:52:24Z

It is equivalent to desugaring the original closure, without the "check(f) hack", but also without capturing extraneous type parameters. So a regular closure would be the same, when #46477 is fixed.

rust-timer · 2019-10-01T23:42:59Z

Finished benchmarking try commit 40a3c41fdfde051926f256564c247e2ce94a667e, comparison URL.

Mark-Simulacrum · 2019-10-02T00:00:17Z

Crazy bots! I think I know what's wrong though, will try and fix in a bit, and silence bot for now.

rust-timer · 2019-10-02T00:07:13Z

Finished benchmarking try commit 40a3c41fdfde051926f256564c247e2ce94a667e, comparison URL.

nnethercote · 2019-10-02T00:16:28Z

The results are good: up to 7.5% win for clap, and lots of sub-1% wins. Really good for such a simple change!

scottmcm · 2019-10-02T01:01:13Z

I'm happy with this as-is (we can explore other things like #64885 (comment) in a follow-up PR), so

@bors r+

bors · 2019-10-02T01:01:14Z

📌 Commit 8737061 has been approved by scottmcm


@ghost

Rollup of 11 pull requests

Successful merges:

- #64649 (Avoid ICE on return outside of fn with literal array)
- #64722 (Make all alt builders produce parallel-enabled compilers)
- #64801 (Avoid `chain()` in `find_constraint_paths_between_regions()`.)
- #64805 (Still more `ObligationForest` improvements.)
- #64840 (SelfProfiler API refactoring and part one of event review)
- #64885 (use try_fold instead of try_for_each to reduce compile time)
- #64942 (Fix clippy warnings)
- #64952 (Update cargo.)
- #64974 (Fix zebra-striping in generic dataflow visualization)
- #64978 (Fully clear `HandlerInner` in `Handler::reset_err_count`)
- #64979 (Update books)

Failed merges:

- #64959 (syntax: improve parameter without type suggestions)

r? @ghost

nnethercote mentioned this pull request Sep 30, 2019

Simplify some Iterator methods. #64572

Closed

replace try_for_each with try_fold to generate less code

8737061

removes two functions to inline by combining the check functions and extra call to try_for_each

andjo403 force-pushed the iter branch from fc87c00 to 8737061 Compare October 1, 2019 05:57

bluss reviewed Oct 1, 2019


rust-lang deleted a comment from rust-timer Oct 2, 2019

scottmcm self-assigned this Oct 2, 2019

scottmcm changed the title ~~[WIP] use try_fold instead of try_for_each to reduce compile time~~ use try_fold instead of try_for_each to reduce compile time Oct 2, 2019

bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Oct 2, 2019

tmandry mentioned this pull request Oct 2, 2019

Rollup of 11 pull requests #64981

Merged

bors merged commit 8737061 into rust-lang:master Oct 2, 2019

andjo403 deleted the iter branch October 2, 2019 10:01

the8472 mentioned this pull request Jan 24, 2020

perf: Use for_each in Vec::extend #68046

Closed

