
Replaced traverse benchmarks with something more representative #4403

Merged

Conversation

djspiewak (Member)

Basically, this replaces all of the traverse benchmarks with something more representative and projects them onto a more useful scale. This makes it possible to directly compare Traverse[List].traverse and List#map, for example, as well as to ultimately experiment with different implementation strategies that potentially avoid extra Eval wrapping.
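For orientation, here is a minimal sketch of the kind of like-for-like comparison the new benchmarks enable. The method names mirror those in the results posted later in this thread, but the bodies are a sketch under assumed parameters, not the actual TraverseBench code:

```scala
import org.openjdk.jmh.annotations.{Benchmark, Scope, State}
import org.openjdk.jmh.infra.Blackhole

import cats.Eval
import cats.syntax.all._

// Both methods walk the same list and do the same per-element work, so their
// throughput numbers can be compared directly.
@State(Scope.Thread)
class TraverseVsMapSketch {
  val length: Int = 10000
  val list: List[Int] = (0 until length).toList

  @Benchmark
  def mapList(bh: Blackhole): Unit =
    bh.consume(list.map(_ * 2))

  @Benchmark
  def traverseList(bh: Blackhole): Unit = {
    // traverse with Eval as the applicative, forced at the end
    val result: Eval[List[Int]] = list.traverse(i => Eval.later(i * 2))
    bh.consume(result.value)
  }
}
```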

@johnynek (Contributor) left a comment

Seems pretty good to me.

Can you add some example results on your machine in a comment to this PR?

Also I asked a question about the use of throw. I wonder why not use a short-circuiting applicative such as EitherT[Eval, A, B] or something.

case h :: t => G.map2Eval(f(h), Eval.defer(loop(t)))(_ :: _)
case Nil => Eval.now(G.pure(Nil))
if (i == length * 0.3) {
throw Failure
johnynek (Contributor)

I'm surprised to see us using throw in these benchmarks along with Eval. Can you add a comment since it is non-idiomatic to do so?

djspiewak (Member Author)

Eval secretly supports short-circuiting, which means it's a halfway-decent approximation of IO. Ultimately, I would be willing to bet that 90% of traverse usage in the wild has IO as the base applicative. In this case, I was just trying to zero in on the short-circuiting scenario without adding extra overhead that would muddy the constant factors.

The other advantage of doing it this way is that it allows us to compare like-for-like. If we write the same traverse process by hand (which I have now done locally in Cats Effect), we can compare directly and see the overhead of traverse itself, allowing us to explore different strategies.
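For anyone following along, here is a small standalone sketch of the short-circuiting being described; the Failure object, list size, and trip point are illustrative rather than the benchmark's actual values:

```scala
import cats.Eval
import cats.syntax.all._

import scala.util.control.NoStackTrace

// Throwing inside the function passed to traverse aborts evaluation of the
// lazily-deferred tail, so Eval behaves roughly like a short-circuiting
// effect (the way IO.raiseError would), with no extra wrapper overhead.
object EvalShortCircuitSketch {
  case object Failure extends RuntimeException with NoStackTrace

  def main(args: Array[String]): Unit = {
    var evaluated = 0
    val xs = (0 until 100).toList

    val program: Eval[List[Int]] = xs.traverse { i =>
      Eval.always {
        evaluated += 1
        if (i == 30) throw Failure // aborts here; the rest is never forced
        i * 2
      }
    }

    val survived =
      try { program.value; true }
      catch { case Failure => false }

    println(s"completed: $survived, elements evaluated: $evaluated") // false, roughly 31 of 100
  }
}
```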

Comment on lines +36 to +37
// the unit of CPU work per iteration
private[this] val Work: Long = 10
Member

In terms of JMH, we are going to burn CPU time per invocation here, aren't we?

djspiewak (Member Author)

Yep! Specifically, per iteration of the traverse itself.
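Presumably the per-element burn looks something like the sketch below; whether the real benchmark uses Blackhole.consumeCPU for this is an assumption here, but it is the standard JMH tool for a fixed, JIT-resistant unit of CPU work:

```scala
import org.openjdk.jmh.infra.Blackhole

import cats.Eval
import cats.syntax.all._

// Assumed shape: each traversed element burns a fixed number of CPU "tokens"
// (the Work constant from the diff above), so total cost scales with how many
// elements are actually evaluated before any short-circuit.
object PerElementWorkSketch {
  // the unit of CPU work per iteration
  private val Work: Long = 10

  def traverseWithWork(list: List[Int]): List[Int] =
    list.traverse { i =>
      Eval.later {
        Blackhole.consumeCPU(Work) // fixed CPU cost per element
        i * 2
      }
    }.value
}
```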

.value
}

bh.consume(result.value)
Member

Just wondering, is consuming the result value necessary here?

djspiewak (Member Author)

In this case, it's almost certainly unnecessary, but in general this is a good thing to do since it prevents the combination of inlining and escape analysis from inadvertently eliding the code path you actually intended to measure, resulting in over-fitting.
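As a general illustration of that point (not the benchmark's actual code): a result that is never observed is a candidate for dead-code elimination, while one handed to the Blackhole is forced to stay live.

```scala
import org.openjdk.jmh.annotations.{Benchmark, Scope, State}
import org.openjdk.jmh.infra.Blackhole

// Illustrative only. In `discarded`, nothing observes the result, so after
// inlining and escape analysis the JIT is free to elide some or all of the
// work, and the benchmark then over-reports throughput. In `consumed`, the
// Blackhole makes the result observable, so the measured path survives.
@State(Scope.Thread)
class DeadCodeSketch {
  val list: List[Int] = (0 until 10000).toList

  @Benchmark
  def discarded(): Unit = {
    list.map(_ * 2) // result dropped on the floor
    ()
  }

  @Benchmark
  def consumed(bh: Blackhole): Unit =
    bh.consume(list.map(_ * 2)) // result kept live by the Blackhole
}
```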

@djspiewak (Member Author)

> Can you add some example results on your machine in a comment to this PR?

I'll plop some results in here tonight. I don't want to run them on my regular machine because benchmarking on Apple Silicon is… very, very deceptive. Once I'm back home I'll use another home desktop to run the tests.

@djspiewak (Member Author)

[info] Benchmark                           (length)   Mode  Cnt     Score    Error  Units
[info] TraverseBench.filterList               10000  thrpt   10  7017.116 ± 69.322  ops/s
[info] TraverseBench.filterVector             10000  thrpt   10  7241.328 ± 47.800  ops/s
[info] TraverseBench.mapList                  10000  thrpt   10  7254.157 ± 38.800  ops/s
[info] TraverseBench.mapVector                10000  thrpt   10  7938.662 ± 16.415  ops/s
[info] TraverseBench.traverseFilterList       10000  thrpt   10  1394.029 ± 21.418  ops/s
[info] TraverseBench.traverseFilterVector     10000  thrpt   10  1464.137 ± 15.472  ops/s
[info] TraverseBench.traverseList             10000  thrpt   10  1337.406 ±  7.706  ops/s
[info] TraverseBench.traverseListError        10000  thrpt   10  3694.182 ± 16.848  ops/s
[info] TraverseBench.traverseVector           10000  thrpt   10  1491.909 ±  7.190  ops/s
[info] TraverseBench.traverseVectorError      10000  thrpt   10  4361.904 ± 20.878  ops/s

@armanbilge added the behind-the-scenes (appreciated, but not user-facing) label on Jun 26, 2023
@armanbilge merged commit 13d8a9d into typelevel:main on Jun 26, 2023