Refactored `Chain` and `CollectiveMonoid` benchmarks #4264

TonioGela · 2022-07-02T17:38:01Z

Hello 👋 !

As the title states, I refactored these two benchmarks and moved them out of scala-2.12, so that Chain vs Vector can be tested on more Scala versions and the results are more "recent" than before.

I replaced the old fs2.Catenable with fs2.Chunk and removed the spire-math Chain tests, since I was unable to find a 2.13.x or 3.x version.

I was able to run the CollectiveMonoid benchmark locally on the three Scala versions to gather some data:

Testing setup

It might be worth saying that I had to turn on the air conditioning system and rub the laptop with an ice pack to avoid thermal throttling, so do not take these numbers too seriously, please.

# JMH version: 1.32
# VM version: JDK 11.0.11, Java HotSpot(TM) 64-Bit Server VM, 11.0.11+9-LTS-194
# VM invoker: /Library/Java/JavaVirtualMachines/jdk-11.0.11.jdk/Contents/Home/bin/java
# VM options: -Xmx3G
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time

Benchmark on 2.12.16                     Mode  Cnt   Score   Error  Units
CollectionMonoidBench.accumulateChain   thrpt   25  58.961 ± 0.719  ops/s
CollectionMonoidBench.accumulateList    thrpt   25  31.188 ± 0.394  ops/s
CollectionMonoidBench.accumulateVector  thrpt   25   6.978 ± 0.185  ops/s

Benchmark on 2.13.8                      Mode  Cnt   Score   Error  Units
CollectionMonoidBench.accumulateChain   thrpt   25  81.973 ± 3.921  ops/s
CollectionMonoidBench.accumulateList    thrpt   25  21.150 ± 1.756  ops/s
CollectionMonoidBench.accumulateVector  thrpt   25  11.725 ± 0.306  ops/s

Benchmark on 3.1.3                       Mode  Cnt   Score   Error  Units
CollectionMonoidBench.accumulateChain   thrpt   25  85.870 ± 1.075  ops/s
CollectionMonoidBench.accumulateList    thrpt   25  20.250 ± 1.995  ops/s
CollectionMonoidBench.accumulateVector  thrpt   25  11.105 ± 0.099  ops/s

These numbers result from testing on my machine and should not be mindlessly generalized. It might be worth using some perf test infrastructure rather than my or any other local machine.
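To put the `CollectionMonoidBench` scores above in relative terms, here is a small sketch that computes the Chain/List and Chain/Vector throughput ratios per Scala version. The scores are copied from the tables above; the `ThroughputRatios` object and the `Run` case class are hypothetical names used only for this illustration, and the ratios ignore the reported error margins.

```scala
object ThroughputRatios {
  // One row per Scala version; scores are ops/s copied from the
  // CollectionMonoidBench tables above (accumulateChain/List/Vector).
  final case class Run(scalaVersion: String, chain: Double, list: Double, vector: Double) {
    // How many times more operations per second Chain completed than List.
    def chainOverList: Double = chain / list
    // How many times more operations per second Chain completed than Vector.
    def chainOverVector: Double = chain / vector
  }

  val runs: List[Run] = List(
    Run("2.12.16", 58.961, 31.188, 6.978),
    Run("2.13.8", 81.973, 21.150, 11.725),
    Run("3.1.3", 85.870, 20.250, 11.105)
  )

  def main(args: Array[String]): Unit =
    runs.foreach { r =>
      println(
        f"${r.scalaVersion}%-8s Chain/List = ${r.chainOverList}%.2f, Chain/Vector = ${r.chainOverVector}%.2f"
      )
    }
}
```

On these laptop numbers, Chain comes out roughly 1.9x to 4.2x ahead of List and roughly 7x to 8.5x ahead of Vector, with the same caveats about the measurement setup as above.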

armanbilge · 2022-07-02T17:41:05Z

build.sbt

+    libraryDependencies ++= Seq(
+      "org.scalaz" %% "scalaz-core" % "7.3.6",
+      "co.fs2" %% "fs2-core" % "3.2.8"
+    ),


Thanks for all your work on this! Actually we should just drop Chunk from the benchmark because it creates a circular dependency with fs2. See #4194 (comment).

Hmm, I'll remove it then, thanks for noticing.

Removed it in c3ec714

Now the ETA for the JMH run of CollectiveMonoid is just 04:18:20 for each Scala version 😂

That means that I need 13 hours of ice packs 😇

armanbilge · 2022-07-02T17:42:03Z

Wow, this is really cool, so now we can compare across Scala versions!

TonioGela · 2022-07-02T17:44:27Z

> Wow, this is really cool, so now we can compare across Scala versions!

Yep. Another thing worth doing, once these tests have been executed on a reliable machine, is updating the Chain documentation, since its performance section makes many claims that were true for Scala 2.12 but might not hold for later versions.

armanbilge

Thanks, long overdue!

TonioGela · 2022-07-02T22:39:19Z

Adding Chain benchmarks on 2.13.8 (again, on my laptop, so handle them with care)

Benchmark on 2.13.8                     Mode  Cnt          Score         Error  Units
ChainBench.consLargeChain              thrpt   25  143759156.264 ± 5611584.788  ops/s
ChainBench.consLargeList               thrpt   25  148512687.273 ± 5992793.489  ops/s
ChainBench.consLargeVector             thrpt   25    7249505.257 ±  202436.549  ops/s
ChainBench.consSmallChain              thrpt   25  119925876.637 ± 1663011.363  ops/s
ChainBench.consSmallList               thrpt   25  152664330.695 ± 1828399.646  ops/s
ChainBench.consSmallVector             thrpt   25   57686442.030 ±  533768.670  ops/s
ChainBench.createChainOption           thrpt   25  167191685.222 ± 1474976.197  ops/s
ChainBench.createChainSeqOption        thrpt   25   21264365.364 ±  372757.348  ops/s
ChainBench.createSmallChain            thrpt   25   87260308.052 ±  960407.889  ops/s
ChainBench.createSmallList             thrpt   25   20000981.857 ±  396001.340  ops/s
ChainBench.createSmallVector           thrpt   25   26311376.712 ±  288871.258  ops/s
ChainBench.createTinyChain             thrpt   25   75311482.869 ± 1066466.694  ops/s
ChainBench.createTinyList              thrpt   25   67502351.990 ± 1071560.419  ops/s
ChainBench.createTinyVector            thrpt   25   39676430.380 ±  405717.649  ops/s
ChainBench.foldLeftLargeChain          thrpt   25        117.866 ±       3.343  ops/s
ChainBench.foldLeftLargeList           thrpt   25        193.640 ±       2.298  ops/s
ChainBench.foldLeftLargeVector         thrpt   25        178.370 ±       0.830  ops/s
ChainBench.foldLeftSmallChain          thrpt   25   43732934.777 ±  362285.965  ops/s
ChainBench.foldLeftSmallList           thrpt   25   51155941.055 ±  882005.961  ops/s
ChainBench.foldLeftSmallVector         thrpt   25   41902918.940 ±   53030.742  ops/s
ChainBench.lengthLargeChain            thrpt   25     131831.918 ±    1613.341  ops/s
ChainBench.lengthLargeList             thrpt   25        271.015 ±       0.962  ops/s
ChainBench.mapLargeChain               thrpt   25         78.162 ±       2.620  ops/s
ChainBench.mapLargeList                thrpt   25         73.676 ±       8.999  ops/s
ChainBench.mapLargeVector              thrpt   25        132.443 ±       2.360  ops/s
ChainBench.mapSmallChain               thrpt   25   24047623.583 ± 1834073.508  ops/s
ChainBench.mapSmallList                thrpt   25   21482014.328 ±  387854.819  ops/s
ChainBench.mapSmallVector              thrpt   25   34707281.383 ±  382477.558  ops/s
ChainBench.reverseLargeChain           thrpt   25      37700.549 ±     154.942  ops/s
ChainBench.reverseLargeList            thrpt   25        142.832 ±       3.626  ops/s

johnynek · 2022-07-03T00:22:05Z

so my read is that Chain offers significant wins over List and Vector for concatenation heavy workloads (the accumulate benchmark) and that it is nearly as fast or faster than List for other cases.

I think this is inline with our use cases: Chain is for fast concatenations.
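To illustrate why a concatenation-heavy workload can favour a Chain-like structure, here is a minimal sketch, assuming a simplified tree-shaped representation. `MiniChain`, `Wrap`, `Append` and `accumulate` are hypothetical names for this illustration only; this is not the actual `cats.data.Chain` implementation, nor the code of the accumulate benchmark.

```scala
object MiniChainSketch {
  // A deliberately simplified, Chain-like structure: concatenation only
  // allocates one Append node instead of copying either operand.
  sealed trait MiniChain[+A] {
    // Constant-time concatenation: no elements are traversed or copied.
    def ++[B >: A](that: MiniChain[B]): MiniChain[B] =
      this match {
        case Empty => that
        case _ =>
          that match {
            case Empty => this
            case _     => Append(this, that)
          }
      }

    // Linear-time flattening, done iteratively with an explicit stack so
    // that deeply left-nested Append trees do not overflow the call stack.
    def toList: List[A] = {
      val out = List.newBuilder[A]
      var stack: List[MiniChain[A]] = this :: Nil
      while (stack.nonEmpty) {
        val head = stack.head
        stack = stack.tail
        head match {
          case Empty        => ()
          case Wrap(items)  => out ++= items
          case Append(l, r) => stack = l :: r :: stack // visit left before right
        }
      }
      out.result()
    }
  }

  case object Empty extends MiniChain[Nothing]
  final case class Wrap[+A](items: List[A]) extends MiniChain[A]
  final case class Append[+A](left: MiniChain[A], right: MiniChain[A]) extends MiniChain[A]

  // Accumulate n one-element pieces by repeated concatenation on the right.
  def accumulate(n: Int): MiniChain[Int] =
    (1 to n).foldLeft(Empty: MiniChain[Int])((acc, i) => acc ++ Wrap(List(i)))
}
```

With a plain `List`, the same left-to-right accumulation via `acc ++ List(i)` copies the accumulated prefix on every step, which makes the whole loop quadratic; in the sketch each step is a single allocation, and the cost of walking the elements is paid once in `toList`.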

TonioGela · 2022-07-03T10:03:04Z

> so my read is that Chain offers significant wins over List and Vector for concatenation heavy workloads (the accumulate benchmark) and that it is nearly as fast or faster than List for other cases.
>
> I think this is inline with our use cases: Chain is for fast concatenations.

I think so. I'll try to carve out some time to produce the Chain benchmarks for 2.12.x and 3.x to see some progression. Once gathered, those results may offer some material to eventually rephrase the claims in the Chain doc page.

TonioGela · 2022-07-04T06:58:36Z

Benchmark on 3.1.3                Mode  Cnt          Score         Error  Units
ChainBench.consLargeChain        thrpt   25  155429106.376 ± 1695815.631  ops/s
ChainBench.consLargeList         thrpt   25  198151233.407 ± 3025478.884  ops/s
ChainBench.consLargeVector       thrpt   25    8944880.002 ±  114086.632  ops/s
ChainBench.consSmallChain        thrpt   25  140879917.607 ± 2475505.001  ops/s
ChainBench.consSmallList         thrpt   25  181978361.347 ± 3790526.428  ops/s
ChainBench.consSmallVector       thrpt   25   70513884.827 ±  616969.321  ops/s
ChainBench.createChainOption     thrpt   25  191794350.747 ± 3650538.355  ops/s
ChainBench.createChainSeqOption  thrpt   25   26429928.762 ±  178155.996  ops/s
ChainBench.createSmallChain      thrpt   25  109353998.594 ±  831171.464  ops/s
ChainBench.createSmallList       thrpt   25   24844040.176 ±  383172.135  ops/s
ChainBench.createSmallVector     thrpt   25   32791640.677 ±  227328.900  ops/s
ChainBench.createTinyChain       thrpt   25   85981419.271 ± 1196171.479  ops/s
ChainBench.createTinyList        thrpt   25   80807041.737 ± 2977727.401  ops/s
ChainBench.createTinyVector      thrpt   25   47363196.235 ± 1025465.154  ops/s
ChainBench.foldLeftLargeChain    thrpt   25        138.155 ±       4.069  ops/s
ChainBench.foldLeftLargeList     thrpt   25        193.560 ±      15.635  ops/s
ChainBench.foldLeftLargeVector   thrpt   25        190.837 ±       6.019  ops/s
ChainBench.foldLeftSmallChain    thrpt   25   46903312.215 ± 1871730.186  ops/s
ChainBench.foldLeftSmallList     thrpt   25   55035931.971 ± 1855894.368  ops/s
ChainBench.foldLeftSmallVector   thrpt   25   44809039.728 ± 1983158.031  ops/s
ChainBench.lengthLargeChain      thrpt   25     168919.562 ±    2094.673  ops/s
ChainBench.lengthLargeList       thrpt   25        268.747 ±       1.536  ops/s
ChainBench.mapLargeChain         thrpt   25         91.314 ±       4.059  ops/s
ChainBench.mapLargeList          thrpt   25         77.763 ±      10.822  ops/s
ChainBench.mapLargeVector        thrpt   25        145.901 ±       1.334  ops/s
ChainBench.mapSmallChain         thrpt   25   25573791.757 ±  499976.167  ops/s
ChainBench.mapSmallList          thrpt   25   23811925.333 ±  615177.743  ops/s
ChainBench.mapSmallVector        thrpt   25   34554607.407 ± 1251126.047  ops/s
ChainBench.reverseLargeChain     thrpt   25      42697.211 ±     552.743  ops/s
ChainBench.reverseLargeList      thrpt   25        163.085 ±      10.072  ops/s

TonioGela · 2022-07-06T06:42:01Z

Last bit of "On my laptop™ benchmarks"

Benchmark on 2.12.16              Mode  Cnt          Score         Error  Units
ChainBench.consLargeChain        thrpt   25  152849367.773 ±  488485.527  ops/s
ChainBench.consLargeList         thrpt   25  218538256.786 ± 2090405.152  ops/s
ChainBench.consLargeVector       thrpt   25   11649376.721 ±   28401.013  ops/s
ChainBench.consSmallChain        thrpt   25  153729311.166 ±  215260.867  ops/s
ChainBench.consSmallList         thrpt   25  218225002.257 ± 3544400.759  ops/s
ChainBench.consSmallVector       thrpt   25   19349884.435 ±  161260.488  ops/s
ChainBench.createChainOption     thrpt   25  194648146.770 ± 3243862.858  ops/s
ChainBench.createChainSeqOption  thrpt   25   99237183.822 ±  959962.398  ops/s
ChainBench.createSmallChain      thrpt   25    6587813.412 ±   61191.050  ops/s
ChainBench.createSmallList       thrpt   25   19956990.399 ±  158023.941  ops/s
ChainBench.createSmallVector     thrpt   25   11276908.474 ±  305165.720  ops/s
ChainBench.createTinyChain       thrpt   25   19365000.329 ±  206773.643  ops/s
ChainBench.createTinyList        thrpt   25   91066517.051 ±  593975.876  ops/s
ChainBench.createTinyVector      thrpt   25   14686717.933 ±  288218.891  ops/s
ChainBench.foldLeftLargeChain    thrpt   25        126.876 ±       3.466  ops/s
ChainBench.foldLeftLargeList     thrpt   25        216.844 ±      12.993  ops/s
ChainBench.foldLeftLargeVector   thrpt   25        112.668 ±       0.524  ops/s
ChainBench.foldLeftSmallChain    thrpt   25   31543720.678 ±  194344.907  ops/s
ChainBench.foldLeftSmallList     thrpt   25   66607360.767 ±  123203.198  ops/s
ChainBench.foldLeftSmallVector   thrpt   25   14831499.201 ±  135245.656  ops/s
ChainBench.lengthLargeChain      thrpt   25     174937.752 ±    1612.051  ops/s
ChainBench.lengthLargeList       thrpt   25        287.508 ±       0.681  ops/s
ChainBench.mapLargeChain         thrpt   25         78.332 ±       1.475  ops/s
ChainBench.mapLargeList          thrpt   25         82.481 ±       4.213  ops/s
ChainBench.mapLargeVector        thrpt   25        103.491 ±       1.467  ops/s
ChainBench.mapSmallChain         thrpt   25   12234780.964 ±  217992.491  ops/s
ChainBench.mapSmallList          thrpt   25   22872891.991 ±  206212.273  ops/s
ChainBench.mapSmallVector        thrpt   25    9529464.560 ± 2075968.595  ops/s
ChainBench.reverseLargeChain     thrpt   25      46659.257 ±     275.119  ops/s
ChainBench.reverseLargeList      thrpt   25        156.478 ±      11.675  ops/s

johnynek · 2022-08-26T20:26:52Z

@key-eugene you made a comment regarding the claim that Vector isn't slower than List or Chain in 2.13. I thought I would point you to these benchmarks. Please update them if you think they do not reflect the real story.

Refactored Chain and CollectiveMonoid benchmarks

76b382e

armanbilge added the behind-the-scenes appreciated, but not user-facing label Jul 2, 2022

armanbilge changed the title ~~Refactored Chain and CollectiveMonoid benchmarks~~ Refactored `Chain` and `CollectiveMonoid` benchmarks Jul 2, 2022

armanbilge reviewed Jul 2, 2022

Removed fs2 dep and Chunk from bench tests

c3ec714

armanbilge approved these changes Jul 2, 2022

johnynek approved these changes Jul 5, 2022

armanbilge merged commit 15b4d33 into typelevel:main Jul 5, 2022

TonioGela deleted the bench_refactor branch July 5, 2022 20:06

armanbilge mentioned this pull request Oct 2, 2022

RFC: A better data structure for Headers http4s/http4s#6720

Open
