Refactored Chain and CollectiveMonoid benchmarks #4264

Merged (2 commits), Jul 5, 2022

@TonioGela (Member) commented Jul 2, 2022

Hello 👋!

As the title states, I refactored these two benchmarks and moved them out of scala-2.12. That way Chain can be compared with Vector on more Scala versions, and the results reflect more recent versions than before.

I replaced the old fs2.Catenable with fs2.Chunk and removed the spire-math Chain tests, since I couldn't find a release of that library for 2.13.x or 3.x.

I was able to run the CollectionMonoidBench benchmarks locally on all three Scala versions to gather some data:

Testing setup (it might be worth mentioning that I had to turn on the air conditioning and rub the laptop with an ice pack to avoid thermal throttling, so please don't take these numbers too seriously):
# JMH version: 1.32
# VM version: JDK 11.0.11, Java HotSpot(TM) 64-Bit Server VM, 11.0.11+9-LTS-194
# VM invoker: /Library/Java/JavaVirtualMachines/jdk-11.0.11.jdk/Contents/Home/bin/java
# VM options: -Xmx3G
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
Benchmark on 2.12.16                     Mode  Cnt   Score   Error  Units
CollectionMonoidBench.accumulateChain   thrpt   25  58.961 ± 0.719  ops/s
CollectionMonoidBench.accumulateList    thrpt   25  31.188 ± 0.394  ops/s
CollectionMonoidBench.accumulateVector  thrpt   25   6.978 ± 0.185  ops/s

Benchmark on 2.13.8                      Mode  Cnt   Score   Error  Units
CollectionMonoidBench.accumulateChain   thrpt   25  81.973 ± 3.921  ops/s
CollectionMonoidBench.accumulateList    thrpt   25  21.150 ± 1.756  ops/s
CollectionMonoidBench.accumulateVector  thrpt   25  11.725 ± 0.306  ops/s

Benchmark on 3.1.3                       Mode  Cnt   Score   Error  Units
CollectionMonoidBench.accumulateChain   thrpt   25  85.870 ± 1.075  ops/s
CollectionMonoidBench.accumulateList    thrpt   25  20.250 ± 1.995  ops/s
CollectionMonoidBench.accumulateVector  thrpt   25  11.105 ± 0.099  ops/s

These numbers come from my machine and shouldn't be generalized uncritically. It might be worth running the benchmarks on dedicated performance-testing infrastructure rather than on my (or anyone else's) local machine.
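One way to act on that caveat when reading the tables in this thread: JMH's Error column is the half-width of a 99.9% confidence interval around Score. If two intervals overlap, the difference shouldn't be read as real. Below is a minimal Scala sketch of that check. The `Result` type and `distinguishable` helper are hypothetical and are not part of JMH or cats. The scores are copied from the tables in this thread. Non-overlapping intervals are only a rough first filter. They say nothing about whether the gap would reproduce on another machine.

```scala
// Hypothetical helper type: a JMH score and its reported error
// (the half-width of JMH's 99.9% confidence interval).
final case class Result(score: Double, error: Double) {
  def lower: Double = score - error
  def upper: Double = score + error
}

object CompareResults {
  // True when the two confidence intervals do not overlap, i.e. the
  // difference is at least not swamped by the reported error.
  def distinguishable(a: Result, b: Result): Boolean =
    a.upper < b.lower || b.upper < a.lower

  // accumulateVector on 2.13.8 vs 3.1.3 (ops/s)
  val vector213 = Result(11.725, 0.306)
  val vector313 = Result(11.105, 0.099)

  // mapLargeChain vs mapLargeList on 2.13.8 (ops/s)
  val mapChain213 = Result(78.162, 2.620)
  val mapList213 = Result(73.676, 8.999)
}
```

With these numbers, `distinguishable(vector213, vector313)` is true: the intervals are [11.419, 12.031] and [11.006, 11.204] ops/s. `distinguishable(mapChain213, mapList213)` is false: the intervals are [75.542, 80.782] and [64.677, 82.675] ops/s. So the 2.13.8 mapLarge gap between Chain and List falls within the reported error.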

@armanbilge added the behind-the-scenes label (appreciated, but not user-facing) Jul 2, 2022
@armanbilge changed the title Refactored Chain and CollectiveMonoid benchmarks → Refactored Chain and CollectiveMonoid benchmarks Jul 2, 2022
Review comment on build.sbt (outdated), lines 268 to 271:
libraryDependencies ++= Seq(
"org.scalaz" %% "scalaz-core" % "7.3.6",
"co.fs2" %% "fs2-core" % "3.2.8"
),
Member

Thanks for all your work on this! Actually, we should just drop Chunk from the benchmark, because it creates a circular dependency with fs2 (fs2 itself depends on cats). See #4194 (comment).

Member Author

Hmm, I'll remove it then. Thanks for noticing!

Member Author

Removed it in c3ec714

Now the ETA for the JMH run of CollectiveMonoid is "just" 04:18:20 for each Scala version 😂
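For reference, ETAs like this can be estimated from the testing setup above. Each benchmark method runs 5 warmup and 5 measurement iterations of 10 s each. Cnt = 25 with 5 measurement iterations per fork implies 5 forks. Below is a back-of-the-envelope Scala sketch. The `JmhPlan` name and its fields are hypothetical, and the sketch ignores JVM startup and other per-fork overhead.

```scala
// Hypothetical model of a JMH run plan; ignores JVM startup/fork overhead.
final case class JmhPlan(
    forks: Int,
    warmupIterations: Int,
    measurementIterations: Int,
    iterationSeconds: Int
) {
  // Wall-clock seconds spent on one @Benchmark method across all forks.
  def secondsPerMethod: Int =
    forks * (warmupIterations + measurementIterations) * iterationSeconds

  // Total seconds for a benchmark class with the given number of methods.
  def totalSeconds(methods: Int): Int = methods * secondsPerMethod
}

object JmhPlan {
  // Formats a number of seconds as HH:MM:SS, like the ETA quoted above.
  def hms(totalSeconds: Int): String =
    f"${totalSeconds / 3600}%02d:${totalSeconds % 3600 / 60}%02d:${totalSeconds % 60}%02d"
}
```

With `JmhPlan(forks = 5, warmupIterations = 5, measurementIterations = 5, iterationSeconds = 10)`, each method costs 500 s. The three CollectionMonoidBench methods therefore take about 00:25:00, while 30 methods (the number of rows in each ChainBench table in this thread) take about 04:10:00, before any startup overhead.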

@TonioGela (Member Author) Jul 2, 2022

That means that I need 13 hours of ice packs 😇

@armanbilge (Member)

Wow, this is really cool, so now we can compare across Scala versions!

@TonioGela (Member Author) commented Jul 2, 2022

> Wow, this is really cool, so now we can compare across Scala versions!

Yep. Once these benchmarks have been run on a reliable machine, it would also be worth updating the Chain documentation. Its performance section makes several claims that held for Scala 2.12 but might not hold for later versions.

@armanbilge (Member) left a comment

Thanks, long overdue!

@TonioGela (Member Author)

Adding the ChainBench results on 2.13.8 (again, from my laptop, so handle them with care):

Benchmark on 2.13.8                     Mode  Cnt          Score         Error  Units
ChainBench.consLargeChain              thrpt   25  143759156.264 ± 5611584.788  ops/s
ChainBench.consLargeList               thrpt   25  148512687.273 ± 5992793.489  ops/s
ChainBench.consLargeVector             thrpt   25    7249505.257 ±  202436.549  ops/s
ChainBench.consSmallChain              thrpt   25  119925876.637 ± 1663011.363  ops/s
ChainBench.consSmallList               thrpt   25  152664330.695 ± 1828399.646  ops/s
ChainBench.consSmallVector             thrpt   25   57686442.030 ±  533768.670  ops/s
ChainBench.createChainOption           thrpt   25  167191685.222 ± 1474976.197  ops/s
ChainBench.createChainSeqOption        thrpt   25   21264365.364 ±  372757.348  ops/s
ChainBench.createSmallChain            thrpt   25   87260308.052 ±  960407.889  ops/s
ChainBench.createSmallList             thrpt   25   20000981.857 ±  396001.340  ops/s
ChainBench.createSmallVector           thrpt   25   26311376.712 ±  288871.258  ops/s
ChainBench.createTinyChain             thrpt   25   75311482.869 ± 1066466.694  ops/s
ChainBench.createTinyList              thrpt   25   67502351.990 ± 1071560.419  ops/s
ChainBench.createTinyVector            thrpt   25   39676430.380 ±  405717.649  ops/s
ChainBench.foldLeftLargeChain          thrpt   25        117.866 ±       3.343  ops/s
ChainBench.foldLeftLargeList           thrpt   25        193.640 ±       2.298  ops/s
ChainBench.foldLeftLargeVector         thrpt   25        178.370 ±       0.830  ops/s
ChainBench.foldLeftSmallChain          thrpt   25   43732934.777 ±  362285.965  ops/s
ChainBench.foldLeftSmallList           thrpt   25   51155941.055 ±  882005.961  ops/s
ChainBench.foldLeftSmallVector         thrpt   25   41902918.940 ±   53030.742  ops/s
ChainBench.lengthLargeChain            thrpt   25     131831.918 ±    1613.341  ops/s
ChainBench.lengthLargeList             thrpt   25        271.015 ±       0.962  ops/s
ChainBench.mapLargeChain               thrpt   25         78.162 ±       2.620  ops/s
ChainBench.mapLargeList                thrpt   25         73.676 ±       8.999  ops/s
ChainBench.mapLargeVector              thrpt   25        132.443 ±       2.360  ops/s
ChainBench.mapSmallChain               thrpt   25   24047623.583 ± 1834073.508  ops/s
ChainBench.mapSmallList                thrpt   25   21482014.328 ±  387854.819  ops/s
ChainBench.mapSmallVector              thrpt   25   34707281.383 ±  382477.558  ops/s
ChainBench.reverseLargeChain           thrpt   25      37700.549 ±     154.942  ops/s
ChainBench.reverseLargeList            thrpt   25        142.832 ±       3.626  ops/s

@johnynek (Contributor) commented Jul 3, 2022

So my read is that Chain offers significant wins over List and Vector for concatenation-heavy workloads (the accumulate benchmarks), and that it is nearly as fast as, or faster than, List in other cases.

I think this is in line with our use cases: Chain is for fast concatenations.
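Here is why a structure like Chain suits concatenation-heavy workloads such as the accumulate benchmarks. Concatenating two such sequences only allocates a new tree node. Concatenating two Lists copies the left operand. Below is a deliberately simplified, stdlib-only Scala sketch of the idea. The `Cat` type is hypothetical and is not cats.data.Chain, whose real implementation has more cases and optimizations.

```scala
import scala.annotation.tailrec

// Hypothetical, simplified catenable sequence: concatenation is O(1)
// because it just builds an Append node instead of copying elements.
sealed trait Cat[+A] {
  def ++[B >: A](that: Cat[B]): Cat[B] = (this, that) match {
    case (Cat.Empty, r) => r
    case (l, Cat.Empty) => l
    case (l, r)         => Cat.Append(l, r)
  }

  // Left-to-right traversal with an explicit stack, so deeply nested
  // Append trees do not overflow the call stack.
  def toList: List[A] = {
    @tailrec
    def go(stack: List[Cat[A]], acc: List[A]): List[A] = stack match {
      case Nil                      => acc.reverse
      case Cat.Empty :: rest        => go(rest, acc)
      case Cat.Singleton(a) :: rest => go(rest, a :: acc)
      case Cat.Append(l, r) :: rest => go(l :: r :: rest, acc)
    }
    go(List(this), Nil)
  }
}

object Cat {
  case object Empty extends Cat[Nothing]
  final case class Singleton[A](a: A) extends Cat[A]
  final case class Append[A](left: Cat[A], right: Cat[A]) extends Cat[A]

  def one[A](a: A): Cat[A] = Singleton(a)
}
```

Building a sequence by repeated appends, for example `(1 to 5).foldLeft(Cat.Empty: Cat[Int])((c, i) => c ++ Cat.one(i))`, never copies earlier elements. Doing the same with `list ++ List(i)` re-copies the growing prefix on every step. The cost is paid once, when the tree is traversed.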

@TonioGela (Member Author)

> so my read is that Chain offers significant wins over List and Vector for concatenation heavy workloads (the accumulate benchmark) and that it is nearly as fast or faster than List for other cases.
>
> I think this is inline with our use cases: Chain is for fast concatenations.

I think so. I'll try to carve out some time to produce the ChainBench results for 2.12.x and 3.x as well, to see how things progress across versions. Once gathered, those results may provide material to eventually rephrase the claims on the Chain doc page.

@TonioGela (Member Author)

Benchmark on 3.1.3                Mode  Cnt          Score         Error  Units
ChainBench.consLargeChain        thrpt   25  155429106.376 ± 1695815.631  ops/s
ChainBench.consLargeList         thrpt   25  198151233.407 ± 3025478.884  ops/s
ChainBench.consLargeVector       thrpt   25    8944880.002 ±  114086.632  ops/s
ChainBench.consSmallChain        thrpt   25  140879917.607 ± 2475505.001  ops/s
ChainBench.consSmallList         thrpt   25  181978361.347 ± 3790526.428  ops/s
ChainBench.consSmallVector       thrpt   25   70513884.827 ±  616969.321  ops/s
ChainBench.createChainOption     thrpt   25  191794350.747 ± 3650538.355  ops/s
ChainBench.createChainSeqOption  thrpt   25   26429928.762 ±  178155.996  ops/s
ChainBench.createSmallChain      thrpt   25  109353998.594 ±  831171.464  ops/s
ChainBench.createSmallList       thrpt   25   24844040.176 ±  383172.135  ops/s
ChainBench.createSmallVector     thrpt   25   32791640.677 ±  227328.900  ops/s
ChainBench.createTinyChain       thrpt   25   85981419.271 ± 1196171.479  ops/s
ChainBench.createTinyList        thrpt   25   80807041.737 ± 2977727.401  ops/s
ChainBench.createTinyVector      thrpt   25   47363196.235 ± 1025465.154  ops/s
ChainBench.foldLeftLargeChain    thrpt   25        138.155 ±       4.069  ops/s
ChainBench.foldLeftLargeList     thrpt   25        193.560 ±      15.635  ops/s
ChainBench.foldLeftLargeVector   thrpt   25        190.837 ±       6.019  ops/s
ChainBench.foldLeftSmallChain    thrpt   25   46903312.215 ± 1871730.186  ops/s
ChainBench.foldLeftSmallList     thrpt   25   55035931.971 ± 1855894.368  ops/s
ChainBench.foldLeftSmallVector   thrpt   25   44809039.728 ± 1983158.031  ops/s
ChainBench.lengthLargeChain      thrpt   25     168919.562 ±    2094.673  ops/s
ChainBench.lengthLargeList       thrpt   25        268.747 ±       1.536  ops/s
ChainBench.mapLargeChain         thrpt   25         91.314 ±       4.059  ops/s
ChainBench.mapLargeList          thrpt   25         77.763 ±      10.822  ops/s
ChainBench.mapLargeVector        thrpt   25        145.901 ±       1.334  ops/s
ChainBench.mapSmallChain         thrpt   25   25573791.757 ±  499976.167  ops/s
ChainBench.mapSmallList          thrpt   25   23811925.333 ±  615177.743  ops/s
ChainBench.mapSmallVector        thrpt   25   34554607.407 ± 1251126.047  ops/s
ChainBench.reverseLargeChain     thrpt   25      42697.211 ±     552.743  ops/s
ChainBench.reverseLargeList      thrpt   25        163.085 ±      10.072  ops/s

@armanbilge merged commit 15b4d33 into typelevel:main Jul 5, 2022
@TonioGela deleted the bench_refactor branch July 5, 2022 20:06
@TonioGela (Member Author)

Last batch of "On my laptop™" benchmarks:

Benchmark on 2.12.16              Mode  Cnt          Score         Error  Units
ChainBench.consLargeChain        thrpt   25  152849367.773 ±  488485.527  ops/s
ChainBench.consLargeList         thrpt   25  218538256.786 ± 2090405.152  ops/s
ChainBench.consLargeVector       thrpt   25   11649376.721 ±   28401.013  ops/s
ChainBench.consSmallChain        thrpt   25  153729311.166 ±  215260.867  ops/s
ChainBench.consSmallList         thrpt   25  218225002.257 ± 3544400.759  ops/s
ChainBench.consSmallVector       thrpt   25   19349884.435 ±  161260.488  ops/s
ChainBench.createChainOption     thrpt   25  194648146.770 ± 3243862.858  ops/s
ChainBench.createChainSeqOption  thrpt   25   99237183.822 ±  959962.398  ops/s
ChainBench.createSmallChain      thrpt   25    6587813.412 ±   61191.050  ops/s
ChainBench.createSmallList       thrpt   25   19956990.399 ±  158023.941  ops/s
ChainBench.createSmallVector     thrpt   25   11276908.474 ±  305165.720  ops/s
ChainBench.createTinyChain       thrpt   25   19365000.329 ±  206773.643  ops/s
ChainBench.createTinyList        thrpt   25   91066517.051 ±  593975.876  ops/s
ChainBench.createTinyVector      thrpt   25   14686717.933 ±  288218.891  ops/s
ChainBench.foldLeftLargeChain    thrpt   25        126.876 ±       3.466  ops/s
ChainBench.foldLeftLargeList     thrpt   25        216.844 ±      12.993  ops/s
ChainBench.foldLeftLargeVector   thrpt   25        112.668 ±       0.524  ops/s
ChainBench.foldLeftSmallChain    thrpt   25   31543720.678 ±  194344.907  ops/s
ChainBench.foldLeftSmallList     thrpt   25   66607360.767 ±  123203.198  ops/s
ChainBench.foldLeftSmallVector   thrpt   25   14831499.201 ±  135245.656  ops/s
ChainBench.lengthLargeChain      thrpt   25     174937.752 ±    1612.051  ops/s
ChainBench.lengthLargeList       thrpt   25        287.508 ±       0.681  ops/s
ChainBench.mapLargeChain         thrpt   25         78.332 ±       1.475  ops/s
ChainBench.mapLargeList          thrpt   25         82.481 ±       4.213  ops/s
ChainBench.mapLargeVector        thrpt   25        103.491 ±       1.467  ops/s
ChainBench.mapSmallChain         thrpt   25   12234780.964 ±  217992.491  ops/s
ChainBench.mapSmallList          thrpt   25   22872891.991 ±  206212.273  ops/s
ChainBench.mapSmallVector        thrpt   25    9529464.560 ± 2075968.595  ops/s
ChainBench.reverseLargeChain     thrpt   25      46659.257 ±     275.119  ops/s
ChainBench.reverseLargeList      thrpt   25        156.478 ±      11.675  ops/s

@johnynek (Contributor)

@key-eugene, you commented on the claim that Vector isn't slower than List or Chain in 2.13, so I thought I'd point these benchmarks out to you. Please update them if you think they don't reflect the real story.
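One quick way to check that claim against the numbers is to compute the Vector/List throughput ratio per benchmark and Scala version, straight from the ChainBench tables above. Below is a minimal Scala sketch. The `Row` type and the `ratio` helper are hypothetical, and only three small-collection benchmarks are included, hand-copied from the tables.

```scala
// Hypothetical row type: Vector and List scores (ops/s) for one benchmark
// on one Scala version, copied from the ChainBench tables in this thread.
final case class Row(benchmark: String, scalaVersion: String, vector: Double, list: Double) {
  // Vector throughput relative to List: > 1.0 means Vector was faster.
  def vectorToList: Double = vector / list
}

object VectorVsList {
  val rows: List[Row] = List(
    Row("consSmall", "2.12.16", 19349884.435, 218225002.257),
    Row("consSmall", "2.13.8", 57686442.030, 152664330.695),
    Row("consSmall", "3.1.3", 70513884.827, 181978361.347),
    Row("foldLeftSmall", "2.12.16", 14831499.201, 66607360.767),
    Row("foldLeftSmall", "2.13.8", 41902918.940, 51155941.055),
    Row("foldLeftSmall", "3.1.3", 44809039.728, 55035931.971),
    Row("mapSmall", "2.12.16", 9529464.560, 22872891.991),
    Row("mapSmall", "2.13.8", 34707281.383, 21482014.328),
    Row("mapSmall", "3.1.3", 34554607.407, 23811925.333)
  )

  // None if the benchmark/version pair was not copied into `rows`.
  def ratio(benchmark: String, scalaVersion: String): Option[Double] =
    rows
      .find(r => r.benchmark == benchmark && r.scalaVersion == scalaVersion)
      .map(_.vectorToList)
}
```

On these numbers, Vector's foldLeftSmall throughput rises from roughly 0.22× List's on 2.12.16 to roughly 0.82× on 2.13.8. On mapSmall it overtakes List (about 1.6× on 2.13.8). On consSmall it stays well behind List (under 0.4×) on every version. That pattern fits the new Vector implementation that shipped in Scala 2.13.2, but it is still a single laptop.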

Labels
behind-the-scenes appreciated, but not user-facing
3 participants