Benchmarks #1673

janmasrovira · 2022-12-20T23:49:17Z

This PR automates compiling, running, and plotting the results of the existing benchmark suite.
It is enough to run `cabal bench` or `stack bench`. The generated files are placed in .benchmark-results.
The plots are generated with gnuplot, which must be installed on the system.

Some suites have not been included because the Juvix version crashes, in which case there is no point in benchmarking the other languages. This will need to change once we complete the pipeline (#1531 and #1665).

If the binaries need to be recompiled, delete .benchmark-results/bin.
If the benchmarks need to be run again, delete .benchmark-results/csv.
If the benchmarks need to be replotted, delete .benchmark-results/plot.
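
The three rules above can be sketched as a small helper. This is only an illustration in Python, not part of the PR: the function name `invalidate` and the `STAGES` tuple are hypothetical, and the only facts taken from the description are the directory names `bin`, `csv`, and `plot` under .benchmark-results.

```python
import shutil
from pathlib import Path

# Stage directories named in the PR description.
STAGES = ("bin", "csv", "plot")


def invalidate(stage: str, root: Path = Path(".benchmark-results")) -> bool:
    """Delete one stage directory so the next `cabal bench` regenerates it.

    Returns True if a directory was removed, False if it did not exist.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    target = root / stage
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

For example, `invalidate("csv")` followed by `cabal bench` would rerun the measurements, assuming the benchmark driver treats a missing directory as "needs regenerating", as the description implies.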

It took 40 minutes to run the benchmarks on my PC.
I have included a time limit of 30 seconds for each benchmark. This should probably be adjusted. Alternatively, we could try adjusting the confidence interval parameter instead.

After running cabal bench, the contents of .benchmark-results are:

 .benchmark-results
├──  bin
│  ├──  fibonacci
│  │  ├──  c
│  │  │  ├──  fibonacci.exe
│  │  │  └──  fibonacci.wasm
│  │  ├──  haskell
│  │  │  ├──  fibonacci.exe
│  │  │  └──  fibonacci.strict.exe
│  │  ├──  juvix
│  │  │  ├──  fibonacci.exe
│  │  │  └──  fibonacci.wasm
│  │  ├──  ocaml
│  │  │  ├──  fibonacci.byte.exe
│  │  │  └──  fibonacci.exe
│  │  └──  runtime
│  │     ├──  fibonacci.exe
│  │     └──  fibonacci.wasm
│  ├──  fold
│  │  ├──  haskell
│  │  │  ├──  fold.exe
│  │  │  └──  fold.strict.exe
│  │  ├──  juvix
│  │  │  ├──  fold.exe
│  │  │  └──  fold.wasm
│  │  ├──  ocaml
│  │  │  ├──  fold.byte.exe
│  │  │  └──  fold.exe
│  │  └──  runtime
│  │     ├──  fold.exe
│  │     └──  fold.wasm
│  ├──  mapfold
│  │  ├──  haskell
│  │  │  ├──  mapfold.exe
│  │  │  └──  mapfold.strict.exe
│  │  ├──  juvix
│  │  │  ├──  mapfold.exe
│  │  │  └──  mapfold.wasm
│  │  ├──  ocaml
│  │  │  ├──  mapfold.byte.exe
│  │  │  └──  mapfold.exe
│  │  └──  runtime
│  │     ├──  mapfold.exe
│  │     └──  mapfold.wasm
│  ├──  maybe
│  │  ├──  c
│  │  │  ├──  maybe.exe
│  │  │  └──  maybe.wasm
│  │  ├──  haskell
│  │  │  ├──  maybe.exe
│  │  │  └──  maybe.strict.exe
│  │  ├──  juvix
│  │  │  ├──  maybe.exe
│  │  │  └──  maybe.wasm
│  │  ├──  ocaml
│  │  │  ├──  maybe.byte.exe
│  │  │  └──  maybe.exe
│  │  └──  runtime
│  │     ├──  maybe.exe
│  │     └──  maybe.wasm
│  └──  mergesort
│     ├──  c
│     │  ├──  mergesort.exe
│     │  └──  mergesort.wasm
│     ├──  haskell
│     │  ├──  mergesort.exe
│     │  └──  mergesort.strict.exe
│     ├──  juvix
│     │  ├──  mergesort.exe
│     │  └──  mergesort.wasm
│     ├──  ocaml
│     │  ├──  mergesort.byte.exe
│     │  └──  mergesort.exe
│     └──  runtime
│        ├──  mergesort.exe
│        └──  mergesort.wasm
├──  csv
│  ├──  fibonacci.csv
│  ├──  fold.csv
│  ├──  mapfold.csv
│  ├──  maybe.csv
│  └──  mergesort.csv
└──  plot
   ├──  fibonacci.svg
   ├──  fold.svg
   ├──  mapfold.svg
   ├──  maybe.svg
   └──  mergesort.svg

Generated plots:

lukaszcz · 2023-01-04T10:55:24Z

Regarding the increase in speed, I think fiddling with the confidence intervals is a good idea. But maybe it's possible to change the confidence intervals individually for different benchmarks? It's fine to have a much wider confidence interval if the measured value is 10 times bigger than all other values, but less so when the values are close. And the benchmarks taking the most time are probably the outliers, so it might be useful to have much larger intervals for them.
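
One way to read the reasoning above: a wide interval only matters when it overlaps the interval of another variant, because only then is the ranking ambiguous. The sketch below illustrates that check in Python. It is not the benchmark suite's code, the names `overlaps` and `variants_needing_precision` are hypothetical, and the numbers are made up for illustration.

```python
from typing import Dict, List, Tuple

# (lower bound, upper bound) of a measured running time, in seconds.
Interval = Tuple[float, float]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def variants_needing_precision(intervals: Dict[str, Interval]) -> List[str]:
    """Names of variants whose interval overlaps some other variant's interval."""
    names = sorted(intervals)
    return [
        n
        for n in names
        if any(overlaps(intervals[n], intervals[m]) for m in names if m != n)
    ]


# Made-up measurements: one slow outlier with a wide interval, two close variants.
example = {
    "slow-outlier": (8.0, 12.0),
    "fast-a": (0.95, 1.05),
    "fast-b": (1.00, 1.10),
}
print(variants_needing_precision(example))  # ['fast-a', 'fast-b']
```

Here the outlier's interval is four seconds wide but still separates it cleanly from the rest, while the two fast variants would need tighter intervals to be told apart.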

bench/Suites.hs

lukaszcz

I think all benchmarks should be included. If the current version of Juvix crashes, this should simply be noted on the plot or in the results. The crashes have two causes: no guarantee for tail recursion (likely a stack overflow) or incorrect compilation of partial application (likely a segfault / address boundary error).
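
A minimal sketch of how a crash could be recorded instead of dropping the suite, written in Python purely for illustration. The function `classify_exit` and its labels are hypothetical and not taken from the PR. It relies on one documented behaviour of Python's `subprocess` module: on POSIX, a negative return code `-N` means the child was terminated by signal `N`.

```python
import signal


def classify_exit(returncode: int) -> str:
    """Turn a process return code into a label suitable for a results table.

    Follows the POSIX convention used by Python's subprocess module:
    0 is success, a negative value -N means the process was killed by signal N.
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {-returncode}"
        return f"crashed ({name})"
    return f"failed (exit {returncode})"
```

A driver could store this label next to the timing column, so that a variant that segfaults shows up in the results as crashed rather than being silently absent.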

lukaszcz · 2023-01-04T11:21:39Z

I think there is a point in benchmarking the runtime against other languages even if current Juvix crashes.

janmasrovira · 2023-01-05T14:43:02Z

I have included all suites but excluded the variants that crash. I've removed the explicit time limit and set a confidence interval of 90 for every suite. We may want to try different values globally and for each suite, but I propose that we postpone this to a later PR.

janmasrovira force-pushed the benchmarks branch from 0966476 to af25dc4 on December 20, 2022 23:51