
Speed benchmarks up by shortening the 10s iteration time #6271

Open · 5 tasks
JaroslavTulach opened this issue Apr 13, 2023 · 3 comments

Comments

@JaroslavTulach (Member) commented Apr 13, 2023

While using the new IfVsCaseBenchmark I realized we can specify a time for each @Warmup and @Measurement iteration. The default time is 10s, but it seems 1s for each warmup and 3s for a measurement is good enough for the IfVsCaseBenchmark. Given it has five warmup iterations and three measurements, we are down from 80s to 11s. Of course, shortening the warmup time too much may decrease the stability of our benchmarks, but the overall speedup seems too attractive not to try.
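For reference, the per-iteration overrides would look roughly like this as JMH annotations; the concrete values are just the ones mentioned above and would need tuning per benchmark:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;

// Illustrative values only, taken from the numbers above; JMH interprets the
// time attribute per iteration in the given unit, instead of its 10s default.
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 3, timeUnit = TimeUnit.SECONDS)
public class IfVsCaseBenchmark {
  // benchmark methods elided
}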

  • Implement Truffle compilation detection
  • Investigate if time can be lowered from 10s for various benchmarks
  • Run the results a few times on the server
  • Verify benchmarks run faster
  • Verify benchmark results are affected by Truffle compilation

CCing @jdunkerley, @Akirathan

@Akirathan (Member)

As long as there is no, or almost no, background compilation in the measurement phase, then yes, we can decrease the warmup time. I would be careful about it, though. For example, with the JMH framework we currently use, there is no simple way to tell whether there are still compilations happening in the measurement phase. Note that if we want to measure only the performance of Enso, and not the performance of Graal compilation, all the Graal compilations should be done during the warmup phase.

In FastR, I reimplemented the whole benchmarking infrastructure from scratch. I started every benchmark with --engine.TraceCompilationDetails, captured the whole output with timestamps, and once the benchmark ended, checked whether there were any engine compilation logs during the measurement phase.

This is just an example of an almost reliable way to detect that the warmup was sufficient. Before we move on with the investigation, I strongly recommend implementing a similar system to recognize a sufficient warmup and then modifying the times accordingly.
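As an illustration only (not the actual FastR infrastructure), the check could look something like the sketch below: keep the timestamped compilation-trace lines and flag any that land after the measurement phase started. The class and method names here are made up; "opt done" is one of the markers Truffle's compilation tracing prints (an example line is quoted later in this thread).

import java.time.Instant;
import java.util.List;

// Hypothetical helper, not existing code: warmup counts as sufficient when no
// compilation-trace line carries a timestamp inside the measurement phase.
record TraceLine(Instant timestamp, String text) {}

final class WarmupCheck {
  static boolean warmupWasSufficient(List<TraceLine> trace, Instant measurementStart) {
    return trace.stream()
        .filter(l -> l.timestamp().isAfter(measurementStart))
        // other trace events (queued, started, failed) could be matched the same way
        .noneMatch(l -> l.text().contains("opt done"));
  }
}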

Not to mention that it would be a very good idea to introduce benchmark versioning if we are about to fiddle with any parameters. More details in #5714.

@JaroslavTulach (Member Author) commented Apr 14, 2023

“almost reliable way how to detect that the warmup was sufficient”

I think that tracing the Truffle compilation and failing if it happens during measurement is a good idea. Here is an implementation for Enso benchmarks:

enso$ git diff engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java
diff --git engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java
index ef8b748ab8..481b607551 100644
--- engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java
+++ engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java
@@ -1,9 +1,12 @@
 package org.enso.interpreter.bench.benchmarks.semantic;
 
-import java.io.ByteArrayOutputStream;
 import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.List;
 import java.util.concurrent.TimeUnit;
 import java.util.function.Function;
+import java.util.logging.Handler;
+import java.util.logging.LogRecord;
 import org.graalvm.polyglot.Context;
 import org.graalvm.polyglot.Value;
 import org.openjdk.jmh.annotations.Benchmark;
@@ -18,12 +21,14 @@ import org.openjdk.jmh.annotations.State;
 import org.openjdk.jmh.annotations.Warmup;
 import org.openjdk.jmh.infra.BenchmarkParams;
 import org.openjdk.jmh.infra.Blackhole;
+import org.openjdk.jmh.infra.IterationParams;
+import org.openjdk.jmh.runner.IterationType;
 
 
 @BenchmarkMode(Mode.AverageTime)
 @Fork(1)
-@Warmup(iterations = 3)
-@Measurement(iterations = 5)
+@Warmup(iterations = 1, time = 1)
+@Measurement(iterations = 5, time = 1)
 @OutputTimeUnit(TimeUnit.MILLISECONDS)
 @State(Scope.Benchmark)
 public class ListBenchmarks {
@@ -33,6 +38,7 @@ public class ListBenchmarks {
   private Value self;
   private Value sum;
   private Value oldSum;
+  private List<LogRecord> messages = new ArrayList<>();
 
   @Setup
   public void initializeBenchmark(BenchmarkParams params) throws Exception {
@@ -40,7 +46,21 @@ public class ListBenchmarks {
       .allowExperimentalOptions(true)
       .allowIO(true)
       .allowAllAccess(true)
-      .logHandler(new ByteArrayOutputStream())
+      .option("engine.TraceCompilation", "true")
+      .logHandler(new Handler() {
+        @Override
+        public void publish(LogRecord lr) {
+          messages.add(lr);
+        }
+
+        @Override
+        public void flush() {
+        }
+
+        @Override
+        public void close() throws SecurityException {
+        }
+      })
       .option(
         "enso.languageHomeOverride",
         Paths.get("../../distribution/component").toFile().getAbsolutePath()
@@ -113,16 +133,18 @@ public class ListBenchmarks {
   }
 
   @Benchmark
-  public void mapOverList(Blackhole matter) {
-    performBenchmark(matter);
+  public void mapOverList(IterationParams it, Blackhole matter) {
+    performBenchmark(it, matter);
   }
 
   @Benchmark
-  public void mapOverLazyList(Blackhole matter) {
-    performBenchmark(matter);
+  public void mapOverLazyList(IterationParams it, Blackhole matter) {
+    performBenchmark(it, matter);
   }
 
-  private void performBenchmark(Blackhole hole) throws AssertionError {
+  private void performBenchmark(IterationParams it, Blackhole hole) throws AssertionError {
+    messages.clear();
+
     var newList = plusOne.execute(self, list);
     var newSum = sum.execute(self, newList, 0);
 
@@ -131,6 +153,14 @@ public class ListBenchmarks {
       throw new AssertionError("Unexpected result " + result);
     }
     hole.consume(result);
+    if (it.getType() == IterationType.MEASUREMENT) {
+      if (messages.size() > 0) {
+        for (var lr : messages) {
+          System.err.println(lr.getMessage());
+          throw new AssertionError("Found compilation " + messages.size() + " messages during measurement");
+        }
+      }
+    }
   }
 }

With these changes (especially after lowering the number of warmup iterations to 1 and giving each just 1s), my benchmark run fails:

sbt:runtime> benchOnly ListBenchmarks
Iteration   1: 
opt done   id=3593  org.graalvm.polyglot.Value<Function>.execute       |Tier 2|Time   593( 239+354 )ms|AST   47|Inlined  53Y   0N|IR   3573/ 17618|CodeSize   74181|Addr 0x7f0707edca90|Timestamp 91619721533011|Src n/a
<failure>

java.lang.AssertionError: Found compilation 1 messages during measurement
        at org.enso.interpreter.bench.benchmarks.semantic.ListBenchmarks.performBenchmark(ListBenchmarks.java:160)
        at org.enso.interpreter.bench.benchmarks.semantic.ListBenchmarks.mapOverLazyList(ListBenchmarks.java:142)
        at org.enso.interpreter.bench.benchmarks.semantic.jmh_generated.ListBenchmarks_mapOverLazyList_jmhTest:232

This way we can detect results skewed by compilation that has not finished before the measurements begin. Of course, this sketch should rather be turned into a supporting TruffleCompilationHandler class and be used by all our benchmarks.
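For illustration, such a shared TruffleCompilationHandler could look roughly like the following; this is a sketch based on the diff above, not code that exists in the repository:

import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import org.openjdk.jmh.infra.IterationParams;
import org.openjdk.jmh.runner.IterationType;

// Sketch of a shared handler: install it via Context.Builder.logHandler(...) together
// with the "engine.TraceCompilation" option, call reset() at the start of each benchmark
// invocation and assertNoCompilation(it) at the end of it.
public final class TruffleCompilationHandler extends Handler {
  private final List<LogRecord> messages = new ArrayList<>();

  @Override public synchronized void publish(LogRecord lr) { messages.add(lr); }
  @Override public void flush() {}
  @Override public void close() {}

  public synchronized void reset() { messages.clear(); }

  public synchronized void assertNoCompilation(IterationParams it) {
    if (it.getType() == IterationType.MEASUREMENT && !messages.isEmpty()) {
      messages.forEach(lr -> System.err.println(lr.getMessage()));
      throw new AssertionError(
          "Found " + messages.size() + " compilation messages during measurement");
    }
  }
}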

@JaroslavTulach (Member Author)

Regardless of whether we lower the time, we need to detect whether a compilation is running while benchmarking. Being implemented as
