
Speed benchmarks up by shortening the 10s iteration time #6271

Open · 5 tasks
JaroslavTulach opened this issue Apr 13, 2023 · 3 comments

Comments

@JaroslavTulach (Member) commented Apr 13, 2023

While using the new IfVsCaseBenchmark I realized we can specify a time for each @Warmup and @Measurement iteration. The default time is 10s, but it seems 1s for each warmup and 3s for a measurement is good enough for the IfVsCaseBenchmark. Given it has five warmup iterations and three measurements, we are down from 80s to 11s. Of course, shortening the warmup time too much may decrease the stability of our benchmarks, but the overall speedup seems too attractive not to try.
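For reference, the per-iteration overrides would look roughly like this as JMH annotations; the concrete values are just the ones mentioned above and would need tuning per benchmark:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;

// Illustrative values only, taken from the numbers above; JMH interprets the
// time attribute per iteration in the given unit, instead of its 10s default.
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 3, timeUnit = TimeUnit.SECONDS)
public class IfVsCaseBenchmark {
  // benchmark methods elided
}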

  • Implement Truffle compilation detection
  • Investigate if time can be lowered from 10s for various benchmarks
  • Run the results a few times on the server
  • Verify benchmarks run faster
  • Verify benchmark results are affected by Truffle compilation

CCing @jdunkerley, @Akirathan

@Akirathan (Member)

As long as there is no, or almost no, background compilation in the measurement phase, then yes, we can decrease the warmup time. I would be careful about it, though. For example, with the JMH framework we currently use, there is no simple way to tell whether there are still compilations happening in the measurement phase. Note that if we want to measure only the performance of Enso, and not the performance of Graal compilation, all the Graal compilations should be done during the warmup phase.

In FastR, I reimplemented the whole benchmarking infrastructure from scratch. I started every benchmark with --engine.TraceCompilationDetails, captured the whole output with timestamps, and once the benchmark ended, checked whether there were any engine compilation logs during the measurement phase.

This is just an example of an almost reliable way to detect that the warmup was sufficient. Before we move on with the investigation, I strongly recommend implementing a similar system to recognize a sufficient warmup and then modifying the times accordingly.
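As an illustration only (not the actual FastR infrastructure), the check could look something like the sketch below: keep the timestamped compilation-trace lines and flag any that land after the measurement phase started. The class and method names here are made up; "opt done" is one of the markers Truffle's compilation tracing prints (an example line is quoted later in this thread).

import java.time.Instant;
import java.util.List;

// Hypothetical helper, not existing code: warmup counts as sufficient when no
// compilation-trace line carries a timestamp inside the measurement phase.
record TraceLine(Instant timestamp, String text) {}

final class WarmupCheck {
  static boolean warmupWasSufficient(List<TraceLine> trace, Instant measurementStart) {
    return trace.stream()
        .filter(l -> l.timestamp().isAfter(measurementStart))
        // other trace events (queued, started, failed) could be matched the same way
        .noneMatch(l -> l.text().contains("opt done"));
  }
}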

Not to mention that it would be a very good idea to introduce benchmark versioning if we are about to fiddle with any parameters. More details in #5714.

@JaroslavTulach (Member Author) commented Apr 14, 2023

“almost reliable way how to detect that the warmup was sufficient”

I think that tracing the Truffle compilation and failing if it happens during measurement is a good idea. Here is an implementation for Enso benchmarks:

enso$ git diff engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java
diff --git engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java
index ef8b748ab8..481b607551 100644
--- engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java
+++ engine/runtime/src/bench/java/org/enso/interpreter/bench/benchmarks/semantic/ListBenchmarks.java
@@ -1,9 +1,12 @@
 package org.enso.interpreter.bench.benchmarks.semantic;
 
-import java.io.ByteArrayOutputStream;
 import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.List;
 import java.util.concurrent.TimeUnit;
 import java.util.function.Function;
+import java.util.logging.Handler;
+import java.util.logging.LogRecord;
 import org.graalvm.polyglot.Context;
 import org.graalvm.polyglot.Value;
 import org.openjdk.jmh.annotations.Benchmark;
@@ -18,12 +21,14 @@ import org.openjdk.jmh.annotations.State;
 import org.openjdk.jmh.annotations.Warmup;
 import org.openjdk.jmh.infra.BenchmarkParams;
 import org.openjdk.jmh.infra.Blackhole;
+import org.openjdk.jmh.infra.IterationParams;
+import org.openjdk.jmh.runner.IterationType;
 
 
 @BenchmarkMode(Mode.AverageTime)
 @Fork(1)
-@Warmup(iterations = 3)
-@Measurement(iterations = 5)
+@Warmup(iterations = 1, time = 1)
+@Measurement(iterations = 5, time = 1)
 @OutputTimeUnit(TimeUnit.MILLISECONDS)
 @State(Scope.Benchmark)
 public class ListBenchmarks {
@@ -33,6 +38,7 @@ public class ListBenchmarks {
   private Value self;
   private Value sum;
   private Value oldSum;
+  private List<LogRecord> messages = new ArrayList<>();
 
   @Setup
   public void initializeBenchmark(BenchmarkParams params) throws Exception {
@@ -40,7 +46,21 @@ public class ListBenchmarks {
       .allowExperimentalOptions(true)
       .allowIO(true)
       .allowAllAccess(true)
-      .logHandler(new ByteArrayOutputStream())
+      .option("engine.TraceCompilation", "true")
+      .logHandler(new Handler() {
+        @Override
+        public void publish(LogRecord lr) {
+          messages.add(lr);
+        }
+
+        @Override
+        public void flush() {
+        }
+
+        @Override
+        public void close() throws SecurityException {
+        }
+      })
       .option(
         "enso.languageHomeOverride",
         Paths.get("../../distribution/component").toFile().getAbsolutePath()
@@ -113,16 +133,18 @@ public class ListBenchmarks {
   }
 
   @Benchmark
-  public void mapOverList(Blackhole matter) {
-    performBenchmark(matter);
+  public void mapOverList(IterationParams it, Blackhole matter) {
+    performBenchmark(it, matter);
   }
 
   @Benchmark
-  public void mapOverLazyList(Blackhole matter) {
-    performBenchmark(matter);
+  public void mapOverLazyList(IterationParams it, Blackhole matter) {
+    performBenchmark(it, matter);
   }
 
-  private void performBenchmark(Blackhole hole) throws AssertionError {
+  private void performBenchmark(IterationParams it, Blackhole hole) throws AssertionError {
+    messages.clear();
+
     var newList = plusOne.execute(self, list);
     var newSum = sum.execute(self, newList, 0);
 
@@ -131,6 +153,14 @@ public class ListBenchmarks {
       throw new AssertionError("Unexpected result " + result);
     }
     hole.consume(result);
+    if (it.getType() == IterationType.MEASUREMENT) {
+      if (messages.size() > 0) {
+        for (var lr : messages) {
+          System.err.println(lr.getMessage());
+          throw new AssertionError("Found compilation " + messages.size() + " messages during measurement");
+        }
+      }
+    }
   }
 }

With these changes (especially after lowering the number of warmup iterations to 1 and giving each just 1s), my benchmark run fails:

sbt:runtime> benchOnly ListBenchmarks
Iteration   1: 
opt done   id=3593  org.graalvm.polyglot.Value<Function>.execute       |Tier 2|Time   593( 239+354 )ms|AST   47|Inlined  53Y   0N|IR   3573/ 17618|CodeSize   74181|Addr 0x7f0707edca90|Timestamp 91619721533011|Src n/a
<failure>

java.lang.AssertionError: Found compilation 1 messages during measurement
        at org.enso.interpreter.bench.benchmarks.semantic.ListBenchmarks.performBenchmark(ListBenchmarks.java:160)
        at org.enso.interpreter.bench.benchmarks.semantic.ListBenchmarks.mapOverLazyList(ListBenchmarks.java:142)
        at org.enso.interpreter.bench.benchmarks.semantic.jmh_generated.ListBenchmarks_mapOverLazyList_jmhTest:232

This way we can detect results skewed by compilation that has not finished before the measurements begin. Of course, this sketch should rather be turned into a supporting TruffleCompilationHandler class and be used by all our benchmarks.
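For illustration, such a shared TruffleCompilationHandler could look roughly like the following; this is a sketch based on the diff above, not code that exists in the repository:

import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import org.openjdk.jmh.infra.IterationParams;
import org.openjdk.jmh.runner.IterationType;

// Sketch of a shared handler: install it via Context.Builder.logHandler(...) together
// with the "engine.TraceCompilation" option, call reset() at the start of each benchmark
// invocation and assertNoCompilation(it) at the end of it.
public final class TruffleCompilationHandler extends Handler {
  private final List<LogRecord> messages = new ArrayList<>();

  @Override public synchronized void publish(LogRecord lr) { messages.add(lr); }
  @Override public void flush() {}
  @Override public void close() {}

  public synchronized void reset() { messages.clear(); }

  public synchronized void assertNoCompilation(IterationParams it) {
    if (it.getType() == IterationType.MEASUREMENT && !messages.isEmpty()) {
      messages.forEach(lr -> System.err.println(lr.getMessage()));
      throw new AssertionError(
          "Found " + messages.size() + " compilation messages during measurement");
    }
  }
}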

@JaroslavTulach (Member Author)

Regardless of whether we lower the time, we need to detect whether a compilation is running while benchmarking. Being implemented as
