Improve Engine benchmark reporting #5714

Open
2 of 4 tasks
Akirathan opened this issue Feb 21, 2023 · 3 comments
Assignees
Labels
-tooling Category: tooling x-on-hold

Comments

@Akirathan
Member

Akirathan commented Feb 21, 2023

Currently, we report benchmarks in a very naive fashion - for each benchmark name (label), we report only the score, which measures how many iterations were completed per millisecond. The benchmark reporting sources are located in the org.enso.interpreter.bench package. All the benchmark jobs defined in Benchmark Actions upload a single bench-report.xml artifact that can only be compared to artifacts from other jobs manually. This is very inconvenient.
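
For the JMH benchmarks, a score in operations per millisecond corresponds to throughput mode with a millisecond output time unit. A minimal sketch of such a configuration is below; the benchmark class and workload are made up for illustration and are not part of org.enso.interpreter.bench.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

// Illustrative JMH benchmark only - the real Enso benchmarks live in
// org.enso.interpreter.bench; what matches is the score semantics (ops/ms).
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
public class ExampleBenchmark {

  @Benchmark
  public long sumOfSquares() {
    long acc = 0;
    for (long i = 0; i < 1_000; i++) {
      acc += i * i;
    }
    return acc; // returning the result prevents dead-code elimination
  }
}
```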

Required fields per benchmark

Every benchmark report item should have at least these properties:

  • Name
  • Source
    • JMH vs Bench.measure
  • Version
    • So that when we see a different number for a benchmark with the same name but a different version, we are not surprised and know to look for changes in the benchmark itself.
    • A single CHANGELOG should be provided, recording all version changes for all the benchmarks.
  • Time and Date of the benchmark run
  • Conclusion of the bench run - success/failure/...
  • Number of warmup iterations
  • Number of measurement iterations
  • Peak performance score (ops per ms)
  • Commit details
    • Commit ID, author, message
  • [Warmup performance score (ops per ms)]
    • Optional
    • So that we know there is not a huge regression in warmup.

We should also think about a better file format than XML; JSON files, for example, are easier to manipulate.
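
For illustration, a single report item covering the fields above could be modeled roughly as follows and then serialized to JSON with any JSON library. The type and field names are hypothetical, not existing classes in org.enso.interpreter.bench.

```java
import java.time.OffsetDateTime;
import java.util.Optional;

// Hypothetical shape of one benchmark report item; names are illustrative only.
public record BenchReportItem(
    String name,                          // benchmark label
    Source source,                        // JMH or Bench.measure
    String version,                       // bumped whenever the benchmark itself changes
    OffsetDateTime runTimestamp,          // time and date of the benchmark run
    Conclusion conclusion,                // success / failure / ...
    int warmupIterations,
    int measurementIterations,
    double peakScoreOpsPerMs,             // peak performance score
    String commitId,
    String commitAuthor,
    String commitMessage,
    Optional<Double> warmupScoreOpsPerMs  // optional warmup performance score
) {
  public enum Source { JMH, BENCH_MEASURE }
  public enum Conclusion { SUCCESS, FAILURE }
}
```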

Sufficient warmup

We should also think about how to ensure that no Graal compilations are still ongoing during the measurement phases, i.e., that the benchmark is stable. Ongoing compilations during measurement may signal that the warmup was insufficient, and they also distort the score. Note that compilations are expected during the warmup iterations.

An idea for automatically tracking sufficient warmup is in #6271 (comment).
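
One possible heuristic is sketched below: check whether total JIT compilation time is still growing between measurement iterations. This assumes the relevant compilations are visible through the standard CompilationMXBean, which may not hold for Truffle/Graal compilations running on separate compiler threads, so treat it as an illustration rather than a ready solution.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Sketch of a stability check: if total JIT compilation time still grows between
// measurement iterations, the warmup was likely insufficient. This assumes the
// compilations of interest are reported via the standard CompilationMXBean.
public final class CompilationStabilityCheck {
  private final CompilationMXBean compilation = ManagementFactory.getCompilationMXBean();
  private long lastTotalMs = -1;

  /** Returns true if no new compilation time was accumulated since the last call. */
  public boolean isStable() {
    if (compilation == null || !compilation.isCompilationTimeMonitoringSupported()) {
      return true; // cannot tell; do not block the benchmark
    }
    long current = compilation.getTotalCompilationTime();
    boolean stable = current == lastTotalMs;
    lastTotalMs = current;
    return stable;
  }
}
```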

Tasks


single endpoint can be, e.g., a connection to a MongoDB. The Enso script for processing the result can connect to this MongoDB and download all the relevant benchmark reports.
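
As a rough sketch of this single-endpoint idea, uploading one report item to MongoDB with the Java sync driver could look like the following; the connection string, database, collection, and field names are placeholders.

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

// Hypothetical upload of a single benchmark report to a MongoDB collection.
public final class BenchReportUploader {
  public static void main(String[] args) {
    try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
      MongoCollection<Document> reports =
          client.getDatabase("bench").getCollection("reports");
      Document item = new Document("name", "org.enso.example.SumBenchmark")
          .append("score_ops_per_ms", 123.4)
          .append("commit_id", "abcdef0");
      reports.insertOne(item);
      // The analysis script would later read everything back, e.g.:
      // reports.find().forEach(doc -> System.out.println(doc.toJson()));
    }
  }
}
```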

Related issues

Edit 2023-08-22

The current state is that:

I am still not closing this issue, since it contains some valuable information about what we should include in the benchmark result output. Currently, the benchmark results are still XML files with a single double score value.

@Akirathan Akirathan added p-medium Should be completed in the next few sprints -compiler labels Feb 21, 2023
@Akirathan
Member Author

Assigned p-medium priority, because by not automating the reporting we basically waste roughly 3 hours of CPU time on every benchmark run. I assume nobody is manually checking the result of each benchmark.

@wdanilo
Member

wdanilo commented Feb 21, 2023

@Akirathan would a faster machine help us here? If so, let's buy a faster machine. Please talk with @mwu-tow about what kind of benefit it could bring. Also, can we run benchmarks in parallel - some benchmarks on one machine, others on another?

@Akirathan
Member Author

@Akirathan would a faster machine help us here? If so, let's buy a faster machine. Please talk with @mwu-tow about what kind of benefit it could bring. Also, can we run benchmarks in parallel - some benchmarks on one machine, others on another?

Let's continue the discussion in #5718; I believe you can find some answers there.

@jdunkerley jdunkerley added -libs Libraries: New libraries to be implemented l-examples and removed -compiler labels Feb 24, 2023
@jdunkerley jdunkerley removed the status in Issues Board Feb 24, 2023
mergify bot pushed a commit that referenced this issue Mar 28, 2023
Add Engine benchmark analysis tool - a Python script for downloading benchmark data and an Enso project for the analysis. I have also included benchmark data for 02/2022.

Related issues and discussions:
- #5714
- #5165
- #5718
@Akirathan Akirathan added -tooling Category: tooling s-research-needed Status: the task will require heavy research to complete and removed -libs Libraries: New libraries to be implemented l-examples labels Apr 15, 2023
@Akirathan Akirathan self-assigned this Aug 22, 2023
@Akirathan Akirathan added x-on-hold and removed p-medium Should be completed in the next few sprints s-research-needed Status: the task will require heavy research to complete labels Aug 22, 2023