Improve Engine benchmark reporting #5714

Open
2 of 4 tasks
Akirathan opened this issue Feb 21, 2023 · 3 comments
Assignees
Labels
-tooling Category: tooling x-on-hold

Comments

@Akirathan
Member

Akirathan commented Feb 21, 2023

Currently, we report benchmarks in a very naive fashion - for each benchmark name (label), we report only the score, which measures how many iterations were completed per millisecond. The benchmark reporting sources are located in the org.enso.interpreter.bench package. All the benchmark jobs defined in Benchmark Actions upload a single bench-report.xml artifact that can only be compared to artifacts from other jobs manually. This is very inconvenient.
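
For the JMH benchmarks, a score in operations per millisecond corresponds to throughput mode with a millisecond output time unit. A minimal sketch of such a configuration is below; the benchmark class and workload are made up for illustration and are not part of org.enso.interpreter.bench.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

// Illustrative JMH benchmark only - the real Enso benchmarks live in
// org.enso.interpreter.bench; what matches is the score semantics (ops/ms).
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
public class ExampleBenchmark {

  @Benchmark
  public long sumOfSquares() {
    long acc = 0;
    for (long i = 0; i < 1_000; i++) {
      acc += i * i;
    }
    return acc; // returning the result prevents dead-code elimination
  }
}
```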

Required fields per benchmark

Every benchmark report item should have at least these properties:

  • Name
  • Source
    • JMH vs Bench.measure
  • Version
    • So that when we see a different number for a benchmark with the same name but a different version, we are not surprised and know to look for changes in the benchmark itself.
    • A single CHANGELOG should be provided, recording all version changes for all the benchmarks.
  • Time and Date of the benchmark run
  • Conclusion of the bench run - success/failure/...
  • Number of warmup iterations
  • Number of measurement iterations
  • Peak performance score (ops per ms)
  • Commit details
    • Commit ID, author, message
  • [Warmup performance score (ops per ms)]
    • Optional
    • So that we know there is not a huge regression in warmup.

We should also think about a better file format than XML; JSON files, for example, are easier to manipulate.
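
For illustration, a single report item covering the fields above could be modeled roughly as follows and then serialized to JSON with any JSON library. The type and field names are hypothetical, not existing classes in org.enso.interpreter.bench.

```java
import java.time.OffsetDateTime;
import java.util.Optional;

// Hypothetical shape of one benchmark report item; names are illustrative only.
public record BenchReportItem(
    String name,                          // benchmark label
    Source source,                        // JMH or Bench.measure
    String version,                       // bumped whenever the benchmark itself changes
    OffsetDateTime runTimestamp,          // time and date of the benchmark run
    Conclusion conclusion,                // success / failure / ...
    int warmupIterations,
    int measurementIterations,
    double peakScoreOpsPerMs,             // peak performance score
    String commitId,
    String commitAuthor,
    String commitMessage,
    Optional<Double> warmupScoreOpsPerMs  // optional warmup performance score
) {
  public enum Source { JMH, BENCH_MEASURE }
  public enum Conclusion { SUCCESS, FAILURE }
}
```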

Sufficient warmup

We should also think about how to ensure that no Graal compilations are still ongoing during the measurement phases, i.e., that the benchmark is stable. Ongoing compilations during measurement may signal that the warmup was insufficient, and they also distort the score. Note that compilations are expected during the warmup iterations.

An idea for automatically tracking sufficient warmup is in #6271 (comment).
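
One possible heuristic is sketched below: check whether total JIT compilation time is still growing between measurement iterations. This assumes the relevant compilations are visible through the standard CompilationMXBean, which may not hold for Truffle/Graal compilations running on separate compiler threads, so treat it as an illustration rather than a ready solution.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Sketch of a stability check: if total JIT compilation time still grows between
// measurement iterations, the warmup was likely insufficient. This assumes the
// compilations of interest are reported via the standard CompilationMXBean.
public final class CompilationStabilityCheck {
  private final CompilationMXBean compilation = ManagementFactory.getCompilationMXBean();
  private long lastTotalMs = -1;

  /** Returns true if no new compilation time was accumulated since the last call. */
  public boolean isStable() {
    if (compilation == null || !compilation.isCompilationTimeMonitoringSupported()) {
      return true; // cannot tell; do not block the benchmark
    }
    long current = compilation.getTotalCompilationTime();
    boolean stable = current == lastTotalMs;
    lastTotalMs = current;
    return stable;
  }
}
```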

Tasks


single endpoint can be, e.g., a connection to a MongoDB. The Enso script for processing the result can connect to this MongoDB and download all the relevant benchmark reports.
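
As a rough sketch of this single-endpoint idea, uploading one report item to MongoDB with the Java sync driver could look like the following; the connection string, database, collection, and field names are placeholders.

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

// Hypothetical upload of a single benchmark report to a MongoDB collection.
public final class BenchReportUploader {
  public static void main(String[] args) {
    try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
      MongoCollection<Document> reports =
          client.getDatabase("bench").getCollection("reports");
      Document item = new Document("name", "org.enso.example.SumBenchmark")
          .append("score_ops_per_ms", 123.4)
          .append("commit_id", "abcdef0");
      reports.insertOne(item);
      // The analysis script would later read everything back, e.g.:
      // reports.find().forEach(doc -> System.out.println(doc.toJson()));
    }
  }
}
```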

Related issues

Edit 2023-08-22

The current state is that:

I am still not closing this issue, since it contains some valuable information about what we should include in the benchmark result output. Currently, the benchmark results are still XML files with a single double score value.

@Akirathan Akirathan added p-medium Should be completed in the next few sprints -compiler labels Feb 21, 2023
@Akirathan
Member Author

Assigned p-medium priority, because by not automating the reporting we basically waste roughly 3 hours of CPU time on every benchmark run. I assume nobody is manually checking the result of each benchmark.

@wdanilo
Member

wdanilo commented Feb 21, 2023

@Akirathan would a faster machine help us here? If so, let's buy a faster machine. Please talk with @mwu-tow about what kind of benefit it could bring. Also, can we run benchmarks in parallel - some benchmarks on one machine, others on another?

@Akirathan
Member Author

@Akirathan would a faster machine help us here? If so, let's buy a faster machine. Please talk with @mwu-tow about what kind of benefit it could bring. Also, can we run benchmarks in parallel - some benchmarks on one machine, others on another?

Let's continue the discussion in #5718; I believe you can find some answers there.

@jdunkerley jdunkerley added -libs Libraries: New libraries to be implemented l-examples and removed -compiler labels Feb 24, 2023
@jdunkerley jdunkerley removed the status in Issues Board Feb 24, 2023
mergify bot pushed a commit that referenced this issue Mar 28, 2023
Add Engine benchmark analysis tool - a Python script for downloading benchmark data and an Enso project for the analysis. I have also included benchmark data for 02/2022.

Related issues and discussions:
- #5714
- #5165
- #5718
@Akirathan Akirathan added -tooling Category: tooling s-research-needed Status: the task will require heavy research to complete and removed -libs Libraries: New libraries to be implemented l-examples labels Apr 15, 2023
@Akirathan Akirathan self-assigned this Aug 22, 2023
@Akirathan Akirathan added x-on-hold and removed p-medium Should be completed in the next few sprints s-research-needed Status: the task will require heavy research to complete labels Aug 22, 2023