-
Notifications
You must be signed in to change notification settings - Fork 658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROPOSAL] Add Performance Testing + SDK Span Performance Tests #1443
[PROPOSAL] Add Performance Testing + SDK Span Performance Tests #1443
Conversation
0afd9ba
to
6e5d7f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
6e5d7f1
to
fc0f2dc
Compare
@NathanielRN came across this https://pypi.org/project/scalene/, might be helpful for cpu and memory profiling. |
725b632
to
92099a8
Compare
Performance testing! This is outstanding 😎. I have one conceptual question (it very possibly may be related to the What does performance mean in:
Is performance measured in seconds? For example, a performance test is run once and it takes 10 seconds, so its performance is 10s? As far as I understand, a regression of less than 75% is accepted. So, if a test case takes 100s to run, it is accepted if it is run again and it takes 174s? Does that mean that it is acceptable that the performance can continue to degrade more and more as long as it is less than 75% of the previous run? |
@ocelotl That's a great question!
No, it compares against some baseline performance. In this PR I committed (e.g. This becomes the expected performance for the package. On subsequent PRs,
According to the pytest-benchmark docs tests are run for 1 second, as many times as that test can be run in that second:
And the units seem to be in seconds.
So a |
0aab259
to
488b4e7
Compare
@lonewolf3739 Thanks for the suggestion! Will be looking into this 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only be updating these saved benchmark files from Github action CI runs somehow, or maybe storing them as artifacts. The results could vary a lot depending on the PR author's hardware. If they run on Github Actions it should be somewhat normalized and wouldn't prevent people from using Mac/Windows for development, or force them to install all 5 versions of python we support to regen the benchmarks.
That might be a bit tricky, what is the Java SIG doing for their JMH benchmarks?
@@ -0,0 +1,104 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably add a linguist-generated
entry for these so to keep PR noise down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's really cool! I didn't know about this 😄
However I think it's only big this time, the benefit of having it included in the diff is that when someone updates the performance later, only the numbers should change (which means we can see how the performance changes in the git diff)
Also if you add new performance tests, that will be clearly seen as only a few lines of code added to these files.
Let me know what you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, it always updates the same file. I was looking at the pytest-benchmark docs here with --benchmark-autosave
. Is it worth preserving all previous runs? They show some cool box and whisker plots from previously saved runs. We can always iterate 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good idea! For now we will have a history of the mean throughput in benchmarks/data.js
for all commits, but if someone wants to generate the cool whisker plots all they would have to do is clone the repo and run tox -e tests-core-sdk -- --benchmark-histogram=<path>
to generate the .json
.
But I think it would be cool to have just the latest .json
file committed somewhere here on the repo later :)
tox.ini
Outdated
test: pytest {posargs} | ||
test: pytest --benchmark-save=now --benchmark-compare --benchmark-compare-fail=min:75% --benchmark-sort=mean --benchmark-columns=mean,min,max,stddev,median,ops,rounds {posargs} | ||
|
||
core-sdk: {toxinidir}/tests/print_benchmarks.py {toxinidir}/opentelemetry-sdk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow what this is for? Doesn't already print when you run pytest
in a nice colored format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It prints the table which is a nice visual format. However, we want to see the .json
file produced because pytest-benchmark
can only run comparisons between .json
files, not tables.
So the .json
file is printed to the workflow logs so that the author can see the performance results of their changes according to the normalize standard of "running on GitHub actions".
And if they improved performance they would update the relevant files in 0001_old.json
for each python version 😄 (This is the only manual process of this 😕 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close, they would copy/paste the output JSON into the opentelemetry-sdk/tests/performance/.benchmarks/Linux-CPython-3.5-64bit/0001_old.json
file for example.
This is only if they want to change the expected performance standard, which is something we could automate to happen every "release" for example.
The intention is to only update form the Github action CI 🙂 Short of committing the actual results into the PRs files (I don't know how to do that and I don't think we should...) the the next best solution I thought of was to actually output the results in the GitHub Action CI logs. That's where the
Author's will never have to commit results from their machine, they will use the GitHub actions machine results. We'll know it came from GitHub because Compare this to a run on my local machine which has
Agreed!
Yes also agreed :) I tested locally and you can still run
The Java SIG seems to be adding the performance test results in the PR description: See PR 2186. And I think I heard they post them in some When considering their solution, I thought this solution here where we commit them as |
30bbf14
to
93885b3
Compare
opentelemetry-sdk/tests/performance/benchmarks/trace/test_benchmark_trace.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NathanielRN why aren't we seeing the run on this PR? Is it because there is no existing saved benchmarks, but it will work going forward?
Performance progression of benchmarks for packages distributed by OpenTelemetry Python can be viewed as a [graph of throughput vs commit history](https://open-telemetry.github.io/opentelemetry-python/benchmarks/index.html). From this page, you can download a JSON file with the performance results. | ||
|
||
Running the `tox` tests also runs the performance tests if any are available. Benchmarking tests are done with `pytest-benchmark` and they output a table with results to the console. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NathanielRN this will show the perf change on your own machine between runs, right?
Do the JSON files need to be added to .gitignore for this PR?
Yes exactly that is what I expect, on my test repo I did not see the comments until I had something on |
de3277a
to
86b642e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Niiiiiiiiiiiiiiiiiiiiice.
As discussed with @NathanielRN , since readthedocs builds from master, this github action will also write to master in the benchmarks folder for every build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question regarding specification compliance 👍
opentelemetry-sdk/tests/performance/benchmarks/trace/test_benchmark_trace.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
577533b
to
22d48fa
Compare
Co-authored-by: Aaron Abbott <aaronabbott@google.com>
22d48fa
to
ea7643c
Compare
Description
Part of #335, this PR aims to include the
pytest-benchmark
library to get micro benching results on some key SDK operations.Specifically, this follows the OpenTelemetry Specifications on Performance Benchmarking guidelines.
Regarding
pytest-benchmark
:Pros:
pytest
gh-pages
, people can create these or we can create these!)Cons:
Type of change
In this PR:
tox.ini
to print the tests using the addedtests/print_benchmarks.py
scriptgh-pages
branch and at the link https://open-telemetry.github.io/opentelemetry-python/benchmarks/index.htmlHow Has This Been Tested?
Locally
Running
tox -e test-core-sdk
produces the following table:Example Repo
I made an example repo here to prove the json merging works
You can see the difference between PRs that have different performance benchmarks:
This combines all the tests under
OpenTelemetry Python Benchmark
as seen here. The fluctuations are on purpose because I made the test take longer.Does This PR Require a Contrib Repo Change?
Checklist:
- [] Changelogs have been updated(Not changing functionality, only adding tests- [ ] Unit tests have been added- [ ] Documentation has been updated