Benchmarking #136
Replies: 8 comments
-
(I posted the following to #11 before I saw this. 🙂) Benchmarking is critical for decision-making, such as weeding out less fruitful ideas early, as well as for monitoring progress and communicating the merits of our work. So we will be running benchmarks frequently, and we want the workflow to be as low-overhead as possible. Relevant topics
Profiling is a different question.
-
Good thoughts!
-
Another benchmark suite to consider is the Pyston suite: https://github.com/pyston/python-macrobenchmarks/
-
The impact of many of the optimizations we are pursuing (especially in the eval loop) is tied to various specific workloads, sometimes significantly. So it is important that we choose our target workloads conscientiously, and even document the rationale for the choices. In some cases it will also require that we add to our benchmark suite. That said, I do not think we need to focus much at first on the best target workloads, other than to let the idea simmer. We'll be fine for the moment with just the available suites, microbenchmarks and all. I'm sure that it won't take long before we build a stronger intuition for targeting specific workloads with our optimizations, at which point we can apply increasing discipline to our selection (of both benchmarks and optimization ideas). An iterative process like that will allow us to ramp up our effectiveness on this project.
-
FWIW, @zooba pointed me at https://github.com/Azure/azure-sdk-for-python/blob/master/doc/dev/perfstress_tests.md. This is a tool and framework for stress-testing the Azure SDK. It isn't something we would use, but it does offer some insight into a different sort of benchmarking. There may be a lesson or two in there for us, if we don't have other things to look into. 🙂
-
Ooh, cool. Maybe we could contact the author and ask them what they have learned.
-
Emery Berger has done some work on randomized benchmarking to remove a lot of systematic errors.
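A toy sketch of one idea in that spirit (this is not Berger's actual tooling, and the two workload variants are placeholders): interleave and shuffle the runs of the variants being compared, so that slow environmental drift (thermal throttling, background load) spreads across all of them instead of biasing whichever one happens to run last.

```python
import random
import statistics
import time

# Two placeholder variants of the same workload that we want to compare.
def variant_genexpr():
    return sum(i * i for i in range(50_000))

def variant_listcomp():
    return sum([i * i for i in range(50_000)])

VARIANTS = {"genexpr": variant_genexpr, "listcomp": variant_listcomp}
REPEATS = 30

# Build a schedule containing every (variant, repetition) pair, then shuffle it
# so that systematic effects are spread evenly over both variants.
schedule = [name for name in VARIANTS for _ in range(REPEATS)]
random.shuffle(schedule)

samples = {name: [] for name in VARIANTS}
for name in schedule:
    start = time.perf_counter()
    VARIANTS[name]()
    samples[name].append(time.perf_counter() - start)

for name, times in samples.items():
    print(f"{name}: median={statistics.median(times):.6f}s "
          f"stdev={statistics.stdev(times):.6f}s")
```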
-
An interesting article on getting reliable benchmark results from a CI system (e.g. GitHub Actions): https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/
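For what it's worth, pyperf (the runner underneath PyPerformance) already does much of the statistical work that kind of setup relies on: multiple worker processes, loop calibration, and warnings about unstable results. A minimal sketch of a pyperf microbenchmark, with a placeholder workload:

```python
# Minimal pyperf microbenchmark sketch; run the script directly and pyperf
# spawns worker processes and reports mean +- stdev for the workload.
import pyperf

def build_squares():
    return [i * i for i in range(1_000)]

if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_func("build_squares", build_squares)
```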
-
I'd like to have a benchmark so we have something concrete to target.
There are many benchmarks in PyPerformance, and the full suite takes a long time to run. Some of the benchmarks are ancient (from the FORTRAN days) and focus on numeric array operations. I'm not interested in those (the users who have numeric arrays are all using numpy or a tensor package).
I like benchmarks that represent a more OO style of coding. (Note that even the "float" benchmark, which is supposed to measure float operations like sqrt() and sin()/cos(), was sped up by an improvement to the LOAD_ATTR opcode that speeds up slots. :-) In PyPerformance there is a group of benchmarks that represents "apps"; we could use that group, or pick one of them.
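To make that concrete, here's a toy example (the class and workload are invented for illustration) of the slot-heavy, attribute-access-heavy OO code that a specialized LOAD_ATTR helps:

```python
class Point:
    # __slots__ turns x/y into slot descriptors, the case the
    # LOAD_ATTR improvement targets.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

def total_norm_squared(points):
    total = 0.0
    for p in points:
        # Four slot attribute loads per iteration.
        total += p.x * p.x + p.y * p.y
    return total

if __name__ == "__main__":
    pts = [Point(i * 0.5, i * 0.25) for i in range(100_000)]
    print(total_norm_squared(pts))
```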
There are also the benchmarks that the Pyston v2 project created: https://github.com/pyston/python-macrobenchmarks/ -- these would be interesting to try, since Pyston has a somewhat similar goal to ours (keep the C API unchanged) and is farther along (claiming to be 20% faster), but Pyston itself is closed source (for now).
For me, an important requirement is that a benchmark runs fairly quickly. If I had a benchmark that ran for a minute, I'd probably run it a lot to validate the various tweaks I'm experimenting with, even knowing that the results were pretty noisy. OTOH, if I only had a benchmark that ran for 15 minutes, I'd probably run it only once or twice a day; if it ran for an hour, I'd probably only run it overnight. We should still run all of PyPerformance occasionally, since it is used by the core dev team to validate whether a proposed speedup (a) does anything good for at least some of the benchmarks, and (b) doesn't slow anything down.
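As an illustration of the "runs in seconds" end of that spectrum, something as crude as a best-of-N wall-clock check is often enough while iterating on a tweak (the workload here is a placeholder; the full PyPerformance run remains the real arbiter):

```python
# Quick-and-dirty best-of-N timing for fast iteration; noisy, but enough to
# tell whether a tweak is in the right ballpark before a full suite run.
import time

def workload():
    # Placeholder for one representative benchmark workload.
    data = {str(i): i for i in range(20_000)}
    return sum(data[k] for k in data)

def best_of(n=5):
    best = float("inf")
    for _ in range(n):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    print(f"best of 5: {best_of():.4f}s")
```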