
Benchmarks grant #45049

Closed
datapythonista opened this issue Dec 24, 2021 · 6 comments · Fixed by #46483
Labels: Admin (Administrative tasks related to the pandas project), Benchmark (Performance (ASV) benchmarks)

Comments

@datapythonista
Member

datapythonista commented Dec 24, 2021

This issue provides information on, and allows discussion of, the NumFOCUS small development grant to improve benchmarks for pandas (and possibly other projects in the ecosystem). The content of this issue is a work in progress and will be updated as needed.

List of some of the tasks identified to improve asv itself:

  • Evaluate the need to fork asv, and fork if needed. The project looks abandoned; the maintainers have been contacted, but there has been no answer yet.
  • Update the style of asv (the codebase currently does not follow PEP 8), and remove the Python 2 compatibility code (and the six dependency)
  • Evaluate whether it makes sense to allow asv to be imported in benchmarks, and possibly implement it
  • Evaluate and possibly implement API changes to asv, to improve readability, usability and code clarity. For example, would it be a good idea to provide an abstract base class for benchmarks? (A rough sketch of this idea follows this list.)
  • Evaluate using conbench or codespeed as the UI for benchmarks, possibly make asv compatible with one or more external UIs, and drop the asv UI
  • Find code that is no longer in use, and remove it
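
As a rough illustration of the abstract-base-class idea above (a sketch only: asv provides no such class today, and the `Benchmark` base class shown here is an assumption about what an importable API could look like):

```python
# Hypothetical sketch: asv does not currently ship an importable base class.
import abc


class Benchmark(abc.ABC):
    """What an importable asv base class might offer."""

    # mirrors asv's existing `params` / `param_names` conventions
    params: list = []
    param_names: list = []

    def setup(self, *params):
        """Prepare data; called before each timed run."""

    @abc.abstractmethod
    def time_main(self, *params):
        """The code to be timed."""


class ConcatFrames(Benchmark):
    """Example benchmark written against the hypothetical base class."""

    params = [10, 1_000]
    param_names = ["n"]

    def setup(self, n):
        import pandas as pd

        self.frames = [pd.DataFrame({"a": range(100)})] * n

    def time_main(self, n):
        import pandas as pd

        pd.concat(self.frames)
```

Inheriting from an explicit base class would let editors and linters see the interface, instead of relying purely on the current name-based discovery.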

Work on the pandas benchmarks:

  • Make benchmarks run faster: analyze the slower benchmarks, use less data where that makes sense, and avoid unnecessary parametrizations (see #16803 "PERF: Some asv tests take a long time to setup" and #44450 "CI: benchmark build is taking a long time" (comment)). A sketch of the trimming pattern follows this list.
  • Improve the CI for pandas benchmarks, making it more reliable (we are currently using a grep to see whether the build has failed). We can also check whether the logs can be improved
  • Review the benchmark structure and see whether it can be improved so benchmarks are easier to find, and remove duplicate benchmarks if it makes sense
  • Review existing benchmark issues, discuss with the pandas team which ones should be prioritized, and work on the important ones
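
On the "use less data / avoid unnecessary parametrizations" point above, a minimal sketch of the usual trimming pattern (the benchmark below is made up and is not taken from the pandas suite):

```python
# Hypothetical benchmark, for illustration only.
import numpy as np
import pandas as pd


class GroupByAggregation:
    # Before: params = [[10_000, 1_000_000, 10_000_000], ["sum", "mean", "min", "max"]]
    # Dropping the largest size and the redundant aggregations cuts both the
    # setup cost and the number of timed cases, usually with little loss of signal.
    params = [[10_000, 1_000_000], ["sum", "mean"]]
    param_names = ["nrows", "agg"]

    def setup(self, nrows, agg):
        rng = np.random.default_rng(42)
        self.df = pd.DataFrame(
            {"key": rng.integers(0, 100, nrows), "value": rng.random(nrows)}
        )

    def time_groupby_agg(self, nrows, agg):
        self.df.groupby("key")["value"].agg(agg)
```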

Work on infrastructure:

  • Evaluate options to replace the current pandas benchmarks infrastructure and coordinate with NumFOCUS
  • Add hooks and notifications that make sense to detect performance regressions as soon as possible

List of some other projects using asv:

Feedback very welcome

CC: @dorothykiz1 @LucyJimenez
CC: @pandas-dev/pandas-core @rgommers

datapythonista added the Benchmark label on Dec 24, 2021
@rgommers
Contributor

Thanks for leading this effort @datapythonista!

Cc'ing a few more folks from NumPy/SciPy/scikit-image who I know will be interested in this topic: @mattip, @stefanv, @mdhaber.

@mattip
Contributor

mattip commented Dec 24, 2021

Maybe I do not remember correctly, but it seemed to me asv is a wrapper around the pyperf package for benchmarking. I do like the idea of separating the benchmark runner from the GUI data presentation. Another presentation framework is codespeed, which is used by speed.python.org and speed.pypy.org. I could not find a way to get an overview on conbench (like asv and codespeed provide). I like the way the overview easily lets one see what has changed over time across all the benchmarks.

There is a lot to discuss around benchmarking itself: microbenchmarking vs. application benchmarking, stability, what to measure, how much can be done on CI hosted machines vs. private runners.

@mdhaber

mdhaber commented Dec 25, 2021

This sounds great. If you decide to fork asv, I'd be interested in testing the fork.

In case you are interested in suggestions that would help maintainers of other projects, I'll include thoughts from my experience with ASV below.

Please let me know if either of these should be filed as issues at https://github.com/airspeed-velocity/asv.

  • I haven't checked recently, but in the past I've been unable to use a conda-installed ASV on my Windows machine (see the specific error [here](https://github.com/scipy/scipy/pull/12732#issuecomment-674595399)). The only way I've been able to get ASV to work in my conda environment on Windows is to install it with pip. Not good practice, I know, but that's what works.
  • Sometimes the author of a benchmark wants to time code execution _and_ track something about that code execution without executing that code twice. It sounds like there is not a simple way to do this (see [here](https://github.com/scipy/scipy/pull/10762#issuecomment-567322968)).

If you decide to fork, please consider changing the following behaviors.

  • When a benchmark exceeds the default time limit, ASV reports it as having "FAILED". I think it would be useful to distinguish timeout from other failures. (I also think having a default time limit is surprising.)
  • I think the current default is to silence stderr, so if a test fails, the benchmarks need to be re-run to see what went wrong. Should --show-stderr be the default? If not, could stderr be logged by default?
  • I think the default is to run benchmarks against master instead of the HEAD of the current branch. I understand that there are arguments both ways, but defaulting to master seems surprising to me (and to at least some others; see scipy/scipy#11998, "BENCH: Run against current HEAD instead of master"). A sketch of the relevant command-line workarounds follows this list.
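
For reference, and assuming the asv command-line interface has not changed in recent versions (treat the exact flags and branch names as approximate), the two behaviours above can currently be worked around roughly like this:

```console
# show captured stderr when a benchmark fails, instead of re-running blind
asv run --quick --show-stderr

# explicitly compare the current checkout against a chosen baseline,
# rather than relying on the default branch choice
asv continuous -f 1.1 upstream/main HEAD
```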

The rest of these may be due to user error. Nonetheless, they might indicate that there is something to be improved.

  • I use a site.cfg so that BLAS/LAPACK can be found when I build SciPy, but BLAS/LAPACK still can't be found when ASV tries to build SciPy. I get around this by manually copying openblas.a into the ASV environment's Lib folder. Is it possible for ASV to use the information in a project's site.cfg?
  • What is the recommended workflow for debugging and fixing a failing benchmark? Is it to use --quick --show-stderr? Fixing bugs often requires iteration, and even with the --quick option there is still a long delay before the benchmarks start running. It is often faster for me to manually instantiate the class representing the benchmark and call its methods; that way I can interact with the stack trace and even run in debug mode. Should there be a simpler way to execute a benchmark within the Python prompt? (A sketch of that manual workflow follows this list.)
  • Perhaps I need to read the documentation more carefully, but I am confused about how many times benchmarks run to generate timing and tracking results. I see from here that "The timing itself is based on the Python standard library’s timeit module, with some extensions for automatic heuristics shamelessly stolen from IPython’s %timeit magic function. This means that in most cases the benchmark function itself will be run many times to achieve accurate timing." Is there a way to control this other than the --quick option, which runs the test only once? Are track_ benchmarks run multiple times, too?
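
On the debugging question above: because asv benchmark files are plain Python modules, the manual workflow described (instantiating the benchmark class directly) needs nothing from asv itself. A sketch, with a made-up module and class name:

```python
# Hypothetical module/class names; adapt to the benchmark actually being debugged.
from benchmarks.frame_methods import GroupByAggregation  # assumed import path

bench = GroupByAggregation()
params = (10_000, "sum")           # one concrete combination from `params`
bench.setup(*params)               # run the setup exactly as asv would
bench.time_groupby_agg(*params)    # call the timed method directly; pdb works here
```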

mroeschke added the Admin label on Dec 27, 2021
@astrojuanlu

astrojuanlu commented Jan 25, 2022

Have you folks looked into combining asv with pytest-benchmark? It looks like there have been proposals on both sides (ionelmc/pytest-benchmark#208, airspeed-velocity/asv#567), but nothing conclusive. Both projects overlap a lot, since both have

  1. A way to write benchmarks,
  2. A way to store benchmark data, track performance, and detect regressions
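
To make the overlap concrete, the same trivial benchmark written for each tool might look roughly like this (names are made up; neither snippet comes from an existing suite):

```python
# pytest-benchmark: the `benchmark` fixture times the callable it is given.
def test_sorted(benchmark):
    data = list(range(10_000, 0, -1))
    benchmark(sorted, data)


# asv: a class with setup and a time_* method, discovered by naming convention.
class SortSuite:
    def setup(self):
        self.data = list(range(10_000, 0, -1))

    def time_sorted(self):
        sorted(self.data)
```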

See also asvdb, explained in this comment:

Our use case is basically what's been described here: we have a large test suite already written with different tools (including pytest-benchmark) and we wanted to take advantage of asv's awesome viewer features without having to rewrite our entire test suite.

@JulianWgs

Hello all,

It's great to hear that pandas is getting improvements to its benchmarks. I see that you currently use asv, which is a great tool. I just want to leave a link to a blog post I read about consistent benchmarking in CI systems.

The author uses Cachegrind to count executed CPU instructions and other metrics, and to compute a value that correlates with the actual execution time. It's the same tool SQLite uses for its benchmarks. The blog post sadly doesn't look into concurrency.
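
For anyone curious what that looks like in practice, here is a minimal sketch of gathering an instruction count with Cachegrind from Python (the workload snippet is made up, and the parsing assumes the documented Cachegrind output-file layout with an "events:" header and a trailing "summary:" line):

```python
# Sketch: count instructions executed by a small Python snippet with Cachegrind.
# Requires valgrind to be installed on the machine running the benchmark.
import subprocess
import sys

OUT = "cachegrind.out.snippet"
snippet = "import json; json.dumps({'a': list(range(1000))})"  # hypothetical workload

subprocess.run(
    [
        "valgrind",
        "--tool=cachegrind",
        f"--cachegrind-out-file={OUT}",
        sys.executable,
        "-c",
        snippet,
    ],
    check=True,
)

events, summary = None, None
with open(OUT) as f:
    for line in f:
        if line.startswith("events:"):
            events = line.split()[1:]
        elif line.startswith("summary:"):
            summary = [int(x) for x in line.split()[1:]]

if events and summary:
    # "Ir" (instructions read) is the stable metric the blog post relies on
    # instead of noisy wall-clock time.
    print(dict(zip(events, summary))["Ir"])
```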

Best regards
Julian

PS:
Netflix posted about detecting performance regressions last month.

@datapythonista
Member Author

Thanks @JulianWgs for sharing those. Very interesting. Not in scope for the current grant, but very useful for the future.
