Comprehensive benchmarking suite #4648

Open
5 of 19 tasks
dcherian opened this issue Dec 3, 2020 · 6 comments
Comments

dcherian (Contributor) commented Dec 3, 2020

I think a good "infrastructure" target for the NASA OSS call would be to expand our benchmarking suite (https://pandas.pydata.org/speed/xarray/#/).

AFAIK running these in a useful manner on CI is still an unsolved problem (please correct me if I'm wrong), but we could always run them on an NCAR machine using a cron job.

Thoughts?

cc @scottyhq

A quick survey of work needed (please append):

Related: #3514
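
The existing suite uses airspeed velocity (asv), which reports the runtime of every method whose name starts with `time_`. A minimal sketch of the kind of benchmark that could be added (the class name, dataset shape, and operations here are illustrative, not the actual xarray benchmarks):

```python
import numpy as np
import pandas as pd
import xarray as xr


class GroupByTime:
    """Illustrative asv benchmark: asv calls setup() before timing and
    records the runtime of every method named time_*."""

    def setup(self):
        time = pd.date_range("2000-01-01", periods=4 * 365, freq="D")
        self.ds = xr.Dataset(
            {"var": (("time", "x"), np.random.randn(time.size, 100))},
            coords={"time": time},
        )

    def time_groupby_monthly_mean(self):
        self.ds.groupby("time.month").mean()

    def time_rolling_mean(self):
        self.ds["var"].rolling(time=30).mean()
```

Locally, benchmarks like this would typically be compared against a reference commit with `asv continuous`.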

scottyhq (Contributor) commented Dec 3, 2020

Thanks for the ping @dcherian, I really like the idea! One other thing that often gets neglected in test suites is operating on remote data. I understand the need to avoid long-running tests and tests prone to network failures for PRs, but running these sorts of examples as a cron job could be very helpful for benchmarking and detecting issues.

In intake-xarray we recently added tests against a local HTTP server and "S3" server:
https://github.com/intake/intake-xarray/blob/master/intake_xarray/tests/test_remote.py

We also added several simple tests that require a network connection to public data (no auth required); we run these locally but not in CI currently:
https://github.com/intake/intake-xarray/blob/master/intake_xarray/tests/test_network.py
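
For xarray itself, a comparable opt-in network test might look something like this sketch (the `network` marker is a hypothetical custom pytest marker, and `xr.tutorial.open_dataset` downloads a small sample dataset over the network):

```python
import pytest
import xarray as xr

# Hypothetical marker; once registered in pytest's config, CI could
# deselect these tests with `pytest -m "not network"`.
network = pytest.mark.network


@network
def test_open_remote_tutorial_dataset():
    # Downloads a small sample NetCDF file over the network on first use,
    # exercising the remote-data path end to end.
    ds = xr.tutorial.open_dataset("air_temperature")
    assert "air" in ds.data_vars
```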

dcherian (Contributor, Author) commented Dec 4, 2020

Thanks @scottyhq

> One other thing that often gets neglected in test suites is operating on remote data.

This is lining up with the "pangeo integration tests" that came up in a Pangeo meeting (cc @rabernat).

Regardless of whether it fits there, I think adding benchmarks and tests for the xarray+zarr+fsspec (or xarray+mfdataset+netCDF) stack is an important and unmet need of the Pangeo community in general that we could address.
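
As a concrete example of what an xarray+zarr+fsspec benchmark could time, here is a rough sketch (the bucket URL is a placeholder; anonymous S3 access and a consolidated store are assumed):

```python
import fsspec
import xarray as xr


class OpenZarrOverFsspec:
    """Illustrative asv-style benchmark for reading Zarr through fsspec."""

    def setup(self):
        # Placeholder URL: substitute a real public, anonymously readable store.
        self.store = fsspec.get_mapper(
            "s3://some-public-bucket/example.zarr", anon=True
        )

    def time_open_zarr(self):
        # Times metadata handling and lazy dataset construction only;
        # no array data is loaded.
        xr.open_zarr(self.store, consolidated=True)
```

A companion benchmark could call `.load()` on a variable to measure actual data transfer rather than just metadata handling.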

max-sixty (Collaborator) commented

This would be great.

Down a couple of levels of detail: I think we could potentially run this as a cron job on GitHub Actions. NCAR would also be a good option. I'm also happy to supply a VM if that's helpful.

dcherian (Contributor, Author) commented Aug 18, 2021

Looks like Quansight thinks that GitHub Actions is a good place to benchmark scikit-image: https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/ so maybe we can set that up for our existing benchmarks.

Here's the workflow: https://github.com/jaimergp/scikit-image/blob/main/.github/workflows/benchmarks-cron.yml

dcherian (Contributor, Author) commented Nov 8, 2021

@TomAugspurger are you still in charge of the pydata benchmarking machine? If so, could you please add xarray to the list (https://pandas.pydata.org/speed/)? @Illviljan has made major improvements to the benchmark suite, so it should be a lot faster now.

TomAugspurger (Contributor) commented

"In charge of" is overstating it a bit. It's been segfaulting when building pandas and I haven't had a chance to debug it.

If / when I get around to fixing it I'll try adding xarray, but it might be a bit.
