CI: benchmark build is taking a long time #44450

Open · jreback opened this issue on Nov 14, 2021 · 9 comments

Labels: Benchmark (Performance (ASV) benchmarks), CI (Continuous Integration)

jreback (Contributor) commented Nov 14, 2021

xref https://github.com/pandas-dev/pandas/runs/4204157179?check_suite_focus=true

Have seen this on multiple PRs. Maybe something recently added is causing this to time out.

jreback added the CI (Continuous Integration) label on Nov 14, 2021
jreback added this to the 1.4 milestone on Nov 14, 2021
jreback (Contributor, Author) commented Nov 14, 2021

cc @pandas-dev/pandas-core

alimcmaster1 (Member) commented
Could be related to #44359

jorisvandenbossche (Member) commented Nov 15, 2021

I think this is mostly caused by the continuous expansion of our suite, with some recent additions that had a big impact on total runtime.
While it's of course good to keep increasing the coverage of our benchmark suite, we should also be careful to keep the total runtime manageable, by limiting parametrizations where they don't add much value and by ensuring the individual time functions don't take too long.

I ran the benchmarks locally with the --quick option (so without repetitions, as we do on our CI) and did a quick analysis of the results (ASV saves those in a JSON file, which I am trying to upload as an artifact on our CI in #44464).

The code for this and some results can be seen at https://nbviewer.org/gist/jorisvandenbossche/0af1c0a20ef187197ecdcfdb3545306a

Based on those results, some of the very slow ones that contribute a lot to the total runtime of the benchmarks:

(the first two items add by far the most time of the whole suite)
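
For reference, a minimal sketch of the kind of analysis described above (the actual code is in the linked gist). It assumes the ASV 0.4-style result layout, where the JSON file contains "started_at"/"ended_at" mappings from benchmark name to millisecond timestamps; the file path below is only illustrative:

```python
import json

# Path is illustrative; ASV writes one result file per machine/commit/environment.
with open(".asv/results/my-machine/some-commit-env.json") as f:
    data = json.load(f)

# Assumed ASV 0.4-style layout: per-benchmark start/end timestamps in milliseconds.
started = data.get("started_at", {})
ended = data.get("ended_at", {})
durations = {
    name: (ended[name] - started[name]) / 1000
    for name in started
    if name in ended
}

# Show the benchmarks that contribute most to the total runtime.
for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:20]:
    print(f"{seconds:8.1f}s  {name}")
```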

jreback (Contributor, Author) commented Nov 15, 2021

Going to close this based on #44475, but will open a new issue about separating the benchmarks into another build.

jbrockmendel (Member) commented
I expect Joris's diagnosis is correct, and trimming the groupby parametrizations will go a long way towards fixing this.

For e.g. an ncols param that often has cases like [1, 2, 5, 10], we might find a way to only run the ncols=1 and ncols=2 cases on the CI? (a rough sketch of one way that could look is below)
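
A minimal sketch of that idea, assuming the CI job sets an environment variable (the PANDAS_ASV_QUICK name is hypothetical, not an existing setting) and using a simplified stand-in benchmark class rather than the actual pandas one:

```python
import os

import numpy as np
import pandas as pd

# Hypothetical switch: the CI job would export PANDAS_ASV_QUICK=1 to get the
# reduced grid, while local runs keep the full parametrization.
NCOLS = [1, 2] if os.environ.get("PANDAS_ASV_QUICK") else [1, 2, 5, 10]


class Describe:
    # Simplified stand-in for a wide-frame benchmark, not the actual pandas class.
    params = NCOLS
    param_names = ["ncols"]

    def setup(self, ncols):
        self.df = pd.DataFrame(np.random.randn(10_000, ncols))

    def time_describe(self, ncols):
        self.df.describe()
```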

jorisvandenbossche (Member) commented
The rolling benchmarks were not the biggest "offender", so let's keep this open for at least the other two I listed.

> For e.g. an ncols param that often has cases like [1, 2, 5, 10], we might find a way to only run the ncols=1 and ncols=2 cases on the CI?

Or just always skip ncols > 2, at least for describe? (might need to check whether other benchmarks are much slower)
Or do 5 columns instead of 2 add much additional information? (with 2 we already have a "not a single column" case) That should be sufficient to catch regressions in the describe algo in general, I think. A sketch of the skip approach follows.
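
A minimal sketch of the skip, relying on ASV's convention that raising NotImplementedError in setup() skips that parameter combination; the class is again a simplified stand-in rather than the actual pandas benchmark:

```python
import numpy as np
import pandas as pd


class Describe:
    params = [1, 2, 5, 10]
    param_names = ["ncols"]

    def setup(self, ncols):
        if ncols > 2:
            # 1 and 2 columns already cover the single- and multi-column code
            # paths; wider frames mostly add runtime, not extra signal.
            raise NotImplementedError
        self.df = pd.DataFrame(np.random.randn(10_000, ncols))

    def time_describe(self, ncols):
        self.df.describe()
```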

jbrockmendel (Member) commented
seems reasonable

jorisvandenbossche (Member) commented
@jbrockmendel the benchmark function taking the most overall time is now tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr (after groupby.String.time_str_func, which was already mentioned above). The main contributor to this overall time is the huge parametrization (450 combinations for benchmarking a single function).
For example, the benchmark runs for 5 different sizes of the data ([0, 1, 100, 10 ** 4, 10 ** 6]). Is there any reason to think varying the size might provide useful information in this specific case? Could we also run this just for a single size?
Also, it is now run for several timezones and several freqs. Those are both useful parametrizations (they affect the timing), but we could also test them independently: all the timezones with one fixed freq, and all the freqs with one fixed tz, instead of the full product of combinations (see the sketch below).
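
A rough sketch of what the independent parametrizations could look like. The public DatetimeIndex.to_period() call stands in for the private dt64arr_to_periodarr routine that the real benchmark exercises, and the sizes, timezones, and freqs are only examples:

```python
import pandas as pd


class ToPeriodVaryFreq:
    # Vary only the target frequency, with a fixed tz-naive datetime index.
    params = ["D", "W", "M"]
    param_names = ["freq"]

    def setup(self, freq):
        self.dti = pd.date_range("2016-01-01", periods=100_000, freq="min")

    def time_to_period(self, freq):
        self.dti.to_period(freq)


class ToPeriodVaryTz:
    # Vary only the timezone, with a fixed target frequency.
    params = [None, "UTC", "US/Eastern"]
    param_names = ["tz"]

    def setup(self, tz):
        self.dti = pd.date_range("2016-01-01", periods=100_000, freq="min", tz=tz)

    def time_to_period(self, tz):
        self.dti.to_period("D")
```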

jbrockmendel (Member) commented
> Is there any reason to think varying the size might provide useful information in this specific case? Could we also run this just for a single size?

I think the runtime is expected to be affine (a constant per-call overhead plus a per-element cost), so in principle we'd want/need two sizes to catch regressions in either term.

More generally, the tslibs ASVs are written so that they can be skipped for PRs that don't touch tslibs (ditto for benchmarks/libs.py).
