
flox performance regression for cftime resampling #7730

Closed

mathause opened this issue Apr 6, 2023 · 8 comments · Fixed by xarray-contrib/flox#266

mathause (Collaborator) commented Apr 6, 2023

What happened?

Running an in-memory groupby operation took much longer than expected. Turning off flox fixed this - but I don't think that's the idea ;-)

What did you expect to happen?

flox to be at least on par with our naive implementation

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

arr = np.random.randn(10, 10, 365*30)
time = xr.date_range("2000", periods=30*365, calendar="noleap")
da = xr.DataArray(arr, dims=("y", "x", "time"), coords={"time": time})

# using max
print("max:")
xr.set_options(use_flox=True)
%timeit da.groupby("time.year").max("time")
%timeit da.groupby("time.year").max("time", engine="flox")

xr.set_options(use_flox=False)
%timeit da.groupby("time.year").max("time")

# as reference
%timeit [da.sel(time=str(year)).max("time") for year in range(2000, 2030)]

# using mean
print("mean:")
xr.set_options(use_flox=True)
%timeit da.groupby("time.year").mean("time")
%timeit da.groupby("time.year").mean("time", engine="flox")

xr.set_options(use_flox=False)
%timeit da.groupby("time.year").mean("time")

# as reference
%timeit [da.sel(time=str(year)).mean("time") for year in range(2000, 2030)]
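
Not part of the original report: the same comparison can also be written without mutating the global option, using xr.set_options as a context manager together with the standard-library timeit module (a sketch that assumes the da defined above; numbers will differ from the log below):

import timeit

# time the flox and non-flox code paths with the option scoped to each call
for use_flox in (True, False):
    with xr.set_options(use_flox=use_flox):
        t = timeit.timeit(lambda: da.groupby("time.year").max("time"), number=10)
        print(f"use_flox={use_flox}: {t / 10 * 1e3:.1f} ms per loop")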

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

max:
158 ms ± 4.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
28.1 ms ± 318 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.5 ms ± 52.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

mean:
95.6 ms ± 10.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
34.8 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
15.2 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: f8127fc
python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-69-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: main
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.3
pydap: installed
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
iris: 3.4.1
bottleneck: 1.3.7
dask: 2023.3.2
distributed: 2023.3.2.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: 0.2.2
fsspec: 2023.3.0
cupy: None
pint: 0.20.1
sparse: 0.14.0
flox: 0.6.10
numpy_groupies: 0.9.20
setuptools: 67.6.1
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: None
IPython: 8.12.0
sphinx: None

mathause added the bug, needs triage, topic-groupby, and topic-performance labels and removed the bug label on Apr 6, 2023
dcherian (Contributor) commented Apr 6, 2023

Thanks! Can you add version info, please?

dcherian removed the needs triage label on Apr 6, 2023
mathause (Collaborator, Author) commented Apr 6, 2023

Of course - I added it above.

dcherian added a commit to xarray-contrib/flox that referenced this issue Apr 6, 2023
dcherian added a commit to dcherian/xarray that referenced this issue Apr 7, 2023
dcherian (Contributor) commented Apr 7, 2023

The slowness is basically a bunch of copies happening in align, broadcast, and transpose. It's made a lot worse in this case because we take a CFTimeIndex and cast it back to a CFTimeIndex, repeating all the validity checks.

And then there's xarray-contrib/flox#222
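
Not from the original thread, but one way to check this on the xarray side is to profile the call from the example above with IPython's %prun (a cProfile wrapper), limited here to the 20 hottest entries:

# profile one reduction and look for time spent in align/broadcast/transpose
# and in CFTimeIndex construction (assumes the `da` from the example above)
%prun -l 20 da.groupby("time.year").max("time")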

dcherian added a commit to dcherian/xarray that referenced this issue Apr 7, 2023
dcherian (Contributor) commented Apr 7, 2023

Also, because your groups are sorted, engine='flox' is faster:

gb = da.groupby("time.year")

# using max
xr.set_options(use_flox=True)
%timeit gb.max("time")
%timeit gb.max("time", engine="flox")

xr.set_options(use_flox=False)
%timeit gb.max("time")

177 ms ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
11.9 ms ± 471 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
18.5 ms ± 629 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

dcherian added a commit that referenced this issue Apr 11, 2023
mathause (Collaborator, Author) commented:

Thanks for looking into this!

dcherian added a commit that referenced this issue Apr 12, 2023
* align: Avoid reindexing when join="exact"

xref #7730

* Aligner.copy always copies even with join="exact"

* Move logic to Aligner.align

* Add whats-new
dcherian changed the title from "flox performance regression for in-memory da" to "flox performance regression for cftime resampling" on Apr 12, 2023
dcherian (Contributor) commented:

Thanks for the report! I think we should add your example as a benchmark.

dcherian added a commit to dcherian/xarray that referenced this issue Apr 28, 2023
dcherian added a commit to dcherian/xarray that referenced this issue Apr 28, 2023
dcherian added a commit that referenced this issue May 2, 2023
* [skip-ci] Add cftime groupby, resample benchmarks

xref #7730

* [skip-ci]try setting temp dir

* [skip-ci] try mamba?

* [skip-ci] increase conda verbosity

* [skip-ci] specify channels

* [skip-ci] Update .github/workflows/benchmarks.yml

* [skip-ci] bugfix

* [skip-ci] Parameterize use_flox

* [skip-ci] cleanup

* [skip-ci] fixes

* [skip-ci] fix resample parameterizing
dcherian added a commit to xarray-contrib/flox that referenced this issue May 4, 2023
* Optimize broadcasting

xref pydata/xarray#7730

* reorder

* Fix tests

* Another optimization

* fixes

* fix
mathause (Collaborator, Author) commented Sep 6, 2023

I looked into this a bit today. I think the performance regression comes from using np.maximum.at and, to a lesser extent, from the datetime conversions, and yes, using engine="flox" does speed things up again.
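
A hypothetical micro-benchmark (not from the thread) illustrating the gap between the scatter-style np.maximum.at and a contiguous reduction over already-sorted groups, which is roughly why engine="flox" recovers the speed here:

import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(365 * 30)       # one value per day for 30 "years"
group_idx = np.repeat(np.arange(30), 365)    # year label per value, already sorted

# scatter-style reduction via ufunc.at, the known-slow path before numpy 1.25
out = np.full(30, -np.inf)
%timeit np.maximum.at(out, group_idx, values)

# contiguous-slice reduction over the sorted groups
starts = np.arange(0, values.size, 365)
%timeit np.maximum.reduceat(values, starts)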

dcherian (Contributor) commented Sep 7, 2023

using np.maximum.at

Yes, this is a known slowness. numpy did improve things somewhat in 1.25, but I'm not sure if our typical workloads are affected. See numpy/numpy#23176 for the remaining work.
