
flox performance regression for cftime resampling #7730

Closed

mathause opened this issue Apr 6, 2023 · 8 comments · Fixed by xarray-contrib/flox#266

mathause (Collaborator) commented Apr 6, 2023

What happened?

Running an in-memory groupby operation took much longer than expected. Turning off flox fixed this - but I don't think that's the idea ;-)

What did you expect to happen?

flox to be at least on par with our naive implementation

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

arr = np.random.randn(10, 10, 365*30)
time = xr.date_range("2000", periods=30*365, calendar="noleap")
da = xr.DataArray(arr, dims=("y", "x", "time"), coords={"time": time})

# using max
print("max:")
xr.set_options(use_flox=True)
%timeit da.groupby("time.year").max("time")
%timeit da.groupby("time.year").max("time", engine="flox")

xr.set_options(use_flox=False)
%timeit da.groupby("time.year").max("time")

# as reference
%timeit [da.sel(time=str(year)).max("time") for year in range(2000, 2030)]

# using mean
print("mean:")
xr.set_options(use_flox=True)
%timeit da.groupby("time.year").mean("time")
%timeit da.groupby("time.year").mean("time", engine="flox")

xr.set_options(use_flox=False)
%timeit da.groupby("time.year").mean("time")

# as reference
%timeit [da.sel(time=str(year)).mean("time") for year in range(2000, 2030)]
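
Not part of the original report: the same comparison can also be written without mutating the global option, using xr.set_options as a context manager together with the standard-library timeit module (a sketch that assumes the da defined above; numbers will differ from the log below):

import timeit

# time the flox and non-flox code paths with the option scoped to each call
for use_flox in (True, False):
    with xr.set_options(use_flox=use_flox):
        t = timeit.timeit(lambda: da.groupby("time.year").max("time"), number=10)
        print(f"use_flox={use_flox}: {t / 10 * 1e3:.1f} ms per loop")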

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

max:
158 ms ± 4.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
28.1 ms ± 318 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.5 ms ± 52.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

mean:
95.6 ms ± 10.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
34.8 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
15.2 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: f8127fc
python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-69-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: main
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.3
pydap: installed
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
iris: 3.4.1
bottleneck: 1.3.7
dask: 2023.3.2
distributed: 2023.3.2.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: 0.2.2
fsspec: 2023.3.0
cupy: None
pint: 0.20.1
sparse: 0.14.0
flox: 0.6.10
numpy_groupies: 0.9.20
setuptools: 67.6.1
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: None
IPython: 8.12.0
sphinx: None

mathause added the bug, needs triage, topic-groupby, and topic-performance labels and removed the bug label on Apr 6, 2023
dcherian (Contributor) commented Apr 6, 2023

Thanks! Can you add version info, please?

dcherian removed the needs triage label on Apr 6, 2023
mathause (Collaborator, Author) commented Apr 6, 2023

Of course - I added it above.

dcherian added a commit to xarray-contrib/flox that referenced this issue Apr 6, 2023
dcherian added a commit to dcherian/xarray that referenced this issue Apr 7, 2023
dcherian (Contributor) commented Apr 7, 2023

The slowness is basically a bunch of copies happening in align, broadcast, and transpose. It's made a lot worse in this case because we take a CFTimeIndex and cast it back to a CFTimeIndex, repeating all the validity checks.

And then there's xarray-contrib/flox#222
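
Not from the original thread, but one way to check this on the xarray side is to profile the call from the example above with IPython's %prun (a cProfile wrapper), limited here to the 20 hottest entries:

# profile one reduction and look for time spent in align/broadcast/transpose
# and in CFTimeIndex construction (assumes the `da` from the example above)
%prun -l 20 da.groupby("time.year").max("time")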

dcherian added a commit to dcherian/xarray that referenced this issue Apr 7, 2023
dcherian (Contributor) commented Apr 7, 2023

Also, because your groups are sorted, engine='flox' is faster:

gb = da.groupby("time.year")

# using max
xr.set_options(use_flox=True)
%timeit gb.max("time")
%timeit gb.max("time", engine="flox")

xr.set_options(use_flox=False)
%timeit gb.max("time")

177 ms ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
11.9 ms ± 471 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
18.5 ms ± 629 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

dcherian added a commit that referenced this issue Apr 11, 2023
mathause (Collaborator, Author) commented:

Thanks for looking into this!

dcherian added a commit that referenced this issue Apr 12, 2023
* align: Avoid reindexing when join="exact"

xref #7730

* Aligner.copy always copies even with join="exact"

* Move logic to Aligner.align

* Add whats-new
dcherian changed the title from "flox performance regression for in-memory da" to "flox performance regression for cftime resampling" on Apr 12, 2023
dcherian (Contributor) commented:

Thanks for the report! I think we should add your example as a benchmark.

dcherian added a commit to dcherian/xarray that referenced this issue Apr 28, 2023
dcherian added a commit to dcherian/xarray that referenced this issue Apr 28, 2023
dcherian added a commit that referenced this issue May 2, 2023
* [skip-ci] Add cftime groupby, resample benchmarks

xref #7730

* [skip-ci]try setting temp dir

* [skip-ci] try mamba?

* [skip-ci] increase conda verbosity

* [skip-ci] specify channels

* [skip-ci] Update .github/workflows/benchmarks.yml

* [skip-ci] bugfix

* [skip-ci] Parameterize use_flox

* [skip-ci] cleanup

* [skip-ci] fixes

* [skip-ci] fix resample parameterizing
dcherian added a commit to xarray-contrib/flox that referenced this issue May 4, 2023
* Optimize broadcasting

xref pydata/xarray#7730

* reorder

* Fix tests

* Another optimization

* fixes

* fix
mathause (Collaborator, Author) commented Sep 6, 2023

I looked into this a bit today. I think the performance regression comes from using np.maximum.at and, to a lesser extent, from the datetime conversions, and yes, using engine="flox" does speed things up again.
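
A hypothetical micro-benchmark (not from the thread) illustrating the gap between the scatter-style np.maximum.at and a contiguous reduction over already-sorted groups, which is roughly why engine="flox" recovers the speed here:

import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(365 * 30)       # one value per day for 30 "years"
group_idx = np.repeat(np.arange(30), 365)    # year label per value, already sorted

# scatter-style reduction via ufunc.at, the known-slow path before numpy 1.25
out = np.full(30, -np.inf)
%timeit np.maximum.at(out, group_idx, values)

# contiguous-slice reduction over the sorted groups
starts = np.arange(0, values.size, 365)
%timeit np.maximum.reduceat(values, starts)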

dcherian (Contributor) commented Sep 7, 2023

using np.maximum.at

Yes, this is a known slowness. numpy did improve things somewhat in 1.25, but I'm not sure if our typical workloads are affected. See numpy/numpy#23176 for the remaining work.
