-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flox seems much slower in some cases? #281
Labels
bug
Something isn't working
Comments
autoguessing is setting Pass I'm not sure why the augtoguessed |
I was wrong, the numbagg bit was quite buggy but hopefully #282 fixes all of it. It would be nice to add a logger to log the autoguessing |
dcherian
added a commit
that referenced
this issue
Nov 8, 2023
This majorly improves the dim=... case for engine="flox" at least. xref #281 I'm not sure if it is a regression for engine="numpy" We trade off a single bad reshape for array against argsorting both array and group_idx for a ~10-20x speedup ``` ds = xr.tutorial.load_dataset('air_temperature') ds.groupby('lon').count(..., engine="flox") ```
dcherian
added a commit
that referenced
this issue
Nov 8, 2023
* Set order='F' when raveling group_idx after broadcast This majorly improves the dim=... case for engine="flox" at least. xref #281 I'm not sure if it is a regression for engine="numpy" We trade off a single bad reshape for array against argsorting both array and group_idx for a ~10-20x speedup ``` ds = xr.tutorial.load_dataset('air_temperature') ds.groupby('lon').count(..., engine="flox") ``` * This is an improvement only for engine=flox * Update tests * Fix benchmark * type ignore
dcherian
added a commit
that referenced
this issue
Nov 9, 2023
* Fix numbagg version check Closes #281 * Enable numbagg for count * Better numbagg special-casing * Fixes. * A bunch of typing * Handle fill_value in core numbagg reduction. * Update flox/aggregate_numbagg.py * cleanup * [WIP] test hacky fix * [wip] * Cleanup functions * Fix casting * Fix fill_value masking * optimize * Update flox/aggregations.py * Small cleanup * Fix. * Fix typing * Another bugfix * Optimize seen_groups * Be careful about raveling * Fix benchmark skipping for numbagg * add test
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Not sure if this belongs in xarray or here. And it's probably caused by me doing something wrong (or have messed up something in numbagg!).
It's motivated by a larger example that runs on dask and takes 128s vs. 8s.
Here's a smaller MCVE:
INSTALLED VERSIONS
commit: 37325848799e75d7b1a73a0e45e2cf4bd376ccaa
python: 3.9.18 (main, Aug 24 2023, 21:19:58)
[Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2023.9.1.dev10+gff14c6ba.d20230930
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.6.0
cartopy: None
seaborn: 0.12.2
numbagg: 0.6.0
fsspec: 2022.8.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.6.1
IPython: 8.14.0
sphinx: 5.2.1
The text was updated successfully, but these errors were encountered: