Enable flox in GroupBy and resample #5734

Closed
181 commits
c486df7
Move `_reduce_method` classmethod to `groupby.py` module
andersy005 Aug 12, 2021
ef91e6e
Add _numpy_groupies module
andersy005 Aug 13, 2021
511dd44
Add more aggregations
andersy005 Aug 13, 2021
3ee6200
Remove comments
andersy005 Aug 13, 2021
f088392
Fix position keyword arguments
andersy005 Aug 13, 2021
35944e4
Remove `_numpy_groupies.py` module
andersy005 Aug 13, 2021
9be0228
Merge branch 'pydata:main' into groupby-aggs-using-numpy-groupies
andersy005 Aug 24, 2021
e6bcce9
some fixes
dcherian Aug 24, 2021
4702c9d
Merge branch 'groupby-aggs-using-numpy-groupies' of github.com:anders…
dcherian Aug 24, 2021
489b2ff
Merge branch 'main' into groupby-aggs-using-numpy-groupies
andersy005 Sep 26, 2021
cdf7612
Fix resample test
dcherian Oct 3, 2021
af4cc5d
Fix reduce methods
dcherian Oct 3, 2021
58c1c6b
Add _dask_groupby_kwargs
dcherian Oct 3, 2021
69fd563
Avoid forwarding DummyGroup objects
dcherian Oct 3, 2021
b1e3ab2
Raise error when reducing along indexed dimensions with squeeze=True
dcherian Oct 3, 2021
462e61b
Don't pass numeric_only to DataArray.reduce
dcherian Oct 4, 2021
1d9a360
Add CI for now
dcherian Oct 4, 2021
f4748ee
typo
dcherian Oct 4, 2021
b97ffcb
Fix windows env
dcherian Oct 4, 2021
9b44db9
Fix keep_attrs test
dcherian Oct 4, 2021
262a3f5
Update ci/requirements/environment-windows.yml
dcherian Oct 4, 2021
e3b3a00
Fix resampling
dcherian Oct 5, 2021
4b25db5
Merge branch 'groupby-aggs-using-numpy-groupies' of github.com:anders…
dcherian Oct 5, 2021
0f2c59f
Merge branch 'main' of github.com:pydata/xarray into groupby-aggs-usi…
andersy005 Oct 26, 2021
77f0e0e
Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-…
dcherian Nov 5, 2021
faee02c
fix env stuff + remove env var
dcherian Nov 5, 2021
3608e9f
get working again
dcherian Nov 5, 2021
ad25f78
Add to asv env
dcherian Nov 6, 2021
932b9a5
Separate out median
dcherian Nov 6, 2021
ac85e72
make dask_groupby actually optional
dcherian Nov 6, 2021
d238459
any,all
dcherian Nov 6, 2021
a2168df
typo again
dcherian Nov 6, 2021
6b9a81a
Better generator for reductions.
dcherian Nov 8, 2021
569c67f
Add ddof for var, std
dcherian Nov 8, 2021
816e794
Generate DataArray, Dataset reductions too.
dcherian Nov 8, 2021
a04ed82
Small changes
dcherian Nov 8, 2021
7f39cc0
Minor docstring improvements.
dcherian Nov 8, 2021
99bfe12
Fixes #5898
dcherian Nov 8, 2021
08911b9
Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-…
dcherian Nov 8, 2021
9bb2c32
Reorder docstring to match numpy
dcherian Nov 8, 2021
dea8fd9
REfactor
dcherian Nov 8, 2021
f06e6a7
Revert "Separate out median"
dcherian Nov 8, 2021
0661c1b
Refactored generator
dcherian Nov 8, 2021
0c35c0c
Reimplemented
dcherian Nov 8, 2021
3e08964
Add benchmarks
dcherian Nov 8, 2021
583187a
Fix benchmark to not groupby chunked variables.
dcherian Nov 9, 2021
4ef53db
Start supporting ndim groups
dcherian Nov 9, 2021
6afb3bf
WIP refactor init
dcherian Nov 9, 2021
35af40a
Revert "WIP refactor init"
dcherian Nov 9, 2021
c9a82b3
Revert "Start supporting ndim groups"
dcherian Nov 9, 2021
0ac5498
Avoid stacking by default
dcherian Nov 9, 2021
b9bc1dd
Update reductions
dcherian Nov 10, 2021
bece14e
Fix median and add test.
dcherian Nov 10, 2021
41f0aa5
fix test
dcherian Nov 10, 2021
0559ee1
Fix var, std doctests
dcherian Nov 10, 2021
31e1fd2
Force test failure to check CI env
dcherian Nov 10, 2021
47b593c
Use conda-forge numpy_groupies in CI
dcherian Nov 10, 2021
c7e9d96
Minor improvement
dcherian Nov 10, 2021
77d2665
Revert "Force test failure to check CI env"
dcherian Nov 10, 2021
11c3d33
Fixed doctests in dask_groupby
dcherian Nov 10, 2021
be53f13
See if its an import error
dcherian Nov 10, 2021
35908b5
Revert "See if its an import error"
dcherian Nov 10, 2021
9c2cbb8
Ppass through objects with only numpy or dask arrays
dcherian Nov 11, 2021
e9af57c
Try fixing mypy
dcherian Nov 11, 2021
415eb29
Fix bug when binning by nD variable.
dcherian Nov 12, 2021
edbd376
Fix binning and weird issues with precision and pd.cut
dcherian Nov 14, 2021
c189eea
Fix upsampling with resample
dcherian Nov 14, 2021
43ade8c
"blockwise" need not be the best strategy for resample..
dcherian Nov 15, 2021
553735e
Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-…
dcherian Nov 15, 2021
03b7b31
one more bugfix
dcherian Nov 15, 2021
e038cc7
Fix dimension order when binning a dimension coordinate
dcherian Nov 15, 2021
1f370f6
silence warning
dcherian Nov 15, 2021
7375dd4
fix test.
dcherian Nov 15, 2021
860f7be
add extra test
dcherian Nov 15, 2021
ced9034
Update upstream-dev env
dcherian Nov 15, 2021
b269439
[test-upstream] Revert setting npg option in benchmarks
dcherian Nov 16, 2021
cc8abfe
[test-upstream] Rename to flox
dcherian Nov 16, 2021
033f5b5
Add to print_versions
dcherian Nov 18, 2021
bd24db4
Add to all-but-dask
dcherian Nov 18, 2021
098467d
Force failure to make sure CI is working.
dcherian Nov 18, 2021
a282ad4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 18, 2021
8f23310
Revert "Force failure to make sure CI is working."
dcherian Nov 19, 2021
5dcb5bf
Attempt fixing typing errors
Illviljan Nov 20, 2021
411d75d
Now get normal code running as well
Illviljan Nov 20, 2021
6a9a124
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2021
dd28a57
updates
dcherian Nov 20, 2021
f03b675
Merge branch 'main' into pr/5950
Illviljan Nov 20, 2021
dfbe103
Merge branch 'generate-reductions-class' of https://github.com/dcheri…
Illviljan Nov 20, 2021
2bbddaf
make reduce args consistent
Illviljan Nov 20, 2021
3d854e5
more reduce edits
Illviljan Nov 20, 2021
be33560
one more reduce
Illviljan Nov 20, 2021
0f94bec
another reduce
Illviljan Nov 20, 2021
19d82cd
more reduce
Illviljan Nov 20, 2021
cd8a898
add doctests
dcherian Nov 20, 2021
4f378a3
Bugfix DataArray resampling.
dcherian Nov 22, 2021
6916fa7
Update xarray/util/generate_reductions.py
dcherian Nov 22, 2021
af03ca4
Small improvement to resampling
dcherian Nov 26, 2021
cfd2c07
minimize conflicts
dcherian Nov 26, 2021
3c51b1a
Squash merge #5950
dcherian Nov 26, 2021
2a1b12f
Update xarray/util/generate_reductions.py
dcherian Nov 26, 2021
638d98a
Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-…
dcherian Nov 26, 2021
45feeab
Annotate some reduction tests.
dcherian Nov 26, 2021
b406789
Merge remote-tracking branch 'upstream/main' into generate-reductions…
dcherian Nov 26, 2021
1875fd2
Merge branch 'main' into groupby-aggs-using-numpy-groupies
andersy005 Dec 16, 2021
66151f6
Merge branch 'main' into pr/5950
Illviljan Dec 21, 2021
3dc94ae
force keyword args after dim
Illviljan Dec 21, 2021
bc55db3
Write to file using open() instead.
Illviljan Dec 21, 2021
b78df18
Update _reductions.py
Illviljan Dec 21, 2021
16372a5
Merge branch 'generate-reductions-class' of https://github.com/dcheri…
Illviljan Dec 21, 2021
74064b9
manual tweaks to make ci happy
Illviljan Dec 21, 2021
8336c53
Merge branch 'main' into pr/5950
Illviljan Dec 27, 2021
ad6b5bc
Merge branch 'main' into groupby-aggs-using-numpy-groupies
dcherian Dec 28, 2021
e1ba8a2
use_numpy_groupies → use_flox
dcherian Dec 28, 2021
bdb999f
fix tests
dcherian Dec 29, 2021
4fb17b1
Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-…
dcherian Dec 29, 2021
70266e1
fix tests
dcherian Dec 29, 2021
41e43fe
fix tests
dcherian Dec 30, 2021
2c2e7dc
Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-…
dcherian Jan 12, 2022
c157fca
Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-…
dcherian Jan 13, 2022
3f3a197
Merge branch 'main' into groupby-aggs-using-numpy-groupies
andersy005 Jan 13, 2022
cd51a15
Merge branch 'main' into generate-reductions-class
dcherian Feb 16, 2022
7b34077
Fix path
dcherian Feb 16, 2022
9799d87
Apply suggestions from code review
dcherian Mar 8, 2022
1fcd080
Fixes
dcherian Mar 8, 2022
ebe9985
Merge branch 'main' into generate-reductions-class
dcherian Mar 9, 2022
d5f627c
update _reductions
dcherian Mar 9, 2022
e348c76
Merge branch 'main' into groupby-aggs-using-numpy-groupies
dcherian Mar 9, 2022
434db03
polish
dcherian Mar 10, 2022
62474a8
Merge branch 'generate-reductions-class' into groupby-aggs-using-nump…
dcherian Mar 10, 2022
94bcb32
Merge branch 'main' into groupby-aggs-using-numpy-groupies
dcherian Mar 13, 2022
a1769ba
polish
dcherian Mar 13, 2022
26d85d5
loooser test
dcherian Mar 13, 2022
705b3f0
Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-…
dcherian Mar 13, 2022
e412583
Fix.
dcherian Mar 13, 2022
2694dbe
Test flox kwargs
dcherian Mar 13, 2022
1a91802
Merge branch 'main' into groupby-aggs-using-numpy-groupies
dcherian Mar 29, 2022
87f94ba
Merge branch 'main' into groupby-aggs-using-numpy-groupies
andersy005 Apr 7, 2022
fd6aa17
fix
dcherian Mar 30, 2022
c176f8d
Test cleanup
dcherian Mar 30, 2022
9d4ee11
[skip-ci] Apply suggestions from code review
dcherian Apr 10, 2022
4dd9e66
Update envs
dcherian Apr 10, 2022
812ce33
[skip-ci] Apply suggestions from code review
dcherian Apr 10, 2022
628406c
Merge branch 'groupby-aggs-using-numpy-groupies' of github.com:anders…
dcherian Apr 10, 2022
158314a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 10, 2022
3580ae3
fix
dcherian Apr 10, 2022
da31c4f
Merge branch 'groupby-aggs-using-numpy-groupies' of github.com:anders…
dcherian Apr 10, 2022
d613779
fix
dcherian Apr 10, 2022
d2510c0
Merge branch 'main' into groupby-aggs-using-numpy-groupies
dcherian Apr 13, 2022
d0a412a
Update ci/requirements/environment-windows.yml
dcherian Apr 13, 2022
5627277
Merge branch 'main' into groupby-aggs-using-numpy-groupies
dcherian Apr 24, 2022
3a7052e
Support numeric_only
dcherian Apr 24, 2022
fcef26f
Merge branch 'groupby-aggs-using-numpy-groupies' of github.com:anders…
dcherian Apr 24, 2022
eae37e2
Properly support numeric_only
dcherian Apr 26, 2022
5583e34
Set default to "split-reduce" to reduce surprises
dcherian Apr 26, 2022
5337bd4
Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-…
dcherian May 2, 2022
2d1de0f
Add flox to min_all_deps
dcherian May 2, 2022
7d9b470
Update ci/requirements/min-all-deps.yml
dcherian May 3, 2022
7dab730
[skip-ci] add whats-new
dcherian May 3, 2022
6902de3
Better defaults for resample
dcherian May 3, 2022
b2b3001
[skip-ci] Fix whats-new.
dcherian May 3, 2022
36c206e
Clean up resampling.
dcherian May 4, 2022
7869ad5
Merge branch 'main' into groupby-aggs-using-numpy-groupies
andersy005 May 5, 2022
7a58590
Test adding back dummy methods.
Illviljan May 5, 2022
687beac
Update _reductions.py
Illviljan May 5, 2022
4705b6c
Update _reductions.py
Illviljan May 5, 2022
ac49bfa
Update resample.py
Illviljan May 5, 2022
444feee
Try subclassing to ResampleBase-classes
Illviljan May 5, 2022
1ac3281
Merge branch 'main' into pr/5734
Illviljan May 5, 2022
4f7ef6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 5, 2022
3282809
Copy/paste instead of a for loop
Illviljan May 5, 2022
2ee1de4
Merge branch 'groupby-aggs-using-numpy-groupies' of https://github.co…
Illviljan May 5, 2022
4a384fd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 5, 2022
c38ef78
Update resample.py
Illviljan May 5, 2022
33f70da
Merge branch 'groupby-aggs-using-numpy-groupies' of https://github.co…
Illviljan May 5, 2022
d711d58
Merge branch 'main' into pr/5734
Illviljan May 9, 2022
67cda8a
Ignore typing when flox is not available
Illviljan May 9, 2022
fd20ba2
Update whats-new
dcherian May 10, 2022
ad33d85
Deduplicate
dcherian May 10, 2022
3ab03ee
Merge branch 'main' into groupby-aggs-using-numpy-groupies
andersy005 May 11, 2022
2e3dca8
Merge branch 'main' into groupby-aggs-using-numpy-groupies
andersy005 May 12, 2022
6573e4b
Merge branch 'main' into groupby-aggs-using-numpy-groupies
andersy005 May 14, 2022
2 changes: 2 additions & 0 deletions asv_bench/asv.conf.json
@@ -65,6 +65,8 @@
         "bottleneck": [""],
         "dask": [""],
         "distributed": [""],
+        "flox": [""],
+        "numpy_groupies": [""],
         "sparse": [""]
     },

10 changes: 6 additions & 4 deletions asv_bench/benchmarks/groupby.py
@@ -13,6 +13,7 @@ def setup(self, *args, **kwargs):
             {
                 "a": xr.DataArray(np.r_[np.repeat(1, self.n), np.repeat(2, self.n)]),
                 "b": xr.DataArray(np.arange(2 * self.n)),
+                "c": xr.DataArray(np.arange(2 * self.n)),
             }
         )
         self.ds2d = self.ds1d.expand_dims(z=10)
@@ -50,10 +51,11 @@ class GroupByDask(GroupBy):
     def setup(self, *args, **kwargs):
         requires_dask()
         super().setup(**kwargs)
-        self.ds1d = self.ds1d.sel(dim_0=slice(None, None, 2)).chunk({"dim_0": 50})
-        self.ds2d = self.ds2d.sel(dim_0=slice(None, None, 2)).chunk(
-            {"dim_0": 50, "z": 5}
-        )
+
+        self.ds1d = self.ds1d.sel(dim_0=slice(None, None, 2))
+        self.ds1d["c"] = self.ds1d["c"].chunk({"dim_0": 50})
+        self.ds2d = self.ds2d.sel(dim_0=slice(None, None, 2))
+        self.ds2d["c"] = self.ds2d["c"].chunk({"dim_0": 50, "z": 5})
         self.ds1d_mean = self.ds1d.groupby("b").mean()
         self.ds2d_mean = self.ds2d.groupby("b").mean()

2 changes: 2 additions & 0 deletions ci/install-upstream-wheels.sh
@@ -15,6 +15,7 @@ conda uninstall -y --force \
     pint \
     bottleneck \
     sparse \
+    flox \
     h5netcdf \
     xarray
 # to limit the runtime of Upstream CI
@@ -47,4 +48,5 @@ python -m pip install \
     git+https://github.com/pydata/sparse \
     git+https://github.com/intake/filesystem_spec \
     git+https://github.com/SciTools/nc-time-axis \
+    git+https://github.com/dcherian/flox \
     git+https://github.com/h5netcdf/h5netcdf
1 change: 1 addition & 0 deletions ci/requirements/all-but-dask.yml
@@ -13,6 +13,7 @@ dependencies:
   - cfgrib
   - cftime
   - coveralls
+  - flox
   - h5netcdf
   - h5py
   - hdf5
1 change: 1 addition & 0 deletions ci/requirements/environment-windows.yml
@@ -10,6 +10,7 @@ dependencies:
   - cftime
   - dask-core
   - distributed
+  - flox
   - fsspec!=2021.7.0
   - h5netcdf
   - h5py
1 change: 1 addition & 0 deletions ci/requirements/environment.yml
@@ -12,6 +12,7 @@ dependencies:
   - cftime
   - dask-core
   - distributed
+  - flox
   - fsspec!=2021.7.0
   - h5netcdf
   - h5py
1 change: 1 addition & 0 deletions ci/requirements/min-all-deps.yml
@@ -17,6 +17,7 @@ dependencies:
   - coveralls
   - dask-core=2021.04
   - distributed=2021.04
+  - flox=0.5
   - h5netcdf=0.11
   - h5py=3.1
   # hdf5 1.12 conflicts with h5py=3.1
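The minimum-dependency environment pins `flox=0.5`. As a rough sketch of how a script might check an installed package against such a floor using only the standard library, the helpers below compare leading release numbers. The function names (`parse_release`, `installed_version`, `meets_minimum`) are hypothetical and not part of xarray or flox, and the parsing is deliberately simplistic compared with a full version parser.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(raw: str) -> Tuple[int, ...]:
    """Extract the leading numeric release components, e.g. '0.5rc1' -> (0, 5)."""
    parts = []
    for piece in raw.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at suffixes such as "rc1" or "dev0"
    return tuple(parts)


def installed_version(package: str) -> Optional[Tuple[int, ...]]:
    """Return the parsed version of an installed distribution, or None if absent."""
    try:
        return parse_release(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(package: str, minimum: Tuple[int, ...]) -> bool:
    """True only if the package is installed and its release is >= minimum."""
    found = installed_version(package)
    return found is not None and found >= minimum


if __name__ == "__main__":
    # (0, 5) mirrors the flox=0.5 pin in ci/requirements/min-all-deps.yml.
    print("flox >= 0.5 available:", meets_minimum("flox", (0, 5)))
```

The tuple comparison treats `(0, 5, 1)` as newer than `(0, 5)`, which is sufficient for a coarse floor check but ignores pre-release ordering.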
5 changes: 5 additions & 0 deletions doc/whats-new.rst
@@ -141,6 +141,11 @@ Performance
 - GroupBy binary operations are now vectorized.
   Previously this involved looping over all groups. (:issue:`5804`,:pull:`6160`)
   By `Deepak Cherian <https://github.com/dcherian>`_.
+- Substantially improved GroupBy operations using `flox <https://flox.readthedocs.io/en/latest/>`_.
+  This is auto-enabled when ``flox`` is installed. Use ``xr.set_options(use_flox=False)`` to use
+  the old algorithm. (:issue:`4473`, :issue:`4498`, :issue:`659`, :issue:`2237`, :pull:`271`).
+  By `Deepak Cherian <https://github.com/dcherian>`_,`Anderson Banihirwe <https://github.com/andersy005>`_,
+  `Jimmy Westling <https://github.com/illviljan>`_.
 
 Internal Changes
 ~~~~~~~~~~~~~~~~
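The whats-new entry says the flox path is used automatically when `flox` is installed and can be turned off with `xr.set_options(use_flox=False)`. A minimal sketch of that toggle follows, assuming an xarray release that includes this PR and that NumPy is available. The toy dataset and the variable names `with_default` and `with_fallback` are illustrative only; the dataset loosely mirrors the benchmark's "b" (group labels) and "c" (values) variables.

```python
import numpy as np
import xarray as xr

# Toy data: six values in "c", grouped by the labels stored in coordinate "b".
ds = xr.Dataset(
    {"c": ("x", np.arange(6.0))},
    coords={"b": ("x", [0, 0, 1, 1, 2, 2])},
)

# Default behaviour: the flox code path is used only if flox is importable.
with_default = ds.groupby("b").mean()

# Explicitly request the old algorithm for the duration of the block.
with xr.set_options(use_flox=False):
    with_fallback = ds.groupby("b").mean()

# Both code paths are expected to agree on the result.
xr.testing.assert_allclose(with_default, with_fallback)
print(with_default["c"].values)
```

If `flox` is not installed, both calls take the old code path, so the comparison is trivially true; the option only changes behaviour when the package is present.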
1 change: 1 addition & 0 deletions setup.cfg
@@ -98,6 +98,7 @@ accel =
     scipy
     bottleneck
     numbagg
+    flox
 
 parallel =
     dask[complete]