Grouping is significantly slower when adding auxiliary coordinates to the time dimension #9426
Closed
EDIT: welp, now I can. |
As an aside, why not use |
dcherian added a commit to dcherian/xarray that referenced this issue on Sep 4, 2024
Turns out deep-copying cftime objects is quite slow, and we don't need the deep copy anyway.
dcherian added a commit that referenced this issue on Sep 5, 2024
hollymandel pushed a commit to hollymandel/xarray that referenced this issue on Sep 23, 2024
What happened?
In the temporal APIs of the xCDAT package, we generate "labeled" time coordinates which allow grouping across multiple time components (e.g., "year", "month", etc.) since Xarray does not currently support this feature. These labeled time coordinates are added to the existing time dimension, then we use Xarray's .groupby() method to group on these "labeled" time coordinates.

However, I noticed .groupby() is much slower after adding these auxiliary coordinates to the time dimension. This performance slowdown also affects grouping on the original time dimension coordinates, which I don't think should happen.

In the MVCE, I generate a dummy dataset with a shape of (10000, 180, 360).
time.month without auxiliary coords: 0.0119 seconds
time.month with auxiliary coords: 0.4892 seconds

Note, as the dataset size increases, the slowdown is much more prominent.
What did you expect to happen?
Xarray's groupby() should not slow down after adding auxiliary time coordinates to the time dimension (I think?).
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
Related to PR #689 in the xCDAT repo.
Our workaround was to replace the dimension coordinates with the auxiliary coordinates for grouping purposes, which sped up grouping significantly.
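One way such a workaround could look is sketched below. This is an assumption about its general shape, not xCDAT's actual implementation: the integer `labels` and the tiny dataset are hypothetical, and a pandas-backed time axis is used to keep the example small.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: 60 daily steps covering January and February 2000.
time = pd.date_range("2000-01-01", periods=60, freq="D")
ds = xr.Dataset(
    {"data": (("time", "lat"), np.ones((60, 3)))},
    coords={"time": time},
)

# Hypothetical label: year and month packed into one integer, e.g. 200001.
labels = (ds.time.dt.year * 100 + ds.time.dt.month).values

# Swap the labels in as the "time" dimension coordinate
# instead of attaching them alongside the original coordinate.
ds_labeled = ds.assign_coords(time=("time", labels))

# Group directly on the replaced dimension coordinate.
monthly = ds_labeled.groupby("time").mean()
print(monthly.time.values)
```

The grouped result carries the labels as its time coordinate, so the original timestamps would need to be restored separately if they are still required.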
Environment
xarray: 2024.7.0
pandas: 2.2.2
numpy: 2.1.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 73.0.1
pip: 24.2
conda: None
pytest: None
mypy: None
IPython: 8.27.0
sphinx: None