Add support for categorical where reductions #1237

Merged
merged 3 commits into from
Jun 27, 2023


ianthomas23

Fixes #1210.

This adds support for categorical where reductions on CPU and GPU, with and without Dask.

An example is

canvas = ds.Canvas(plot_width=nx, plot_height=ny)
agg = canvas.points(... agg=ds.by("cat", ds.where(ds.max_n("mass", n=3))))

This returns a 4D xarray.DataArray of shape (ny, nx, ncat, n) containing for each pixel and category the indexes of the 3 rows in the supplied DataFrame that have the maximum values of the "mass" column.
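The semantics of that result can be sketched in plain Python, without datashader. This is only an illustrative reference, not the library's implementation: the helper name `by_where_max_n` is hypothetical, pixel coordinates are assumed to be precomputed, unfilled slots are assumed to hold -1 (matching the `isminus1` check used for row indexes in the diff), and the descending order and tie handling are assumptions of the sketch.

```python
import math
from collections import defaultdict


def by_where_max_n(rows, ny, nx, categories, n):
    """Hypothetical reference for by("cat", where(max_n("mass", n=n))).

    rows: list of (iy, ix, cat, mass) tuples, one per DataFrame row.
    Returns nested lists of shape (ny, nx, ncat, n) of row indexes,
    with -1 in every slot that has no matching row.
    """
    cat_index = {c: k for k, c in enumerate(categories)}
    out = [[[[-1] * n for _ in categories] for _ in range(nx)] for _ in range(ny)]

    # Group (mass, row_index) pairs by pixel and category.
    found = defaultdict(list)
    for row_index, (iy, ix, cat, mass) in enumerate(rows):
        if math.isnan(mass):
            continue  # assumption: NaN values never win a reduction
        found[(iy, ix, cat_index[cat])].append((mass, row_index))

    # Keep the n largest masses per group, largest first.
    for (iy, ix, k), pairs in found.items():
        pairs.sort(key=lambda pair: -pair[0])  # stable: ties keep row order
        for j, (_, row_index) in enumerate(pairs[:n]):
            out[iy][ix][k][j] = row_index
    return out


rows = [
    (0, 0, "a", 1.0),
    (0, 0, "a", 5.0),
    (0, 0, "b", 2.0),
    (0, 0, "a", 3.0),
    (0, 0, "a", 4.0),
    (1, 1, "b", 7.0),
]
agg = by_where_max_n(rows, ny=2, nx=2, categories=["a", "b"], n=3)
print(agg[0][0][0])  # row indexes for pixel (0, 0), category "a"
```

Indexing the nested result as `agg[iy][ix][icat][i]` mirrors the `(ny, nx, ncat, n)` axis order described above.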

To return the values from another column instead of row indexes, pass that column as the second argument to where:

agg = canvas.points(... agg=ds.by("cat", ds.where(ds.max_n("mass", n=3), "other")))
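Conceptually, the second form replaces each selected row index with the value of the "other" column at that row. A minimal plain-Python sketch of that lookup follows; the helper `lookup_other` is hypothetical, and filling empty slots with NaN is an assumption (suggested by the `isnull` check in the diff), not a statement of how datashader implements it.

```python
import math


def lookup_other(index_agg, other):
    """Map a nested (ny, nx, ncat, n) list of row indexes to column values.

    Slots holding -1 (no row selected) become NaN.
    """
    def convert(node):
        if isinstance(node, list):
            return [convert(child) for child in node]
        return other[node] if node >= 0 else math.nan

    return convert(index_agg)


# One pixel, two categories, n=3; -1 marks unfilled slots.
index_agg = [[[[1, 4, 3], [2, -1, -1]]]]
other = [10.0, 11.0, 12.0, 13.0, 14.0]  # the "other" column, by row index
values = lookup_other(index_agg, other)
print(values[0][0][0])
```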

We can replace max_n in this example with max, min, first, last, min_n, first_n, or last_n.

Support is also added for

ds.by("cat", ds.first("value"))

and for the last, first_n and last_n equivalents, as these are implemented using where under certain circumstances (GPU and/or Dask).

@ianthomas23 ianthomas23 added this to the v0.15.1 milestone Jun 21, 2023
@@ -1793,32 +1818,34 @@ def _build_combine(self, dshape, antialias, cuda, partitioned):
invalid = isminus1 if self.selector.uses_row_index(cuda, partitioned) else isnull

@ngjit
def combine_cpu_2d(aggs, selector_aggs):
ny, nx = aggs[0].shape
def combine_cpu(aggs, selector_aggs):

There is a lot of similar but not quite identical code here that I am planning to refactor in a separate PR.

@codecov

codecov bot commented Jun 21, 2023

Codecov Report

Merging #1237 (38c83f6) into main (5a89820) will decrease coverage by 0.15%.
The diff coverage is 71.59%.

@@            Coverage Diff             @@
##             main    #1237      +/-   ##
==========================================
- Coverage   83.52%   83.37%   -0.15%     
==========================================
  Files          35       35              
  Lines        8778     8832      +54     
==========================================
+ Hits         7332     7364      +32     
- Misses       1446     1468      +22     
| Impacted Files | Coverage Δ |
| --- | --- |
| datashader/reductions.py | 77.87% <69.51%> (-0.76%) ⬇️ |
| datashader/compiler.py | 88.65% <100.00%> (+0.05%) ⬆️ |



@jbednar jbednar left a comment


Looks great, thanks! Is this the end of it? I.e., are there any combinations of by/where/reductions with cpu/gpu/dask that are still unsupported? Or is that entire cross product now covered somewhere?

@ianthomas23

After the rebase, tests are failing because of a Bokeh/Panel incompatibility when running the examples. That has nothing to do with this PR, so I am merging this and will deal with the examples problem separately.

@ianthomas23 ianthomas23 merged commit 9572c88 into holoviz:main Jun 27, 2023
@ianthomas23 ianthomas23 deleted the cat_where branch June 27, 2023 10:55
Successfully merging this pull request may close these issues.

Categorical support for where and <whatever>_n reductions