Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOCS: Add example how to use xhistogram to split a dataset into regions with a mask? #76

Open
jbusecke opened this issue May 27, 2022 · 2 comments

Comments

@jbusecke
Copy link
Contributor

jbusecke commented May 27, 2022

This is based on a brilliant suggestion by @dcherian, which I have used in multiple projects.

I think it would be very nice to document this little 'trick' to efficiently compute a histogram in 'regions' of an array which are encoded in a numbered mask (this is often how e.g. ocean basins are masked in climate models). Instead of masking out one region at a time with something like this:

for num_region in range(4):
    da_masked = da.where(da.mask==num_region)
    histogram(da_masked, ...)

One can simply add the mask as a second input variable and with appropriate bins, the results will be segmented into the appropriate regions (saving n-1 computations steps; where n is the number of regions).

I wrote up a quick example:

import numpy as np
import xarray as xr
from xhistogram.xarray import histogram

# make a quadrant mask
mask = np.concatenate([
    np.concatenate([np.full((2,2), 1), np.full((2,2), 2)]),
    np.concatenate([np.full((2,2), 3), np.full((2,2), 4)]),
],axis=1)
# create data that has similar values but with some added noise
data = (np.random.rand(*mask.shape)-0.5)*0.5+mask

da = xr.DataArray(data, dims=['x','y'], name='data', coords={'mask':xr.DataArray(mask, dims=['x','y'], name='mask')})
da

image

data_bins = np.linspace(0,5, 50)
mask_bins = np.arange(0.5, 5.5)

hist = histogram(da, da.mask, bins=[data_bins, mask_bins])
hist.plot(col='mask_bin')

image

You can see that this successfully segemented the regions and computed the histogram for each (the data is slightly spread around the actual mask number).

I think this really is a brilliant application case for so many science applications, but it is not obvious untils one tries it (at least it wasnt for me until @dcherian suggested it). So I think it would make a great example for the docs?

Id be happy to add this if ppl think it provides value. Maybe @shanicetbailey is interested in collaborating on this, since this might be very helpful in her current project?

@dcherian
Copy link

image

@jbusecke
Copy link
Contributor Author

THIS will obviously be used!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants