-
-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added datashader regridding operation #1773
Conversation
@jlstevens @jbednar I've now finished the code changes to this PR leaving decisions on naming and docstrings. I've included various changes to the The main naming decisions to make are about the |
@jlstevens Any idea how I can make the raster regridding unit tests dependent on a particular version of datashader? They're failing now because it's still pulling datashader 0.5. |
You could try doing a version check in the test class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good; mainly commenting on docstrings.
holoviews/operation/datashader.py
Outdated
|
||
dynamic = param.Boolean(default=True, doc=""" | ||
Enables dynamic processing by default.""") | ||
|
||
expand = param.Boolean(default=True, doc=""" | ||
Whether the x_range and y_range should be allowed to expand | ||
beyond the extent of the data.""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this docstring could be, ahem, expanded, with some of the pros and cons. My guess is:
Whether the x_range and y_range should be allowed to expand
beyond the extent of the data. Setting this value to True is useful
for the case where you want to ensure a certain size of output
grid, e.g. if you are doing masking or other arithmetic on the
grids. A value of False ensures that the grid is only just as
large as it needs to be to contain the data, which will be faster
and use less memory if the resulting aggregate is being overlaid
on a much larger background.""")
If that description is accurate, maybe it should be False by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is accurate. I agree that making it False
might be useful as that's usually what you want for visualization purposes, True
is mostly needed when you want to match grid sizes for computation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, False then; HV should be optimized for viz, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually there's a good reason not to enable it for aggregate
. Computing the actual range is expensive when you have to iterate over all the datapoints and it doesn't actually gain you anything except making sure that pixels aren't wasted on regions where there is no data. I could set it to True
for regrid only, but it might be better to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be True for regrid only; gridded data has very clear bounds and it's probably more confusing if we don't respect those, and definitely slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and definitely slower.
Not necessarily true, it just changes whether the pixels are crammed into the bounds of the image or whether they're wasted on the area beyond the original image bounds. If you're using first
/last
aggregators that's true though.
holoviews/operation/datashader.py
Outdated
By default, the link_inputs parameter is set to True so that | ||
when applying shade, backends that support linked streams | ||
update RangeXY streams on the inputs of the shade operation.""") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe expand to say why one might want to set it to False? E.g. if you reuse objects in different cells of a notebook and find them inappropriately linked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Jim's suggestion here. Doesn't look like it has been updated yet...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we settled for ResamplingOperation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's been updated below; maybe it was supposed to be done here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, meant to add it everywhere.
holoviews/operation/datashader.py
Outdated
the x_range and y_range. If x_sampling or y_sampling are supplied | ||
the operation will ensure that a bin is no smaller than the minimum | ||
sampling distance by reducing the width and height when the zoomed | ||
in beyond the minimum sampling distance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when zoomed
else: | ||
layers = {} | ||
for c in agg.coords[column].data: | ||
cagg = agg.sel(**{column: c}) | ||
layers[c] = self.p.element_type((xs, ys, cagg.data), **params) | ||
eldata = cagg if ds_version > '0.5.0' else (xs, ys, cagg.data) | ||
layers[c] = self.p.element_type(eldata, **params) | ||
return NdOverlay(layers, kdims=[data.get_dimension(column)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully once there is a new ds release we can remove this bit and simply require ds>=0.6.0.
holoviews/operation/datashader.py
Outdated
|
||
upsample = param.Boolean(default=True, doc=""" | ||
Whether to allow upsampling if the source array is smaller than | ||
the requested array.""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be False by default since interpolation is nearest by default, and thus the result will be approximately the same whether or not upsampling is done? With linear interpolation the results will be different, but one could say in the docstring for the interpolation parameter that one might want upsample=False for interpolation methods that smooth the data, unlike nearest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd agree with that, upsampling is rarely needed I think, again only when you're trying to match a higher resolution target gridding (in which case you should probably downsample the higher resolution grid rather than upsampling the lower resolution one).
I'm fine with those names. |
holoviews/operation/datashader.py
Outdated
@@ -133,7 +139,7 @@ class aggregate(resample_operation): | |||
the x_range and y_range. If x_sampling or y_sampling are supplied | |||
the operation will ensure that a bin is no smaller than the minimum | |||
sampling distance by reducing the width and height when the zoomed | |||
in beyond the minimum sampling distance. | |||
beyond the minimum sampling distance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when zoomed in beyond
Ready to merge? Looks good to me. |
I'll be able to review this PR shortly. Don't merge till then! |
holoviews/operation/datashader.py
Outdated
|
||
aggregator = param.ClassSelector(class_=ds.reductions.Reduction, | ||
default=ds.count()) | ||
class resample_operation(Operation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it called resample_operation
and not just resample
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an abstract base class or a usable one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abstract, doesn't have a _process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My issue here is that the name does not sound like an abstract class. I think it can have a simple name and should simply be hidden from users when importing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResampleOperation?
if self.p.x_sampling: | ||
width = int(min([(xspan/self.p.x_sampling), width])) | ||
if self.p.y_sampling: | ||
height = int(min([(yspan/self.p.y_sampling), height])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to be careful here - I don't see why xspan
and self.p.x_sampling
can't both be integers on Python 2 causing potential integer division issues. Same when computing height
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure will add the __future__
import.
# Disable upsampling if requested | ||
(xstart, xend), (ystart, yend) = (x_range, y_range) | ||
xspan, yspan = (xend-xstart), (yend-ystart) | ||
if not self.p.upsample and self.p.target is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there at least be a warning to say upsampling has been disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh why? It's a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the parameter should be respected as set, with no warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case the docstring needs improvement:
"Whether to allow upsampling if the source array is smaller than the requested array."
What is the behavior if upsample=False
? Padding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, Philipp's now explained it to me. The behavior makes sense though I think there could be a bit more clarification in the docstrings.
Looks good other than a few minor comments I made. Happy to see this PR merged once those things are addressed. |
holoviews/operation/datashader.py
Outdated
just as large as it needs to be to contain the data, which will | ||
be faster and use less memory if the resulting aggregate is | ||
being overlaid on a much larger background.""") | ||
|
||
height = param.Integer(default=400, doc=""" | ||
The height of the aggregated image in pixels.""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this height
parameter used by regrid
? I'm not sure it is clear to talk about an 'aggregated' image in that case. Maybe just talk about the 'output' image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, although in aggregation is still the correct term at least when downsampling. Will fix it anyway.
Looks good and the unit tests have passed. Merging. |
So GitHub makes it look like this is in |
Today :-) |
Well isn't it my lucky day. 😄 |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
As suggested and outlined in #1552 this PR adds a
regrid
operation that allows dynamically downsampling and upsampling HoloViews Image, RGB and HSV types to a specified x_range, y_range, width and height. It closely mirrors theaggregate
operation but operates on Images instead. I introduced a common baseclass for the two operations and have started using it in various projects. It will require some further cleanup, decisions on naming and documentation but is now fully functional.