-
-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle log colormapping change in bokeh 2.0 #4376
Comments
My opinion is that we should warn if the log color mapper encounters a zero value and tell the user to set a clim lower bound. Another option is that if we encounter an integer array we can automatically set the lower bound of the log range to 1 but warn if a float is used since it's not obvious how to choose a default lower bound.
We absolutely should do this as it's needed for masking integer arrays in linked selections. Not sure how quickly we can do this however.
I'm -1 on both 2 and 4 tbh. There was a reason that log1p was disabled in bokeh because it was effectively wrong in many cases. I'm also not too happy casting ints to floats really. |
This is a pretty good suggestion, especially if the warning could point to the docs to help avoid user confusion. We would still have to update all our notebooks though...
I think this suggestion is even better as it would automatically handle the common case automatically without having to update existing notebooks. We would just have to document what we are doing in the integer case and issue a warning for the float case. |
Okay, I'll work up a PR to do the int lower bound handling and float warning. |
I think what Philipp is proposing is a better version of option 2, i.e. to set up HoloViews options to avoid both the current urgent broken-colormap problem but also the original broken colorbar-tick issue and misleading colorbar issue, all at once. So it sounds ideal, if it works. |
just one thought: I thought the integer NaNs exist now? would that help? |
apparently only experimentally in pandas, still not in numpy: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html |
Pandas uses a masked array approach, which we could optionally return from Datashader as well, but it would take a good bit of plumbing to connect things up between Datashader, HoloViews, and Bokeh. |
Datashader outputs an xarray, not pandas DataFrames so that won't work anyway. |
I didn't mean that Datashader would return a DataFrame, but that it would (optionally) return a masked array. |
But also note that even if we did return a masked array from Datashader for integer aggregates, I'm not at all certain that I'd want the output of a count aggregation to use masking. A count of zero isn't missing data; it's simply a count of zero. Yet I still want to be able to use a log colormapper with counts, such as for the Datashader census example, which works perfectly well using log1p for the colormapping in Datashader's shade function, but has always been messed up in bokeh (as colormapping used log1p but colorbar ticks used log). So I think the clim approach proposed by Philipp is the right answer for counts independently of whether we also support masked integer arrays, which would be used for non-count aggregations (e.g. for sum, masking out pixels where it would be summing zero items). |
+1 on this. I would also vote for throwing a warning in case the clim gets cut off at 1 for integers. After all, holoviews wouldn't know the array came out of a counting operation or if it simply happens to be integer valued, in which case the user should be warned 0s were omitted. I don't know how easy it would be to control the verbosity of such warnings, but I assume if you are aware there are zeros but you still want to log transform it isn't too much asked of the user to find a specific way of handling this before passing it into holoviews. |
Are you sure this is really needed? What is the alternative here? A zero value can never be displayed in the log color mapper so a lower bound of 1 is the only possible default lower bound for ints. |
I'd agree that the logic shouldn't be so contrived that it depends on the source of data. It should be as simple as @philippjfr indicates, however, a source of floats would need more user input I guess? Would the palette start at 0.1? Or 0.01? User still needs to decide that. |
No of course, no warning is ever strictly needed so long as the user can be confident that what they're passing in is what they think it is (not always easy/practical in a pipeline) and they are fully aware of holoviews' behaviour. But it is still an ad-hoc choice, even if not completely arbitrary, on the part of holoviews. In the end, the reason for setting the lower clim to 1 is mostly that it is easy to do (easier than piecing together a color scale that goes as the logarithm for values >=1 but also has a category for zeros, anyway), not because it is correct (because as James said, NaN is not the same as 0, not even in count data). Or put differently, if your data contain zeros, you should not log-transform them in the first place except maybe for quick exploration. Anyway, just my two cents; we've had this discussion before. |
@poplarShift I do see what you're getting at and it's a point well taken. I'd say that in future we should support masked arrays properly and at that point we could expand the warning to ints as well asking the user to mask out the zero values explicitly.
Right, floats definitely need a very deliberate choice by the user and will warn. |
Sounds good! |
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Example of the issue as reported by K.-Michael Aye on gitter:
Bokeh 2.x no longer lets the log colormapper handle zeros. I understand the goal of this change was to address this issue bokeh/bokeh#8061 (also see the discussion here bokeh/bokeh#8724). This problem can be more easily avoided when using floats as you can use NaN to avoid the log zero issues.
Unfortunately, this fix will cause a lot of confusion as illustrated by the screenshot above. After discussing this with @jbednar, we listed the following ways we could address this:
Document this issue and update how to use
clim
appropriately, checking and updating all our notebooks and handling all the confusion caused to users who miss the updated docs.Maintain backwards compatibility somehow in bokeh 2.0, stating that our
logz
semantics match Bokeh > 2.0 and offer a new option for the better behavior (the old behavior resulted in incorrect colorbars). I'm not sure how this could be achieved technically if the old behavior is not available in bokeh 2.x anymore...Somehow support a masking approach at the element level (i.e
Image
at least, and that could be generalized to other elements?). Seems like a whole lot of work...Address this for the most common use case which will be for any datashading operation that returns zero integers (e.g
count
). To do this you have to avoid generating integers in the first place as NaN isn't a valid integer: the idea would be to cast integer arrays to float and replace all zeros with NaNs.If it isn't already obvious, I am in favor of option 4 (and maybe option 2, if it is possible) as it would fix the problem in the majority of cases without having to update all the notebooks and inform all affected users. Although I don't think this is ideal (I would prefer to keep count dtypes as integer) this transformation doesn't violate datashader semantics where NaN and zero are equivalent. Maybe we could have an option for the operations (default off) to skip this transformation for people who want to preserve integer types?
Unfortunately, I don't see any easy solutions and nothing that doesn't involve documenting this change in behavior in Bokeh 2.0 somewhere. What are your thoughts @philippjfr ?
The text was updated successfully, but these errors were encountered: