Add quadmesh glyph with rectilinear and curvilinear support #779
Conversation
That's both exciting and alarming! Can you expand on "disabled numba parallelization for these tests for consistency"? Given that the final implementation will use Numba, are comparisons without Numba meaningful here?

@philippjfr's implementation uses the numba `parallel` option.

I think unifying the dispatch/aggregation pipeline and internal APIs is really important, and I'm really excited about this, but going from a 600-700x to a 13x speedup would indeed be pretty sad :(

Well, it wouldn't be that drastic; presumably the Numba issue can be worked around, which would leave it off by a single factor of 13, right?

I was only seeing a factor of 2-3x speedup with parallelization in @philippjfr's code (on a machine with many more cores than that). But, yeah, we should figure out where we can use this for the glyph generation code.

I'm very confused why there is such a huge performance difference. Numba should automatically inline the append function. I'll play around with it and maybe ask Val about it when he returns next week.

Yeah, I think it's largely about inlining, and I don't have intuition for when things get inlined. The numba docs say that numba itself doesn't do inlining (https://numba.pydata.org/numba-doc/dev/user/faq.html#does-numba-inline-functions), but that LLVM does in some cases.
Also, the ...
I wouldn't have thought so, since once it's JIT compiled there should be no difference.

As an aside, I can't get the implementation in this PR to work with an x_range or y_range provided to the Canvas.

Yeah, never mind, that's not the actual issue. The problem seems to be the order the dimensions are parsed in and returned in. I'd expect the DataArray dimensions to be ordered ...
Don't really understand why this happens (the underlying code looks correct), but in this example things are really weird:

```python
# assumes: import numpy as np; import holoviews as hv
# import datashader as ds; from datashader import Canvas
xs = np.logspace(0, 3, 50)
ys = np.arange(50)
zs = xs * ys[:, np.newaxis]
data = hv.QuadMesh((xs, ys, zs), datatype=['xarray']).data
da = data.z
canvas = Canvas(x_range=(200, 1000), y_range=(0, 50))
print(da.dims)
agg1 = canvas.quadmesh(da, agg=ds.reductions.mean('z'))
print(agg1.dims)
print(agg1.coords)
agg2 = canvas.quadmesh(da, 'x', 'y', agg=ds.reductions.mean('z'))
print(agg2.dims)
print(agg2.coords)
```
To summarize the issues: ...
Back to the inlining issue. I'm guessing the problem is that the generated aggregation code ends up as something like:

```python
for xi in range(x0i, x1i):
    for yi in range(y0i, y1i):
        append_count(j, i, xi, yi, count_agg)
        append_sum(j, i, xi, yi, sum_agg)
```

The actual implementation would be something like:

```python
for xi in range(x0i, x1i):
    for yi in range(y0i, y1i):
        for append, array in zip(aggs, arrays):
            append(j, i, xi, yi, array)
```

That would also get rid of the need for dynamically creating the combined `append` function. I'm pretty hopeful that will lead to proper inlining, since it's pretty close to what my implementation was doing anyway.

Edit: I guess there are some complications in that we would also have to factor out the code that extracts the aggregated value, but I still think it's doable.
The dimension flip seems to happen when converting the `DataArray` to a `Dataset`:

```python
print(da.dims)
print(list(da.to_dataset().dims))
```

So I'll just be sure to grab the dimension order from the `DataArray`.
The dimension and coord ordering logic is being carried over from the default behavior of the glyph aggregators. Here's what you get from the points glyph, for example:

```python
canvas = Canvas(x_range=(200, 1000), y_range=(0, 50))
canvas.points(pd.DataFrame({'x': [300], 'y': [20]}), 'x', 'y')
```

Do you think we should flip the coord order for what Datashader returns across all of the glyph types? Or we could treat quadmesh individually, because it's the only one (so far) where the input data structure is an xarray.
Actually the other issue with the current ...
Yeah, I think we should have special logic for xarrays.

I'm going to manually inline the ...

Isn't the raster input an xarray too? But that's a different code path right now, I suppose.

I think I've worked out an approach to automating this variable-arity optimization that works across glyphs. I'll open a separate PR with that when it's ready...
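That separate PR isn't shown here, but a common way to automate this kind of variable-arity specialization is to generate the unrolled function source at runtime. This is a hypothetical pure-Python sketch of the idea, not the PR's actual code (in real datashader the generated function would additionally be Numba-jitted, and the signatures differ):

```python
def make_append_all(n):
    """Generate an append_all with the dispatch loop unrolled for
    exactly n aggregations.

    The generated body contains n direct calls (append_0(...),
    append_1(...), ...) rather than a runtime loop over a tuple of
    functions, which is the form a JIT compiler can inline.
    """
    appends = ", ".join(f"append_{i}" for i in range(n))
    aggs = ", ".join(f"agg_{i}" for i in range(n))
    lines = [f"def append_all(x, y, {appends}, {aggs}):"]
    for i in range(n):
        lines.append(f"    append_{i}(x, y, agg_{i})")
    namespace = {}
    exec("\n".join(lines), namespace)  # build the specialized function
    return namespace["append_all"]

# Usage with two hypothetical aggregations backed by one-element lists
counts = [0]
totals = [0.0]
append_count = lambda x, y, agg: agg.__setitem__(0, agg[0] + 1)
append_sum = lambda x, y, agg: agg.__setitem__(0, agg[0] + x + y)

append_all = make_append_all(2)
append_all(3, 4, append_count, append_sum, counts, totals)
```

The same generator works for any number of reductions, which is what makes it applicable across glyph types rather than just quadmesh.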
The ordering of x then y makes it possible to pass the output xarray into a HoloViews Image container without the dimensions being transposed.
For now, I updated the construction of the returned ...
@philippjfr I think I'm satisfied (in terms of correctness, API, and performance) with the rectilinear quadmesh implementation in this PR now. So feel free to kick the tires again and let me know if anything else sticks out. I'm going to move on to the curvilinear case now.

So, what's the final word on the performance?

Equivalent to the "append_expanded" line above (~40x speedup over trimesh for the rectilinear case). Turns out that all of the ...

@jbednar @philippjfr I've added the curvilinear quadmesh support and updated the PR description to reflect the current contents of the PR. Ready for review!

Merging now; hopefully we can recover the last few 12x or so of performance using Dask to make the parallel operations safe, plus some other optimizations later.
This PR is an alternative implementation of quadmesh rasterization, initially based on the logic from #769.
Unlike that PR, this PR adds quadmesh glyph classes, and supports the standard datashader aggregation framework.
Overall, the architecture fits well with that of the dataframe-based glyphs (points, line, and area). And relying on the datashader aggregation framework results in a lot less code duplication compared to #769.
Thanks to the variable argument expansion from #780, the performance for rectilinear rendering is now on par with the prototype implementation.
For curvilinear quadmesh aggregation, this PR uses a raycasting algorithm to determine which pixels to fill. I've found this approach to be ~1.5x slower than the prototype implementation, which uses an area-based point-inclusion approach. The algorithm isn't the only difference between the implementations, and I didn't exhaustively chase down the sources of the gap this time.
I went with the raycasting algorithm because it handles concave and complex quads. It is also very straightforward to extend this algorithm to general polygons (with or without holes), so I think there's a good path here towards adding general polygon support to datashader as well.
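For illustration, the core of an even-odd raycasting point-inclusion test is quite compact. This is a generic pure-Python sketch of the technique (not the PR's actual Numba implementation; the function and variable names are mine), showing why concave quads need no special handling:

```python
def point_in_polygon(x, y, xs, ys):
    """Even-odd ray casting: cast a horizontal ray from (x, y) toward
    +infinity and count edge crossings; an odd count means inside.
    xs, ys hold the polygon's vertex coordinates (implicitly closed).
    """
    inside = False
    n = len(xs)
    j = n - 1
    for i in range(n):
        # Does edge (j -> i) straddle the horizontal line through y?
        if (ys[i] > y) != (ys[j] > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = xs[j] + (y - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j])
            if x < x_cross:
                inside = not inside
        j = i
    return inside

# A concave "arrowhead" quad: the vertex at (2, 2) notches into the shape
quad_x = [0.0, 4.0, 2.0, 4.0]
quad_y = [2.0, 0.0, 2.0, 4.0]
```

In a rasterization inner loop, each candidate pixel center inside the quad's bounding box would be tested this way; extending to general polygons with holes only means counting crossings over all rings, since even-odd counting handles holes automatically.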
For example usage and benchmarks, see https://anaconda.org/jonmmease/quadmeshcomparisons_pr/notebook (rendered at https://nbviewer.jupyter.org/urls/notebooks.anaconda.org/jonmmease/quadmeshcomparisons_pr/download)
Future work:
This PR does not include any parallelization support, so extending this to work in a multi-threaded or distributed context is left as future work.
@jbednar @philippjfr
Outdated performance observations from initial PR:
But it's an order of magnitude slower than the implementations in #769. Here is a notebook showing some timing results: https://anaconda.org/jonmmease/rectquadmesh_examples/notebook.
Roughly speaking, this PR is ~13x faster than representing a rectilinear quadmesh with a trimesh, but the specialized implementation from #769 is ~13x faster than this PR. Note that I disabled numba parallelization for these tests, for consistency.
I did some performance debugging and found that nearly all of the extra overhead in this PR, compared to the specialized implementation, is coming from the use of the aggregation framework. If, in the `_extend` function, I don't call `append` but instead implement a single aggregation inline, then the performance is comparable to the specialized implementations. So the bad news is that right now we need to choose between performance and consistency/maintainability for the quadmesh implementation. The good news is that there may be an order of magnitude speedup to be had across the `points`, `line`, and `area` glyphs as well, if we can work out how to optimize the aggregation framework.