
Add quadmesh glyph with rectilinear and curvilinear support #779

Merged: 24 commits (Aug 28, 2019)

Conversation

@jonmmease
Collaborator
commented Aug 13, 2019

This PR is an alternative implementation of quadmesh rasterization, initially based on the logic from #769.

Unlike that PR, this PR adds quadmesh glyph classes, and supports the standard datashader aggregation framework.

Overall, the architecture fits well with that of the dataframe-based glyphs (points, line, and area). And relying on the datashader aggregation framework results in a lot less code duplication compared to #769.

Thanks to the variable argument expansion from #780, the performance for rectilinear rendering is now on par with the prototype implementation.

For curvilinear quadmesh aggregation, this PR uses a raycasting algorithm to determine which pixels to fill. I've found this approach to be ~1.5x slower than the prototype implementation, which uses an area-based point-inclusion approach. The algorithm isn't the only difference between the implementations, and I didn't exhaustively chase down the other differences this time.

I went with the raycasting algorithm because it handles concave and complex quads. It is also very straightforward to extend this algorithm to general polygons (with or without holes), so I think there's a good path here towards adding general polygon support to datashader as well.
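For reference, here is a minimal sketch of the even-odd ray-casting test (illustrative only, not the code in this PR): a horizontal ray is cast from the pixel's sample point toward +x, and the point is inside the quad if the ray crosses an odd number of edges.

import numpy as np
from numba import njit

@njit
def point_in_quad(px, py, qx, qy):
    # qx, qy are length-4 arrays of vertex coordinates for one quad.
    # Count crossings of a horizontal ray from (px, py) toward +x;
    # an odd count means the point is inside. This also handles
    # concave and self-intersecting quads.
    inside = False
    j = 3
    for i in range(4):
        # Does edge (j -> i) straddle the horizontal line y = py?
        if (qy[i] > py) != (qy[j] > py):
            # x coordinate where the edge crosses that horizontal line
            x_cross = qx[j] + (py - qy[j]) * (qx[i] - qx[j]) / (qy[i] - qy[j])
            if px < x_cross:
                inside = not inside
        j = i
    return inside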

For example usage and benchmarks, see https://anaconda.org/jonmmease/quadmeshcomparisons_pr/notebook (rendered at https://nbviewer.jupyter.org/urls/notebooks.anaconda.org/jonmmease/quadmeshcomparisons_pr/download)
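As a rough sketch of the kind of usage the notebook covers (illustrative only; the coordinate names Qx/Qy and the data below are made up), a rectilinear quadmesh is a DataArray with 1D x/y coordinates, while a curvilinear quadmesh attaches 2D coordinate arrays:

import numpy as np
import xarray as xr
import datashader as ds

xs = np.linspace(0, 10, 100)
ys = np.linspace(0, 5, 50)
zs = np.sin(xs) * np.cos(ys[:, np.newaxis])

# Rectilinear: 1D dimension coordinates
rect = xr.DataArray(zs, coords={'x': xs, 'y': ys}, dims=['y', 'x'], name='Z')

canvas = ds.Canvas(plot_width=400, plot_height=200)
rect_agg = canvas.quadmesh(rect, x='x', y='y', agg=ds.reductions.mean('Z'))

# Curvilinear: 2D coordinate arrays attached as non-dimension coords
Qy, Qx = np.meshgrid(ys, xs, indexing='ij')
Qx = Qx + 0.3 * Qy  # shear the grid so the quads are no longer axis-aligned
curvi = xr.DataArray(zs,
                     coords={'Qx': (['y', 'x'], Qx), 'Qy': (['y', 'x'], Qy)},
                     dims=['y', 'x'], name='Z')
curvi_agg = canvas.quadmesh(curvi, x='Qx', y='Qy', agg=ds.reductions.mean('Z'))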

Future work:
This PR does not include any parallelization support, so extending this to work in a multi-threaded or distributed context is left as future work.

@jbednar @philippjfr


Outdated performance observations from initial PR:

But it's an order of magnitude slower than the implementations in #769. Here is a notebook showing some timing results: https://anaconda.org/jonmmease/rectquadmesh_examples/notebook.

Roughly speaking, this PR is ~13x faster than representing a rectilinear quadmesh with a trimesh, but the specialized implementation from #769 is ~13x faster than this PR. Note that I disabled numba parallelization for these tests for consistency.

I did some performance debugging and found that nearly all of the extra overhead in this PR, compared to the specialized implementation, comes from the use of the aggregation framework. If, in the _extend function, I don't call append but instead implement a single aggregation inline, the performance is comparable to the specialized implementations.

So the bad news is that right now we need to choose between performance and consistency/maintainability for the quadmesh implementation. The good news is that there may be an order of magnitude speedup to be had across points, line, and area glyphs as well if we can work out how to optimize the aggregation framework.
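To make the trade-off concrete, here is a rough sketch (hypothetical names, not datashader's actual internals) of the two paths being compared: the framework path that routes every pixel through a generic append, versus a single hand-inlined count aggregation:

from numba import njit

def make_extend(append):
    # Framework-style path (sketch): every covered pixel goes through a
    # closure-captured, compiled `append` that updates all requested aggregates.
    @njit
    def extend_generic(x0i, x1i, y0i, y1i, i, aggs_and_cols):
        for xi in range(x0i, x1i):
            for yi in range(y0i, y1i):
                append(i, xi, yi, aggs_and_cols)
    return extend_generic

# Hand-specialized path (sketch): one concrete reduction written inline.
@njit
def extend_count(x0i, x1i, y0i, y1i, count_agg):
    for xi in range(x0i, x1i):
        for yi in range(y0i, y1i):
            count_agg[yi, xi] += 1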

@jbednar
Member
commented Aug 13, 2019

That's both exciting and alarming! Can you expand on "disabled numba parallelization for these tests for consistency"? Given that the final implementation will use Numba, are comparisons without Numba meaningful here?

@jonmmease
Collaborator Author

Can you expand on "disabled numba parallelization for these tests for consistency"?

@philippjfr's implementation uses the numba parallel prange loops (https://numba.pydata.org/numba-doc/latest/user/parallel.html#explicit-parallel-loops), but the current datashader glyphs do not. I tried turning it on for this PR, but got a numba error that I didn't spend time diagnosing.
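For context, explicit parallel loops in Numba look roughly like this (a generic sketch, not code from either PR):

from numba import njit, prange
import numpy as np

@njit(parallel=True)
def parallel_sum(values):
    # prange lets Numba split these iterations across threads; scalar
    # reductions like `acc += ...` are recognized and handled safely.
    acc = 0.0
    for i in prange(values.shape[0]):
        acc += values[i]
    return acc

parallel_sum(np.arange(1e6))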

@philippjfr
Member

I think unifying the dispatch/aggregation pipeline and internal APIs is really important and I'm really excited about this, but going from a 600-700x to a 13x speedup would indeed be pretty sad :(

@jbednar
Member
commented Aug 13, 2019

Well, it wouldn't be that drastic; presumably the Numba issue can be worked around, which would leave it off by a single factor of 13, right?

@jonmmease
Collaborator Author

I was only seeing a factor of 2-3x speedup with parallelization in @philippjfr 's code (on a machine with many more cores than that). But, yeah, we should figure out where we can use this for the glyph generation code.

@philippjfr
Member

I'm very confused why there is such a huge performance difference. Numba should automatically inline the append function. I'll play around with it and maybe ask Val about it when he returns next week.

@jonmmease
Collaborator Author

Yeah, I think it's largely about inlining, and I don't have good intuition for when things get inlined. The numba docs say that numba itself doesn't inline functions (https://numba.pydata.org/numba-doc/dev/user/faq.html#does-numba-inline-functions), but that LLVM does in some cases.

@jonmmease
Collaborator Author

Also, the append function is constructed from evaluating a string, so perhaps that's a factor.

@philippjfr
Member

Also, the append function is constructed from evaluating a string, so perhaps that's a factor.

I wouldn't have thought so since once it's JIT compiled there should be no difference.

@philippjfr
Member

As an aside, I can't get the implementation in this PR to work with an x_range or y_range provided to the Canvas.

@jonmmease
Collaborator Author

@philippjfr
Member

Yeah, never mind, that's not the actual issue. The problem seems to be the order the dimensions are parsed in and returned in. I'd expect the DataArray dimensions to be ordered y, x on input and output; something appears to be inverting that.

@philippjfr
Member
commented Aug 13, 2019

Don't really understand why this happens because this code is correct:

if x is None and y is None:
    y, x = source.dims

but in this example things are really weird:

import numpy as np
import holoviews as hv
import datashader as ds
from datashader import Canvas

xs = np.logspace(0, 3, 50)
ys = np.arange(50)

zs = xs * ys[:, np.newaxis]
dataset = hv.QuadMesh((xs, ys, zs), datatype=['xarray']).data
da = dataset.z

canvas = Canvas(x_range=(200, 1000), y_range=(0, 50))

print(da.dims)

agg1 = canvas.quadmesh(da, agg=ds.reductions.mean('z'))

print(agg1.dims)
print(agg1.coords)

agg2 = canvas.quadmesh(da, 'x', 'y', agg=ds.reductions.mean('z'))

print(agg2.dims)
print(agg2.coords)

which prints:
('y', 'x')
('x', 'y')
Coordinates:
  * x        (x) float64 0.04167 0.125 0.2083 0.2917 ... 49.71 49.79 49.88 49.96
  * y        (y) float64 200.7 202.0 203.3 204.7 ... 995.3 996.7 998.0 999.3
('y', 'x')
Coordinates:
  * y        (y) float64 0.04167 0.125 0.2083 0.2917 ... 49.71 49.79 49.88 49.96
  * x        (x) float64 200.7 202.0 203.3 204.7 ... 995.3 996.7 998.0 999.3

@philippjfr
Member

To summarize the issues:

  1. On the input the code inferring the dimension order appears wrong
  2. On the output the coord order matches the dim ordering, which is usually not desirable. The coordinate order is meaningful by NetCDF convention and should in the 2D case usually either be the inverse of the dim ordering OR maybe more appropriately simply inherit the ordering of the input DataArray.

@philippjfr
Member
commented Aug 13, 2019

Back to the inlining issue. I'm guessing the problem is that append is always two levels deep, i.e. it will itself call another function to do the various base aggregations. I think we may be able to get it to inline properly if, instead of creating one append function, we pass in a separate append function for each base aggregation, i.e. for a mean aggregation the inner loop would look something like:

for xi in range(x0i, x1i):
    for yi in range(y0i, y1i):
        append_count(j, i, xi, yi, count_agg)
        append_sum(j, i, xi, yi, sum_agg)

The actual implementation would be something like:

for xi in range(x0i, x1i):
    for yi in range(y0i, y1i):
        for append, array in zip(aggs, arrays):
            append(j, i, xi, yi, array)       

That would also get rid of the need for dynamically creating the append function from a string ( 😱) and instead use the _append from the base reductions (count, sum, min, max, m2, any) directly.

I'm pretty hopeful that will lead to proper inlining since it's pretty close to what my implementation was doing anyway.

Edit: I guess there are some complications in that we would also have to factor out the code that extracts the aggregated value, but I still think it's doable.
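A rough sketch of that idea (hypothetical names, not datashader's actual code), where each base reduction's append is a separately jitted function that the quad-filling loop closes over, giving Numba a chance to specialize and inline them:

from numba import njit

@njit
def append_count(j, i, xi, yi, count_agg):
    # number of contributing values for this pixel
    count_agg[yi, xi] += 1

@njit
def append_sum(j, i, xi, yi, sum_agg, values):
    # running sum of the quad's value for this pixel
    sum_agg[yi, xi] += values[j, i]

def make_extend_mean(append_count, append_sum):
    # Closing over the jitted per-reduction appends lets Numba specialize
    # (and typically inline) them when it compiles the pixel-filling loop.
    @njit
    def extend_quad(x0i, x1i, y0i, y1i, j, i, values, count_agg, sum_agg):
        for xi in range(x0i, x1i):
            for yi in range(y0i, y1i):
                append_count(j, i, xi, yi, count_agg)
                append_sum(j, i, xi, yi, sum_agg, values)
    return extend_quad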

@jonmmease
Collaborator Author

The dimension flip seems to happen when converting the DataArray to a Dataset...

print(list(da.dims))
print(list(da.to_dataset().dims))
['y', 'x']
['x', 'y']

So I'll just be sure to grab it from the DataArray itself.

@jonmmease
Collaborator Author

The dimension and coord ordering logic is carried over from the default behavior of the glyph aggregators. Here's what you get from points:

canvas = Canvas(x_range=(200, 1000), y_range=(0, 50))
canvas.points(pd.DataFrame({'x': [300], 'y': [20]}), 'x', 'y')
<xarray.DataArray (y: 600, x: 600)>
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int32)
Coordinates:
  * y        (y) float64 0.04167 0.125 0.2083 0.2917 ... 49.71 49.79 49.88 49.96
  * x        (x) float64 200.7 202.0 203.3 204.7 ... 995.3 996.7 998.0 999.3

Do you think we should flip the coord order for what Datashader returns across all of the glyph types? Or we could treat quadmesh individually because it's the only one (so far) where the input data structure is an xarray.

@philippjfr
Member

Actually, the other issue with the current append is that it does the array lookup inside the inner loop, which also seems pretty inefficient.

@philippjfr
Member

Or we could treat quadmesh individually because it's the only one (so far) where the input data structure is an xarray.

Yeah, I think we should have special logic for xarrays.

@jonmmease
Collaborator Author

Actually the other issue with the current append is that it does the array lookup inside the inner loop which also seems pretty inefficient.

I'm going to manually inline the mean aggregate case so we can see what it looks like and what the performance difference is. I'll paste the inlined version here when I have it.

@jbednar
Member
commented Aug 13, 2019

Isn't the raster input an xarray too? But that's a different code path right now, I suppose.

@jonmmease
Collaborator Author

I think I've worked out an approach to automating this variable-arity optimization that works across glyphs. I'll open a separate PR with that when it's ready...

Commit message: "The ordering of x then y makes it possible to pass the output xarray into a HoloViews Image container with the dimensions being transposed"
@jonmmease
Collaborator Author

For now, I updated the construction of the returned DataArray to specify a coordinate order of [x, y]. The previous coordinate order was the reason that passing an aggregate result into a HoloViews Image resulted in an inversion of the x and y axes (which was true for points and lines as well). Done in commit 87cf7e3.
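To illustrate the distinction (a minimal sketch, not the PR's code), the dims of the returned aggregate stay (y, x) while the coords are listed x-first:

import numpy as np
import xarray as xr
from collections import OrderedDict

nx, ny = 4, 3
xs = np.linspace(200, 1000, nx)
ys = np.linspace(0, 50, ny)
data = np.zeros((ny, nx))

# dims stay (y, x), matching the row-major layout of the aggregate buffer,
# while the coords are inserted x-first, which is the order that avoids the
# axis inversion described above when wrapping the result in a HoloViews Image.
agg = xr.DataArray(data,
                   coords=OrderedDict([('x', xs), ('y', ys)]),
                   dims=['y', 'x'])

print(agg.dims)          # ('y', 'x')
print(list(agg.coords))  # ['x', 'y']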

@jonmmease
Collaborator Author

@philippjfr I think I'm satisfied (in terms of correctness, API, and performance) with the rectilinear quadmesh implementation in this PR now. So feel free to kick the tires again and let me know if anything else sticks out.

I'm going to move on to the curvilinear case now.

@jbednar
Member
commented Aug 16, 2019

So, what's the final word on the performance?

@jonmmease
Collaborator Author

So, what's the final word on the performance?

Equivalent to the "append_expanded" line above (~40x speedup over trimesh for the rectilinear case). It turns out that all of the *_parallel implementations suffer from a race condition that would require something like numba/numba#3681 to address, so prange parallelization is not enabled at this point.
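The race is the usual unsynchronized read-modify-write on shared aggregate bins when two threads hit the same pixel; a stripped-down illustration (not the PR's code):

from numba import njit, prange
import numpy as np

@njit(parallel=True)
def racy_count(rows, cols, agg):
    # Different prange iterations can hit the same (row, col) bin, and the
    # += below is a non-atomic read-modify-write, so increments can be lost.
    for k in prange(rows.shape[0]):
        agg[rows[k], cols[k]] += 1

agg = np.zeros((10, 10), dtype=np.int64)
rows = np.zeros(1_000_000, dtype=np.int64)  # every point lands in bin (0, 0)
cols = np.zeros(1_000_000, dtype=np.int64)
racy_count(rows, cols, agg)
print(agg[0, 0])  # frequently less than 1_000_000 when run with multiple threads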

jonmmease changed the title from "[WIP] Add quadmesh glyph" to "Add quadmesh glyph with rectilinear and curvilinear support" on Aug 16, 2019
@jonmmease
Collaborator Author

@jbednar @philippjfr I've added the curvilinear quadmesh support and updated the PR description to reflect the current contents of the PR. Ready for review!

jonmmease mentioned this pull request on Aug 21, 2019
@jbednar
Member
commented Aug 28, 2019

Merging now; hopefully we can recover the remaining 12X or so of performance using Dask to make the parallel operations safe, plus some other optimizations later.
