Add polygon support #181

brendancol · 2016-06-22T18:53:39Z

Add support for Canvas.polygon. Something like:

endangered_agg = cvs.polygon(df, 'my_polygon_field', how='sum', field='endangered_species_list_score')

jonmmease · 2019-08-21T18:23:47Z

I think I have a pretty clear understanding of what this might look like now that I've finished the following PRs:

RaggedArray pandas extensions array (Add pandas ExtensionArray for storing homogeneous ragged arrays #687)
Support for row-oriented coordinates in Canvas.line (Support for row-oriented coordinates in Canvas.line #694)
Add quadmesh glyph with rectilinear and curvilinear support (Add quadmesh glyph with rectilinear and curvilinear support #779)

Here's what I'm thinking...

Polygon representation

I think the easiest thing to do here is to represent a polygon set as a pair of arrays, one for each the x and y coordinate. nan values are then used to indicate the end of one polygon and the start of the next.

Polygon fill rendering

In the quadmesh PR (#779) I implemented filling using the "crossing number" point-in-polygon test (http://geomalgorithms.com/a03-_inclusion.html). If we use this same algorithm for these sets of polygons then we naturally get the following behavior.

The first polygon is filled in if it's the only one
The second (and third, forth, ...) polygon acts as a hole if it is enclosed in the first polygon, or it is filled in as a separate polygon if it is outside of the first polygon.

Polygon API

I propose that the Canvas.polygon method have the same API as Canvas.line. We don't need to cover the full API right away, but in principle every combination of the axis and column arguments supported by Canvas.line could also be supported by Canvas.poly. Here are the two API uses that I expect to be most common:

Variable sized polygons, one per row. In this case, the x and y coordinates would be stored as RaggedArray columns in the DataFrame.

canvas.polygon(df, 'x', 'y', axis=1)

A collection of fixed-size polygons. This would be useful for displaying a large collection of polygons that all have the same number of vertices. In this case, the data frame would have a column for each x and y coordinate. For example, rendering a collection of pentagons would look something like this

canvas.polygon(df, ['x0', 'x1', 'x2', 'x3', 'x4'], ['y0', 'y1', 'y2', 'y3', 'y4'], axis=1)

Polygon outlines

A nice consequence of using this polygon representation and this API is that we get polygon outline rendering for free. Just change canvas.polygon to canvas.line. The line method already uses nan values to separate lines, and as long as the polygons are closed, the line method would automatically be able to draw the polygon outlines.

jbrockmendel · 2019-10-12T02:35:54Z

Are the possibly-nested polygons you're describing isomorphic to shapely objects? I've been talking with @jbednar about a data structure that looks a lot like what you describe here

The second (and third, forth, ...) polygon acts as a hole if it is enclosed in the first polygon, or it is filled in as a separate polygon if it is outside of the first polygon.

but I've been trying to make this representation round-trip with shapely objects. Is that not the goal @jbednar?

jbednar · 2019-10-12T03:56:05Z

I believe shapely can represent any shape, so yes, supporting round tripping is a good idea...

jbrockmendel · 2019-10-12T20:45:10Z

implemented filling using the "crossing number" point-in-polygon test

@jonmmease how costly is this check? I was under the (incorrect @jbednar?) impression that the data structure was supposed to reflect when one polygon is "cut out" from another. If this doing this check at runtime is not too costly, the data representation I have in mind can be made much simpler.

jonmmease · 2019-10-13T10:28:06Z

Are the possibly-nested polygons you're describing isomorphic to shapely objects?

@jbrockmendel I haven't dug into how shapely represents Polygons, so I can't speak to the isomorphism at this point. But I definitely agree that this would be desirable.

For the curvilinear quadmesh implementation, I implemented quad filling using the crossing number method (http://geomalgorithms.com/a03-_inclusion.html). This algorithm processes the edges of one (or more) polygon(s) independently. It has the natural property that any polygon fully contained in another polygon acts as a hole. And a double nested polygon would act as a filled "island" inside the hole.

As described in http://geomalgorithms.com/a03-_inclusion.html, the crossing-number method breaks down in the case of self-intersecting polygons. In this case, the winding-number algorithm is a better choice.

As far as I can see, if the polygons don't self intersect and the polygons don't overlap with each other then the crossing-number and winding-number algorithms will produce identical filling solutions. But just to be safe, it's probably better for Datashader to plan on using the winding number method for polygons. To support this, we would want to have an encoding of whether each polygon is a "fill" or "hole" polygon. In terms of the winding-number algorithm, this would be equivalent to whether the polygon is clockwise or counterclockwise.

I think this could be accomplished with a small modification to the encoding I suggested above. Instead of separating polygons with nan values, we could separate them with +inf/-inf values where the sign would indicate whether the following polygon is a "fill" or "hole" polygon (and you might have already suggested this elsewhere, but I don't remember for sure). This would assume that the first polygon is always a "fill" polygon, so if this isn't a safe assumption then we need to make another adjustment.

Is that helpful? Happy to discuss further!

jbednar · 2019-10-13T22:18:16Z

See poc.ShapelyArray in https://github.com/jbrockmendel/geoarr for discussion.

@jonmmease , I imagine we could relax the isnan check in canvas.line to treat +Inf and -Inf as equivalent to nan, so it seems like either way we can support switching between .line and .polygon. Can you speculate on the performance implications of the crossing-number and winding-number options, for datashader rendering and for other relevant algorithms? If it's dramatic, disallowing self-intersecting polygons seems ok, as that could presumably be enforced as a pre-processing step when needed. If the performance is about the same, then the safer option sounds good...

@brendancol, any comments?

jonmmease · 2019-10-14T13:20:27Z

Yes, I agree that making sure that canvas.line treats +inf/-inf like nan is a fine solution (And it may already do that).

Can you speculate on the performance implications of the crossing-number and winding-number options.

For each row, it will amount to allocating one additional boolean array with length matching the number of edges, populating this array during the first traversal, and then using the element values to flip the sign of the contribution of each edge. My intuition is that the impact would be fairly negligible, maybe a couple of percentage points slower at most. So, I'd be in favor of going with the winding-number approach from the start.

jonmmease · 2019-10-17T11:57:26Z

@jbrockmendel, one issue is that, from https://github.com/jbrockmendel/geoarr/blob/master/poc.py, it looks like shapely and your ShapelyArray store the x and y coordinates of polygons in a single interleaved array. For consistency with the rest of datashader, it would be preferable to have the x and y coordinates separated into two arrays.

Also, after looking carefully at the winding-order algorithm I'm pretty sure that we don't actually need to distinguish outer polygons from hole polygons if the polygons follow the convention that outer polygons are specified in ccw order and hole polygons are specified in cw order. And it looks like a shapely polygon can be transformed into this form using the polygon.orient function. We may still want to use the +/- inf separators for convenience when round tripping, but the fill algorithm would be simplest if we could assume this ccw/cw ordering convention.

jbednar · 2019-10-17T14:38:26Z

I'm happy with that; it's a new storage format, and so we can decide on the conventions, including ordering. Separating the x and y coordinates makes sense at a datashader level (and for how Pandas and Xarray usually work), so it sounds reasonable to me, but there may be some implication I can't see.

I think at this point @jonmmease should just take whatever is useful from https://github.com/jbrockmendel/geoarr, extend Datashader's RaggedArray support to allow polygons, and implement rendering. Once it's implemented, we can then evaluate how the data structure works for other operations, and see if there are any changes other people (e.g. @brendancol) suggest.

jbednar · 2019-12-11T01:57:04Z

Now in master.

brendancol added the enhancement label Jun 22, 2016

jbednar added the wishlist label Jun 24, 2016

jbednar assigned brendancol and jcrist Sep 8, 2016

jbednar added ready and removed wishlist labels Sep 8, 2016

jbednar unassigned brendancol and jcrist Apr 11, 2017

jonmmease self-assigned this Jan 4, 2019

jonmmease mentioned this issue Jan 13, 2019

Add pandas ExtensionArray for storing homogeneous ragged arrays #687

Merged

jonmmease mentioned this issue Oct 1, 2019

Initial GPU support #793

Merged

jonmmease mentioned this issue Oct 23, 2019

Add Polygon support #819

Closed

jbednar closed this as completed Dec 11, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add polygon support #181

Add polygon support #181

brendancol commented Jun 22, 2016

jonmmease commented Aug 21, 2019 •

edited by jbednar

Loading

jbrockmendel commented Oct 12, 2019

jbednar commented Oct 12, 2019

jbrockmendel commented Oct 12, 2019

jonmmease commented Oct 13, 2019

jbednar commented Oct 13, 2019

jonmmease commented Oct 14, 2019

jonmmease commented Oct 17, 2019 •

edited by jbednar

Loading

jbednar commented Oct 17, 2019

jbednar commented Dec 11, 2019

Add polygon support #181

Add polygon support #181

Comments

brendancol commented Jun 22, 2016

jonmmease commented Aug 21, 2019 • edited by jbednar Loading

Polygon representation

Polygon fill rendering

Polygon API

Polygon outlines

jbrockmendel commented Oct 12, 2019

jbednar commented Oct 12, 2019

jbrockmendel commented Oct 12, 2019

jonmmease commented Oct 13, 2019

jbednar commented Oct 13, 2019

jonmmease commented Oct 14, 2019

jonmmease commented Oct 17, 2019 • edited by jbednar Loading

jbednar commented Oct 17, 2019

jbednar commented Dec 11, 2019

jonmmease commented Aug 21, 2019 •

edited by jbednar

Loading

jonmmease commented Oct 17, 2019 •

edited by jbednar

Loading