Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add polygon support #181

Closed
brendancol opened this issue Jun 22, 2016 · 10 comments
Closed

Add polygon support #181

brendancol opened this issue Jun 22, 2016 · 10 comments
Assignees

Comments

@brendancol
Copy link
Collaborator

Add support for Canvas.polygon. Something like:

endangered_agg = cvs.polygon(df, 'my_polygon_field', how='sum', field='endangered_species_list_score')
@jonmmease
Copy link
Collaborator

jonmmease commented Aug 21, 2019

I think I have a pretty clear understanding of what this might look like now that I've finished the following PRs:

Here's what I'm thinking...

Polygon representation

I think the easiest thing to do here is to represent a polygon set as a pair of arrays, one for each the x and y coordinate. nan values are then used to indicate the end of one polygon and the start of the next.

Polygon fill rendering

In the quadmesh PR (#779) I implemented filling using the "crossing number" point-in-polygon test (http://geomalgorithms.com/a03-_inclusion.html). If we use this same algorithm for these sets of polygons then we naturally get the following behavior.

  • The first polygon is filled in if it's the only one
  • The second (and third, forth, ...) polygon acts as a hole if it is enclosed in the first polygon, or it is filled in as a separate polygon if it is outside of the first polygon.

Polygon API

I propose that the Canvas.polygon method have the same API as Canvas.line. We don't need to cover the full API right away, but in principle every combination of the axis and column arguments supported by Canvas.line could also be supported by Canvas.poly. Here are the two API uses that I expect to be most common:

Variable sized polygons, one per row. In this case, the x and y coordinates would be stored as RaggedArray columns in the DataFrame.

canvas.polygon(df, 'x', 'y', axis=1)

A collection of fixed-size polygons. This would be useful for displaying a large collection of polygons that all have the same number of vertices. In this case, the data frame would have a column for each x and y coordinate. For example, rendering a collection of pentagons would look something like this

canvas.polygon(df, ['x0', 'x1', 'x2', 'x3', 'x4'], ['y0', 'y1', 'y2', 'y3', 'y4'], axis=1)

Polygon outlines

A nice consequence of using this polygon representation and this API is that we get polygon outline rendering for free. Just change canvas.polygon to canvas.line. The line method already uses nan values to separate lines, and as long as the polygons are closed, the line method would automatically be able to draw the polygon outlines.

@jbrockmendel
Copy link

Are the possibly-nested polygons you're describing isomorphic to shapely objects? I've been talking with @jbednar about a data structure that looks a lot like what you describe here

The second (and third, forth, ...) polygon acts as a hole if it is enclosed in the first polygon, or it is filled in as a separate polygon if it is outside of the first polygon.

but I've been trying to make this representation round-trip with shapely objects. Is that not the goal @jbednar?

@jbednar
Copy link
Member

jbednar commented Oct 12, 2019

I believe shapely can represent any shape, so yes, supporting round tripping is a good idea...

@jbrockmendel
Copy link

implemented filling using the "crossing number" point-in-polygon test

@jonmmease how costly is this check? I was under the (incorrect @jbednar?) impression that the data structure was supposed to reflect when one polygon is "cut out" from another. If this doing this check at runtime is not too costly, the data representation I have in mind can be made much simpler.

@jonmmease
Copy link
Collaborator

Are the possibly-nested polygons you're describing isomorphic to shapely objects?

@jbrockmendel I haven't dug into how shapely represents Polygons, so I can't speak to the isomorphism at this point. But I definitely agree that this would be desirable.

For the curvilinear quadmesh implementation, I implemented quad filling using the crossing number method (http://geomalgorithms.com/a03-_inclusion.html). This algorithm processes the edges of one (or more) polygon(s) independently. It has the natural property that any polygon fully contained in another polygon acts as a hole. And a double nested polygon would act as a filled "island" inside the hole.

As described in http://geomalgorithms.com/a03-_inclusion.html, the crossing-number method breaks down in the case of self-intersecting polygons. In this case, the winding-number algorithm is a better choice.

As far as I can see, if the polygons don't self intersect and the polygons don't overlap with each other then the crossing-number and winding-number algorithms will produce identical filling solutions. But just to be safe, it's probably better for Datashader to plan on using the winding number method for polygons. To support this, we would want to have an encoding of whether each polygon is a "fill" or "hole" polygon. In terms of the winding-number algorithm, this would be equivalent to whether the polygon is clockwise or counterclockwise.

I think this could be accomplished with a small modification to the encoding I suggested above. Instead of separating polygons with nan values, we could separate them with +inf/-inf values where the sign would indicate whether the following polygon is a "fill" or "hole" polygon (and you might have already suggested this elsewhere, but I don't remember for sure). This would assume that the first polygon is always a "fill" polygon, so if this isn't a safe assumption then we need to make another adjustment.

Is that helpful? Happy to discuss further!

@jbednar
Copy link
Member

jbednar commented Oct 13, 2019

See poc.ShapelyArray in https://github.com/jbrockmendel/geoarr for discussion.

@jonmmease , I imagine we could relax the isnan check in canvas.line to treat +Inf and -Inf as equivalent to nan, so it seems like either way we can support switching between .line and .polygon. Can you speculate on the performance implications of the crossing-number and winding-number options, for datashader rendering and for other relevant algorithms? If it's dramatic, disallowing self-intersecting polygons seems ok, as that could presumably be enforced as a pre-processing step when needed. If the performance is about the same, then the safer option sounds good...

@brendancol, any comments?

@jonmmease
Copy link
Collaborator

Yes, I agree that making sure that canvas.line treats +inf/-inf like nan is a fine solution (And it may already do that).

Can you speculate on the performance implications of the crossing-number and winding-number options.

For each row, it will amount to allocating one additional boolean array with length matching the number of edges, populating this array during the first traversal, and then using the element values to flip the sign of the contribution of each edge. My intuition is that the impact would be fairly negligible, maybe a couple of percentage points slower at most. So, I'd be in favor of going with the winding-number approach from the start.

@jonmmease
Copy link
Collaborator

jonmmease commented Oct 17, 2019

@jbrockmendel, one issue is that, from https://github.com/jbrockmendel/geoarr/blob/master/poc.py, it looks like shapely and your ShapelyArray store the x and y coordinates of polygons in a single interleaved array. For consistency with the rest of datashader, it would be preferable to have the x and y coordinates separated into two arrays.

Also, after looking carefully at the winding-order algorithm I'm pretty sure that we don't actually need to distinguish outer polygons from hole polygons if the polygons follow the convention that outer polygons are specified in ccw order and hole polygons are specified in cw order. And it looks like a shapely polygon can be transformed into this form using the polygon.orient function. We may still want to use the +/- inf separators for convenience when round tripping, but the fill algorithm would be simplest if we could assume this ccw/cw ordering convention.

@jbednar
Copy link
Member

jbednar commented Oct 17, 2019

I'm happy with that; it's a new storage format, and so we can decide on the conventions, including ordering. Separating the x and y coordinates makes sense at a datashader level (and for how Pandas and Xarray usually work), so it sounds reasonable to me, but there may be some implication I can't see.

I think at this point @jonmmease should just take whatever is useful from https://github.com/jbrockmendel/geoarr, extend Datashader's RaggedArray support to allow polygons, and implement rendering. Once it's implemented, we can then evaluate how the data structure works for other operations, and see if there are any changes other people (e.g. @brendancol) suggest.

@jbednar
Copy link
Member

jbednar commented Dec 11, 2019

Now in master.

@jbednar jbednar closed this as completed Dec 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants