-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement all operations that require only equality. #839
Implement all operations that require only equality. #839
Conversation
… instead of __init__.
Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking for opportunities to be less repetitive, and to make the documentation more consistent (internally and with GeoPandas).
if contains_only_points(lhs) and contains_only_points(rhs): | ||
return True | ||
elif contains_only_multipoints(lhs) and contains_only_multipoints(rhs): | ||
return True | ||
elif contains_only_linestrings(lhs) and contains_only_linestrings(rhs): | ||
return True | ||
elif contains_only_polygons(lhs) and contains_only_polygons(rhs): | ||
return True | ||
else: | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this would be more efficient and simpler if you first confirm that each side has a single geometry type, and then compare the geometry type. As it is, contains_only_*
re-confirms that there is only a single geometry type each time. If you can also provide a way to just get a value representing the geometry types contained in a GS, then you only have to check that one of them has only a single geometry type, and then compare the types codes of both sides.
if contains_only_points(lhs) and contains_only_points(rhs): | |
return True | |
elif contains_only_multipoints(lhs) and contains_only_multipoints(rhs): | |
return True | |
elif contains_only_linestrings(lhs) and contains_only_linestrings(rhs): | |
return True | |
elif contains_only_polygons(lhs) and contains_only_polygons(rhs): | |
return True | |
else: | |
return False | |
if contains_single_geometry_type(lhs): | |
return geometry_types(lhs) == geometry_types(rhs) | |
else: | |
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a geometry_types
function yet, what does it do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hypothetical. The idea is to return something that enables uniquely identifying all the geometry types contained in the series, so they can be compared. E.g. a bitmask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably depend on pandas ExtensionDtype and make a GeometryDtype
and assign to dtype
. We can assign more meta data to GeometryDtype
, for example, the underlying mixed geometry type. We can model GeometryDtype
to UnionType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a separate effort? We can create an issue from this thread if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a cool idea I will look at.
Geometries cross if they have some but not all interior points in | ||
common, have the same dimension, and the intersection of the | ||
interiors of the geometries has the dimension of the geometries | ||
themselves minus one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we have two polygons that overlap, do they also cross? Confused by the "dimension of the geometries themselves minus 1".
How about a line that crosses a polygon. The intersection has 1D geometry, right? But that is not one less than the dimension of both the geometries -- the line is 1D and the polygon is 2D.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In A Small Set of Formal Topological Relationships Suitable for End User Interaction (SSFTR),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two polygons that overlap I think share an interior dimension, that is
In order for two polygons to cross they have to share an overlap that is just a line, and it can't only be in the exterior. This is essentially impossible for two areas, so we may not see cases of polygon crossing.
cross
looks like it applies primarily to linestrings, as the dimensionality of the interior overlap will always be 0 for linestring.cross(linestring) and 1 for linestring.cross(polygon)
or polygon.cross(linestring)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking again at SSFTR I see that cross
is applied when overlap
is applied when the dimension of the interior intersection equals the dimension of the interior.
""" | ||
return CoversBinpred(self, other, align)() | ||
|
||
def intersects(self, other, align=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at all these predicates on GeoSeries, I realized we really should just start from a copy of the documentation from GeoPandas GeoSeries here. e.g. for intersects:
Returns a Series of dtype('bool') with value True for each aligned geometry that intersects other.
An object is said to intersect other if its boundary and interior intersects in any way with those of the other.
The operation works on a 1-to-1 row-wise manner:
../../../_images/binary_op-01.svg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty much using GeoPandas as my source, but trying not to just copy everything outright. What do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make sure every operation has consistent docs in terms of what is documented, what is defined, and what terminology and mathematics are used in the definitions.
if contains_only_points(lhs): | ||
result = lhs.points.xy.equals(rhs.points.xy) | ||
elif contains_only_linestrings(lhs): | ||
result = lhs.lines.xy.equals(rhs.lines.xy) | ||
elif contains_only_polygons(lhs): | ||
result = lhs.polygons.xy.equals(rhs.polygons.xy) | ||
elif contains_only_multipoints(lhs): | ||
result = lhs.multipoints.xy.equals(rhs.multipoints.xy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to just get the geometry as the single type that the geoseries contains, rather than doing this multi-step if/else every time?
if contains_only_points(lhs): | |
result = lhs.points.xy.equals(rhs.points.xy) | |
elif contains_only_linestrings(lhs): | |
result = lhs.lines.xy.equals(rhs.lines.xy) | |
elif contains_only_polygons(lhs): | |
result = lhs.polygons.xy.equals(rhs.polygons.xy) | |
elif contains_only_multipoints(lhs): | |
result = lhs.multipoints.xy.equals(rhs.multipoints.xy) | |
if contains_single_geometry_type(lhs): | |
result = lhs.geometry.as_type(lhs.type).xy.equals(rhs.geometry.as_type(lhs.type).xy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't implemented that yet. It is basically just remapping the input_types
buffer of the GeoColumn
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it would be very valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will come in handy for fixing the bug in equals I identified that caused me to close this PR: I was using .equals
instead of .geom_equals
. 😮💨
Co-authored-by: Mark Harris <mharris@nvidia.com>
…spatial into feature/all_equals_operations
Co-authored-by: Mark Harris <mharris@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor comments.
Returns | ||
------- | ||
GeoSeries | ||
The left-hand-side of the internal binary operation, may be | ||
reordered. | ||
GeoSeries | ||
The right-hand-side of the internal binary operation, may be | ||
reordered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing documentation for the aligned index
if has_same_geometry(self.lhs, self.rhs) and contains_only_points( | ||
self.lhs | ||
): | ||
return cudf.Series([False] * len(self.lhs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance we can avoid computing _op
and early return this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm good idea I'll consider that in the abstraction design. My first impulse is to add a bool to the BinaryPredicate
class called done
. _op
and postprocess
will only call if not done
.
My second impulse is to not perform this calculation in postprocess
but just do it in _op
instead.
I will probably do another pass of the inheritance design in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also considering duck typing _op
and replacing it with the correct predicate function if preprocess
deigns. Alternatively I could allow specification of an alternate _op
that would be used if defined.
if contains_only_points(lhs) and contains_only_points(rhs): | ||
return True | ||
elif contains_only_multipoints(lhs) and contains_only_multipoints(rhs): | ||
return True | ||
elif contains_only_linestrings(lhs) and contains_only_linestrings(rhs): | ||
return True | ||
elif contains_only_polygons(lhs) and contains_only_polygons(rhs): | ||
return True | ||
else: | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably depend on pandas ExtensionDtype and make a GeometryDtype
and assign to dtype
. We can assign more meta data to GeometryDtype
, for example, the underlying mixed geometry type. We can model GeometryDtype
to UnionType
.
I found a number of issues with my |
|
||
Notes | ||
----- | ||
Should only be called once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should only be called once. | |
Should only be called once. | |
Inputs `lhs` and `rhs` may be index aligned or may not be, depending | |
on the argument to `BinaryPredicate.__init__(...align)`. |
Description
This PR implements the set of operations listed in #838.
Closes #838
Closes #832
Depends on #834
Checklist