Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement all operations that require only equality. #839

Closed

Conversation

thomcom
Copy link
Contributor

@thomcom thomcom commented Dec 2, 2022

Description

This PR implements the set of operations listed in #838.
Closes #838
Closes #832
Depends on #834

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the Python Related to Python code label Dec 2, 2022
@thomcom thomcom self-assigned this Dec 6, 2022
@thomcom thomcom added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 6, 2022
@thomcom thomcom marked this pull request as ready for review December 6, 2022 17:05
@thomcom thomcom requested a review from a team as a code owner December 6, 2022 17:05
@thomcom thomcom requested review from harrism and isVoid December 6, 2022 17:05
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for opportunities to be less repetitive, and to make the documentation more consistent (internally and with GeoPandas).

Comment on lines +108 to +117
if contains_only_points(lhs) and contains_only_points(rhs):
return True
elif contains_only_multipoints(lhs) and contains_only_multipoints(rhs):
return True
elif contains_only_linestrings(lhs) and contains_only_linestrings(rhs):
return True
elif contains_only_polygons(lhs) and contains_only_polygons(rhs):
return True
else:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this would be more efficient and simpler if you first confirm that each side has a single geometry type, and then compare the geometry type. As it is, contains_only_* re-confirms that there is only a single geometry type each time. If you can also provide a way to just get a value representing the geometry types contained in a GS, then you only have to check that one of them has only a single geometry type, and then compare the types codes of both sides.

Suggested change
if contains_only_points(lhs) and contains_only_points(rhs):
return True
elif contains_only_multipoints(lhs) and contains_only_multipoints(rhs):
return True
elif contains_only_linestrings(lhs) and contains_only_linestrings(rhs):
return True
elif contains_only_polygons(lhs) and contains_only_polygons(rhs):
return True
else:
return False
if contains_single_geometry_type(lhs):
return geometry_types(lhs) == geometry_types(rhs)
else:
return False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a geometry_types function yet, what does it do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hypothetical. The idea is to return something that enables uniquely identifying all the geometry types contained in the series, so they can be compared. E.g. a bitmask.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably depend on pandas ExtensionDtype and make a GeometryDtype and assign to dtype. We can assign more meta data to GeometryDtype, for example, the underlying mixed geometry type. We can model GeometryDtype to UnionType.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a separate effort? We can create an issue from this thread if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool idea I will look at.

Comment on lines +928 to +931
Geometries cross if they have some but not all interior points in
common, have the same dimension, and the intersection of the
interiors of the geometries has the dimension of the geometries
themselves minus one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we have two polygons that overlap, do they also cross? Confused by the "dimension of the geometries themselves minus 1".

How about a line that crosses a polygon. The intersection has 1D geometry, right? But that is not one less than the dimension of both the geometries -- the line is 1D and the polygon is 2D.

Copy link
Contributor Author

@thomcom thomcom Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In A Small Set of Formal Topological Relationships Suitable for End User Interaction (SSFTR), $cross$ is defined as

$$ < \lambda_1, cross \lambda_2> \iff dim(\lambda_1^o \cap \lambda_2^o) = (max(dim(\lambda_1^o), dim(\lambda_2^o))-2 \land (\lambda_1 \cap \lambda_2 \neq \lambda_1) \land (\lambda_1 \cap \lambda_2 \neq \lambda_2) $$

Copy link
Contributor Author

@thomcom thomcom Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two polygons that overlap I think share an interior dimension, that is $$dim(\lambda_1^o \cap \lambda_2^o) = 2$$

In order for two polygons to cross they have to share an overlap that is just a line, and it can't only be in the exterior. This is essentially impossible for two areas, so we may not see cases of polygon crossing.

cross looks like it applies primarily to linestrings, as the dimensionality of the interior overlap will always be 0 for linestring.cross(linestring) and 1 for linestring.cross(polygon) or polygon.cross(linestring).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking again at SSFTR I see that cross is applied when $max(dim(\lambda_1^o), dim(\lambda_2^o)) -1 = \dim(\lambda_1^o \cap \lambda_2^o)$ and overlap is applied when the dimension of the interior intersection equals the dimension of the interior.

"""
return CoversBinpred(self, other, align)()

def intersects(self, other, align=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at all these predicates on GeoSeries, I realized we really should just start from a copy of the documentation from GeoPandas GeoSeries here. e.g. for intersects:

Returns a Series of dtype('bool') with value True for each aligned geometry that intersects other.

An object is said to intersect other if its boundary and interior intersects in any way with those of the other.

The operation works on a 1-to-1 row-wise manner:
../../../_images/binary_op-01.svg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty much using GeoPandas as my source, but trying not to just copy everything outright. What do you suggest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make sure every operation has consistent docs in terms of what is documented, what is defined, and what terminology and mathematics are used in the definitions.

Comment on lines +288 to +295
if contains_only_points(lhs):
result = lhs.points.xy.equals(rhs.points.xy)
elif contains_only_linestrings(lhs):
result = lhs.lines.xy.equals(rhs.lines.xy)
elif contains_only_polygons(lhs):
result = lhs.polygons.xy.equals(rhs.polygons.xy)
elif contains_only_multipoints(lhs):
result = lhs.multipoints.xy.equals(rhs.multipoints.xy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to just get the geometry as the single type that the geoseries contains, rather than doing this multi-step if/else every time?

Suggested change
if contains_only_points(lhs):
result = lhs.points.xy.equals(rhs.points.xy)
elif contains_only_linestrings(lhs):
result = lhs.lines.xy.equals(rhs.lines.xy)
elif contains_only_polygons(lhs):
result = lhs.polygons.xy.equals(rhs.polygons.xy)
elif contains_only_multipoints(lhs):
result = lhs.multipoints.xy.equals(rhs.multipoints.xy)
if contains_single_geometry_type(lhs):
result = lhs.geometry.as_type(lhs.type).xy.equals(rhs.geometry.as_type(lhs.type).xy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't implemented that yet. It is basically just remapping the input_types buffer of the GeoColumn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it would be very valuable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will come in handy for fixing the bug in equals I identified that caused me to close this PR: I was using .equals instead of .geom_equals. 😮‍💨

Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments.

Comment on lines 38 to 45
Returns
-------
GeoSeries
The left-hand-side of the internal binary operation, may be
reordered.
GeoSeries
The right-hand-side of the internal binary operation, may be
reordered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation for the aligned index

Comment on lines +308 to +311
if has_same_geometry(self.lhs, self.rhs) and contains_only_points(
self.lhs
):
return cudf.Series([False] * len(self.lhs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can avoid computing _op and early return this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm good idea I'll consider that in the abstraction design. My first impulse is to add a bool to the BinaryPredicate class called done. _op and postprocess will only call if not done.

My second impulse is to not perform this calculation in postprocess but just do it in _op instead.

I will probably do another pass of the inheritance design in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also considering duck typing _op and replacing it with the correct predicate function if preprocess deigns. Alternatively I could allow specification of an alternate _op that would be used if defined.

Comment on lines +108 to +117
if contains_only_points(lhs) and contains_only_points(rhs):
return True
elif contains_only_multipoints(lhs) and contains_only_multipoints(rhs):
return True
elif contains_only_linestrings(lhs) and contains_only_linestrings(rhs):
return True
elif contains_only_polygons(lhs) and contains_only_polygons(rhs):
return True
else:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably depend on pandas ExtensionDtype and make a GeometryDtype and assign to dtype. We can assign more meta data to GeometryDtype, for example, the underlying mixed geometry type. We can model GeometryDtype to UnionType.

@thomcom
Copy link
Contributor Author

thomcom commented Dec 14, 2022

I found a number of issues with my equals implementation and I'm closing this for now.

@thomcom thomcom closed this Dec 14, 2022

Notes
-----
Should only be called once.
Copy link
Contributor Author

@thomcom thomcom Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Should only be called once.
Should only be called once.
Inputs `lhs` and `rhs` may be index aligned or may not be, depending
on the argument to `BinaryPredicate.__init__(...align)`.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Related to Python code
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA]: Binary predicates that depend only on equals [FEA]: Implement binary predicates that depend on equals
3 participants