Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement all operations that require only equality. #839

Closed
Closed
Show file tree
Hide file tree
Changes from 27 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
67a2a6a
Start refactoring binops to handle various ops and inputs.
thomcom Dec 1, 2022
3038253
Abstraction working with binops that depend on pip.
thomcom Dec 1, 2022
d7a842b
Clean it up in GeoSeries.
thomcom Dec 1, 2022
3301b95
Correct abstraction for pip_only tests.
thomcom Dec 1, 2022
f6688d7
Support the seven extra pip tests.
thomcom Dec 2, 2022
77b6858
Support all supportable pip calls.
thomcom Dec 2, 2022
c69d2e6
Remove unused abstraction.
thomcom Dec 2, 2022
de923c5
Clean up binops.intersects docs.
thomcom Dec 2, 2022
67b0df6
Pass all equals tests except one.
thomcom Dec 2, 2022
b4ea5e5
Merge
thomcom Dec 5, 2022
305f937
Finish abstracting pip based ops.
thomcom Dec 5, 2022
707217b
Pass all multi tests.
thomcom Dec 5, 2022
376833a
Add equals only binops.
thomcom Dec 7, 2022
f0d0c46
Rename binops to binpreds.
thomcom Dec 8, 2022
441a585
Cleaning up based on review.
thomcom Dec 8, 2022
e02a669
Cleaned up unneeded future material.
thomcom Dec 8, 2022
6444a4c
Resolve has_same_dimension naming issue.
thomcom Dec 8, 2022
29870d4
Fix postprocess docs.
thomcom Dec 9, 2022
c2cc722
Factor out op string argument and move logic to the __call__ function…
thomcom Dec 9, 2022
18631e3
Fix bug with complete overlap
thomcom Dec 9, 2022
a1ab8a7
Merge new pip only binpreds refactor into equals.
thomcom Dec 9, 2022
023247e
Pass all equals only tests with new factoring.
thomcom Dec 9, 2022
3ddf7c4
Fix contains issue.
thomcom Dec 9, 2022
82f40ec
Update python/cuspatial/cuspatial/core/binpreds/binpreds.py
thomcom Dec 12, 2022
b71f34f
Handle review comments from @isVoid.
thomcom Dec 12, 2022
5338a73
Remove more equals code.
thomcom Dec 12, 2022
af5ae69
Merge branch 'feature/all_pip_operations' into feature/all_equals_ope…
thomcom Dec 12, 2022
34b3fbf
Update python/cuspatial/cuspatial/core/geoseries.py
thomcom Dec 13, 2022
7be5a73
Handle review comments from @harrism
thomcom Dec 13, 2022
230b0b7
Merge branch 'feature/all_equals_operations' of github.com:thomcom/cu…
thomcom Dec 13, 2022
e2a3616
Merge branch 'branch-23.02' into feature/all_equals_operations
thomcom Dec 13, 2022
04cd58d
Update python/cuspatial/cuspatial/core/geoseries.py
thomcom Dec 13, 2022
f749ca0
Replace operation with predicate
thomcom Dec 14, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
323 changes: 323 additions & 0 deletions python/cuspatial/cuspatial/core/binpreds/binpreds.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,323 @@
# Copyright (c) 2022, NVIDIA CORPORATION.

from abc import ABC, abstractmethod

import cudf

from cuspatial.core._column.geocolumn import GeoColumn
from cuspatial.core.binpreds.contains import contains_properly
from cuspatial.utils.column_utils import (
contains_only_linestrings,
contains_only_multipoints,
contains_only_polygons,
has_same_geometry,
)


class BinaryPredicate(ABC):
@abstractmethod
def preprocess(self, lhs, rhs):
"""Preprocess the input data for the binary operation. This method
should be implemented by subclasses. Preprocess and postprocess are
used to implement the discrete math rules of the binary operations.

Notes
-----
Should only be called once.
Copy link
Contributor Author

@thomcom thomcom Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Should only be called once.
Should only be called once.
Inputs `lhs` and `rhs` may be index aligned or may not be, depending
on the argument to `BinaryPredicate.__init__(...align)`.


Parameters
----------
op : str
The binary operation to perform.
lhs : GeoSeries
The left-hand-side of the GeoSeries-level binary operation.
rhs : GeoSeries
The right-hand-side of the GeoSeries-level binary operation.

Returns
-------
GeoSeries
The left-hand-side of the internal binary operation, may be
reordered.
GeoSeries
The right-hand-side of the internal binary operation, may be
reordered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation for the aligned index

"""
return (lhs, rhs, cudf.RangeIndex(len(rhs)))

@abstractmethod
def postprocess(self, point_indices, point_result):
"""Postprocess the output data for the binary operation. This method
should be implemented by subclasses.

Postprocess converts the raw results of the binary operation into
the final result. This is where the discrete math rules are applied.

Parameters
----------
op : str
The binary operation to post process. Determines for example the
set operation to use for computing the result.
point_indices : cudf.Series
The indices of the points in the original GeoSeries.
point_result : cudf.Series
The raw result of the binary operation.

Returns
-------
cudf.Series
The output of the post processing, True/False results for
the specified binary op.
"""
pass

def __init__(self, lhs, rhs, align=True):
"""Compute the binary operation `op` on `lhs` and `rhs`.

There are ten binary operations supported by cuspatial:
- `.equals`
- `.disjoint`
- `.touches`
- `.contains`
- `.contains_properly`
- `.covers`
- `.intersects`
- `.within`
- `.crosses`
- `.overlaps`

There are thirty-six ordering combinations of `lhs` and `rhs`, the
unordered pairs of each `point`, `multipoint`, `linestring`,
`multilinestring`, `polygon`, and `multipolygon`. The ordering of
`lhs` and `rhs` is important because the result of the binary
operation is not symmetric. For example, `A.contains(B)` is not
the same as `B.contains(A)`.

Parameters
----------
op : str
The binary operation to perform.
lhs : GeoSeries
The left-hand-side of the binary operation.
rhs : GeoSeries
The right-hand-side of the binary operation.
align : bool
If True, align the indices of `lhs` and `rhs` before performing
the binary operation. If False, `lhs` and `rhs` must have the
same index.

Returns
-------
GeoSeries
A GeoSeries containing the result of the binary operation.
"""
(self.lhs, self.rhs) = lhs.align(rhs) if align else (lhs, rhs)
self.align = align

def __call__(self) -> cudf.Series:
"""Return the result of the binary operation."""
# Type disambiguation
# Type disambiguation has a large effect on the decisions of the
# algorithm.
(lhs, rhs, indices) = self.preprocess(self.lhs, self.rhs)

# Binpred call
point_result = self._op(lhs, rhs)

# Postprocess: Apply discrete math rules to identify relationships.
return self.postprocess(indices, point_result)


class ContainsProperlyBinpred(BinaryPredicate):
def preprocess(self, lhs, rhs):
"""Preprocess the input GeoSeries to ensure that they are of the
correct type for the operation."""
# RHS conditioning:
point_indices = None
# point in polygon
if contains_only_linestrings(rhs):
# condition for linestrings
geom = rhs.lines
elif contains_only_polygons(rhs) is True:
# polygon in polygon
geom = rhs.polygons
elif contains_only_multipoints(rhs) is True:
# mpoint in polygon
geom = rhs.multipoints
else:
# no conditioning is required
geom = rhs.points
xy_points = geom.xy

# Arrange into shape for calling point-in-polygon, intersection, or
# equals
point_indices = geom.point_indices()
from cuspatial.core.geoseries import GeoSeries

final_rhs = GeoSeries(
GeoColumn._from_points_xy(xy_points._column)
).points
return (lhs, final_rhs, point_indices)

def _op(self, lhs, points):
"""Compute the contains_properly relationship between two GeoSeries.
A feature A contains another feature B if no points of B lie in the
exterior of A, and at least one point of the interior of B lies in the
interior of A. This is the inverse of `within`."""
if not contains_only_polygons(lhs):
raise TypeError(
"`.contains` can only be called with polygon series."
)

# call pip on the three subtypes on the right:
point_result = contains_properly(
points.x,
points.y,
lhs.polygons.part_offset[:-1],
lhs.polygons.ring_offset[:-1],
lhs.polygons.x,
lhs.polygons.y,
)
return point_result

def postprocess(self, point_indices, point_result):
"""Postprocess the output GeoSeries to ensure that they are of the
correct type for the operation."""
result = cudf.DataFrame({"idx": point_indices, "pip": point_result})
df_result = result
# Discrete math recombination
if (
contains_only_linestrings(self.rhs)
or contains_only_polygons(self.rhs)
or contains_only_multipoints(self.rhs)
):
# process for completed linestrings, polygons, and multipoints.
# Not necessary for points.
df_result = (
result.groupby("idx").sum().sort_index()
== result.groupby("idx").count().sort_index()
)
point_result = cudf.Series(
df_result["pip"], index=cudf.RangeIndex(0, len(df_result))
)
point_result.name = None
return point_result


class OverlapsBinpred(ContainsProperlyBinpred):
def postprocess(self, point_indices, point_result):
# Same as contains_properly, but we need to check that the
# dimensions are the same.
# TODO: Maybe change this to intersection
if not has_same_geometry(self.lhs, self.rhs):
return cudf.Series([False] * len(self.lhs))
result = cudf.DataFrame({"idx": point_indices, "pip": point_result})
df_result = result
# Discrete math recombination
if contains_only_linestrings(self.rhs):
df_result = (
result.groupby("idx").sum().sort_index()
== result.groupby("idx").count().sort_index()
)
elif contains_only_polygons(self.rhs) or contains_only_multipoints(
self.rhs
):
partial_result = result.groupby("idx").sum()
df_result = (partial_result > 0) & (
partial_result < len(point_result)
)
else:
df_result = result.groupby("idx").sum() > 1
point_result = cudf.Series(
df_result["pip"], index=cudf.RangeIndex(0, len(df_result))
)
point_result.name = None
return point_result


class IntersectsBinpred(ContainsProperlyBinpred):
def preprocess(self, lhs, rhs):
if contains_only_polygons(rhs):
(lhs, rhs) = (rhs, lhs)
return super().preprocess(lhs, rhs)


class WithinBinpred(ContainsProperlyBinpred):
def preprocess(self, lhs, rhs):
if contains_only_polygons(rhs):
(lhs, rhs) = (rhs, lhs)
return super().preprocess(lhs, rhs)

def postprocess(self, point_indices, point_result):
"""Postprocess the output GeoSeries to ensure that they are of the
correct type for the operation."""
result = cudf.DataFrame({"idx": point_indices, "pip": point_result})
df_result = result
# Discrete math recombination
if (
contains_only_linestrings(self.rhs)
or contains_only_polygons(self.rhs)
or contains_only_multipoints(self.rhs)
):
# process for completed linestrings, polygons, and multipoints.
# Not necessary for points.
df_result = (
result.groupby("idx").sum().sort_index()
== result.groupby("idx").count().sort_index()
)
point_result = cudf.Series(
df_result["pip"], index=cudf.RangeIndex(0, len(df_result))
)
point_result.name = None
return point_result


class EqualsBinpred(BinaryPredicate):
def preprocess(self, lhs, rhs):
return (lhs, rhs, None)

def postprocess(self, point_indices, point_result):
if isinstance(point_result, bool):
return point_result
elif len(point_result) == 1:
return point_result[0]

def _op(self, lhs, rhs):
"""Compute the equals relationship between two GeoSeries."""
result = False
if contains_only_points(lhs):
result = lhs.points.xy.equals(rhs.points.xy)
elif contains_only_linestrings(lhs):
result = lhs.lines.xy.equals(rhs.lines.xy)
elif contains_only_polygons(lhs):
result = lhs.polygons.xy.equals(rhs.polygons.xy)
elif contains_only_multipoints(lhs):
result = lhs.multipoints.xy.equals(rhs.multipoints.xy)
Comment on lines +295 to +302
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to just get the geometry as the single type that the geoseries contains, rather than doing this multi-step if/else every time?

Suggested change
if contains_only_points(lhs):
result = lhs.points.xy.equals(rhs.points.xy)
elif contains_only_linestrings(lhs):
result = lhs.lines.xy.equals(rhs.lines.xy)
elif contains_only_polygons(lhs):
result = lhs.polygons.xy.equals(rhs.polygons.xy)
elif contains_only_multipoints(lhs):
result = lhs.multipoints.xy.equals(rhs.multipoints.xy)
if contains_single_geometry_type(lhs):
result = lhs.geometry.as_type(lhs.type).xy.equals(rhs.geometry.as_type(lhs.type).xy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't implemented that yet. It is basically just remapping the input_types buffer of the GeoColumn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it would be very valuable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will come in handy for fixing the bug in equals I identified that caused me to close this PR: I was using .equals instead of .geom_equals. 😮‍💨

return result


class CrossesBinpred(EqualsBinpred):
def postprocess(self, point_indices, point_result):
if has_same_geometry(self.lhs, self.rhs) and contains_only_points(
self.lhs
):
return cudf.Series([False] * len(self.lhs))
Comment on lines +308 to +311
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can avoid computing _op and early return this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm good idea I'll consider that in the abstraction design. My first impulse is to add a bool to the BinaryPredicate class called done. _op and postprocess will only call if not done.

My second impulse is to not perform this calculation in postprocess but just do it in _op instead.

I will probably do another pass of the inheritance design in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also considering duck typing _op and replacing it with the correct predicate function if preprocess deigns. Alternatively I could allow specification of an alternate _op that would be used if defined.

result = cudf.DataFrame({"idx": point_indices, "pip": point_result})
df_result = result
# Discrete math recombination
if (
contains_only_linestrings(self.rhs)
or contains_only_polygons(self.rhs)
or contains_only_multipoints(self.rhs)
):
df_result = ~result
point_result = cudf.Series(
df_result["pip"], index=cudf.RangeIndex(0, len(df_result))
)
point_result.name = None
return point_result


class CoversBinpred(EqualsBinpred):
def postprocess(self, point_indices, point_result):
return cudf.Series(point_result, index=point_indices)
Loading