Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Bad results returned from pairwise_linestring_intersection on large self-equal LineStrings. #1200

Closed
thomcom opened this issue Jun 23, 2023 · 2 comments · Fixed by #1209
Assignees
Labels
bug Something isn't working
Milestone

Comments

@thomcom
Copy link
Contributor

thomcom commented Jun 23, 2023

Version

23.08

On which installation method(s) does this occur?

Rapids-Compose

Describe the issue

Hiya! I found a new issue with pairwise_linestring_intersection today working on the Watershed Demo Notebook for Release 23.06.

When pairwise_linestring_intersection is called using the same linestring for the left and right hand sides, results are unpredictable and eventually fail as the linestring size increases. The API fails because the second element returned by pairwise_linestring_intersection eventually creates an invalid GeoSeries object with bad offsets.

For the reproducer, I created a LineString comprised of diagonal segments of length $sqrt(2)$ originating at the origin and rising by 1 in x and y. The LineString then connects the highest point along the diagonal with a point directly beneath it $(x, 0)$, and that returns to the origin making a closed LineString.

With this specific example, valid (though strange) results are returned up to a LineString of length 129. At 130 the API returns a broken GeoSeries for geoms. The results are strange because the first n of the segments get grouped into a single segment semi-randomly, say, ((0, 0), (32, 32)), and then the subsequent segments all are returned separately: ((32, 32), (33, 33)), ((33, 33), (34, 34)), etc.

Is this a valid error? There are two constraints that maybe make this invalid:

  1. The LineStrings in the pairwise comparison are identical
  2. The LineStrings are closed with the same start and end vertex.

I can see from the result returned here:

    geoms, look_back_ids = c_pairwise_linestring_intersection(
        linestrings1.lines.column(), linestrings2.lines.column()
    )

    (
        geometry_collection_offset,
        types_buffer,
        offset_buffer,
        points,
        segments,
    ) = geoms

That types_buffer when the length of the LineString is 130 all come back as points, that is, type = 0. When the length of the LineString is 129, they are returned correctly as type = 2. look_back_ids are also corrupt at length 130 and apparently valid at length 129 in this case.

Minimum reproducible example

    import cuspatial
    import cupy as cp
    from cuspatial.core.binops.intersection import pairwise_linestring_intersection

    size = 130.0
    xy = cp.concatenate([
        cp.arange(size - 2).repeat(2),
        cp.array([size - 3, 0, 0, 0])
    ])
    s1 = cuspatial.GeoSeries.from_linestrings_xy(
        xy,
        [0, len(xy)/2],
        [0, 1]
    )
    s2 = cuspatial.GeoSeries.from_linestrings_xy(
        xy,
        [0, len(xy)/2],
        [0, 1]
    )

    offset, geoms, ids = pairwise_linestring_intersection(s1, s2)
    print(offset)
    print(geoms._column._meta.input_types)
    print(geoms._column.lines._column)
    print(geoms.to_geopandas())

Relevant log output

>>>>>>>>>>>>>>>>>>>>>> captured stdout >>>>>>>>>>>>>>>>>>>>>>
<cudf.core.column.numerical.NumericalColumn object at 0x7f90b195e4c0>
[
  0,
  354
]
dtype: int32
0     0
1     0
2     0
3     0
4     0
     ..
93    0
94    0
95    0
96    0
97    0
Length: 98, dtype: int8
<cudf.core.column.lists.ListColumn object at 0x7f90b195e5c0>
[
  [
    [
      [
        0,
        0
      ],
      [
        32,
        32
      ]
    ]
  ],
  [
    [
      [
        32,
        32
      ],
      [
        33,
        33
      ]
    ]
  ],
  ...
  [
    [
      [
        126,
        126
      ],
      [
        127,
        127
      ]
    ]
  ],
  [
    [
      [
        127,
        0
      ],
      [
        127,
        127
      ]
    ]
  ]
]
dtype: list
>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>

    def test_same_linestring():
        size = 130.0
        xy = cp.concatenate([
            cp.arange(size - 2).repeat(2),
            cp.array([size - 3, 0, 0, 0])
        ])
        s1 = cuspatial.GeoSeries.from_linestrings_xy(
            xy,
            [0, len(xy)/2],
            [0, 1]
        )
        s2 = cuspatial.GeoSeries.from_linestrings_xy(
            xy,
            [0, len(xy)/2],
            [0, 1]
        )
    
        offset, geoms, ids = pairwise_linestring_intersection(s1, s2)
        print(offset)
        print(geoms._column._meta.input_types)
        print(geoms._column.lines._column)
>       print(geoms.to_geopandas())

tests/basicpreds/test_intersections.py:261: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
core/geoseries.py:474: in to_geopandas
    final_union_slice.to_shapely(),
core/geoseries.py:540: in to_shapely
    shapely_serialization_fn(union[result_index].as_py())
pyarrow/array.pxi:1295: in pyarrow.lib.Array.__getitem__
    ???
pyarrow/array.pxi:1298: in pyarrow.lib.Array.getitem
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowIndexError: index with value of 0 is out-of-bounds for array of length 0

pyarrow/error.pxi:127: ArrowIndexError
>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>
> /home/coder/cuspatial/python/cuspatial/cuspatial/pyarrow/error.pxi(127)pyarrow.lib.check_status()

Environment details

No response

Other/Misc.

No response

@thomcom thomcom added the bug Something isn't working label Jun 23, 2023
@thomcom thomcom added this to the DE-9IM milestone Jun 23, 2023
@thomcom thomcom changed the title [BUG]: Bad results returned from pairwise_linestring_intersection on large LineStrings. [BUG]: Bad results returned from pairwise_linestring_intersection on large self-equal LineStrings. Jun 23, 2023
@isVoid
Copy link
Contributor

isVoid commented Jun 23, 2023

The LineStrings in the pairwise comparison are identical
The LineStrings are closed with the same start and end vertex.

Definitely sounds like a valid input. This pattern that the bug only emerge with large input makes me think if this is the same error as last time - misusing the thrust API where input and output must not overlap.

@isVoid
Copy link
Contributor

isVoid commented Jun 24, 2023

Confirmed reproducible in C++.

@harrism harrism moved this from Todo to In Progress in cuSpatial Jul 17, 2023
rapids-bot bot pushed a commit that referenced this issue Jul 25, 2023
This PR is part-1 fix to \#1200. In `find_and_combine_segments` the N^2 algorithm depends on the fact that the API needs to be presorted with certain criteria. This Pr adds such sorting.

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - H. Thomson Comer (https://github.com/thomcom)

URL: #1207
@isVoid isVoid moved this from In Progress to Review in cuSpatial Jul 28, 2023
rapids-bot bot pushed a commit that referenced this issue Jul 28, 2023
EDIT: the thrust bug is a known issue. Tracked by NVIDIA/cccl#153

This PR fixes an issue in `remove_if`. Strangely, when calling `reduce_by_key` with iterator to integer,
and when using `thrust::plus<index_t>()`, even if by definition the argument type of the plus operator is strongly typed
with a higher bit  width integer type, and I expected that the flags (`uint8_t`) were cast to the higher
bit depth before addition, the overflow still happens. I have filed a thread in the thrust channel to discuss if this
is a bug in thrust.

Meanwhile, a quick WAR is to explicitly use a transform iterator to cast the uchar in to `index_t` before adding.
This should give the correct result.

Fixes #1200 
Depend on #1207

Authors:
  - Michael Wang (https://github.com/isVoid)
  - H. Thomson Comer (https://github.com/thomcom)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - H. Thomson Comer (https://github.com/thomcom)

URL: #1209
@github-project-automation github-project-automation bot moved this from Review to Done in cuSpatial Jul 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants