Add C++ doc links to library_design.md and minor documentation fixes #700

Merged: 3 commits, Sep 29, 2022
Changes from 2 commits
1 change: 0 additions & 1 deletion docs/source/api_docs/trajectory.rst
@@ -7,5 +7,4 @@ Functions for identifying and grouping trajectories from point data.

.. autofunction:: cuspatial.derive_trajectories
.. autofunction:: cuspatial.trajectory_distances_and_speeds
.. autofunction:: cuspatial.directed_hausdorff_distance
Review comment (PR author): This is a duplicate entry also present in spatial.rst

.. autofunction:: cuspatial.trajectory_bounding_boxes
30 changes: 18 additions & 12 deletions docs/source/developer_guide/library_design.md
@@ -3,7 +3,8 @@
cuSpatial has two main components: the cuSpatial Python package and the `libcuspatial` C++ library,
referred to as `cuspatial` and `libcuspatial` respectively in this documentation. This page
discusses the design of `cuspatial`. For information on `libcuspatial`, see the [libcuspatial
developer guide](TODO link) and [C++ API reference](TODO link).
developer guide](https://github.com/rapidsai/cuspatial/blob/branch-22.10/cpp/doc/DEVELOPER_GUIDE.md)
and [C++ API reference](https://docs.rapids.ai/api/libcuspatial/stable/).

## Overview

@@ -12,21 +13,23 @@ At a high level, `cuspatial` has three parts:
- A set of computation APIs
- A Cython API layer

## GPU Accelerated `GeoDataFrame` and `GeoSeries`
## Core Data Structures

```{note}
Note: the core data structure of cuSpatial shares the same name as that of `geopandas`, so we refer
to geopandas' dataframe object as `geopandas.GeoDataFrame` and to cuspatial's dataframe object as
`GeoDataFrame`.
```

### Introduction to GeoArrow Format

----------------------------------------------------------------------------------------------------
Under the hood, cuspatial can perform parallel computation on geometry
data thanks to its
[structure of arrays](https://en.wikipedia.org/wiki/Parallel_array) (SoA)
format. Specifically, cuspatial adopts geoarrow format. Geoarrow is derived
from the Apache Arrow list type, and it adopts a
[`Variable-size List Layout`](https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout),
with the inner-most layer storing the points in a `Fixed-size list layout` array
with `size==2`.
format. Specifically, cuspatial adopts geoarrow format. Geoarrow is an extension
to arrow format. It adopts arrow's
[`Variable-size List Layout`](https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout)
to provide support to geometry arrays.

By definition, each increase in geometry complexity (dimension, or multi-
geometry) requires an extra level of indirection. In cuSpatial, we use the following names for the levels of indirection from
@@ -41,23 +44,26 @@ of geometry types to be present in the same column by adopting the
Read the [geoarrow format specification](https://github.com/geopandas/geo-arrow-spec/blob/main/format.md)
for more detail.
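
As a rough illustration of the variable-size list layout described above, the sketch below uses plain Python lists in place of Arrow buffers. It is not cuSpatial's API: the names `xy`, `geometry_offsets`, and `linestring` are hypothetical, and the interleaved x/y storage is an assumption made for the example.

```python
# Illustrative only: plain Python lists stand in for Arrow buffers.
# Two linestrings are stored back to back:
#   linestring 0: (0, 0), (1, 1), (2, 2)
#   linestring 1: (3, 3), (4, 4)

# Assumption: coordinates are interleaved as x0, y0, x1, y1, ...
xy = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]

# One level of indirection: linestring i owns the points in the
# half-open range [geometry_offsets[i], geometry_offsets[i + 1]).
geometry_offsets = [0, 3, 5]


def linestring(i):
    """Return the (x, y) points of the i-th linestring."""
    start, end = geometry_offsets[i], geometry_offsets[i + 1]
    return [(xy[2 * p], xy[2 * p + 1]) for p in range(start, end)]


first = linestring(0)
second = linestring(1)
```

Each additional level of geometry complexity (for example, multilinestrings) would add another offsets list that indexes into the one below it.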

### GeoColumn

cuSpatial implements a specialization of Arrow dense union via `GeoColumn` and
`GeoMeta`. A `GeoColumn` is a composition of child columns and a
`GeoMeta` object. The `GeoMeta` owns two arrays that are similar to the
types buffer and offsets buffer from Arrow dense union.
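
The Arrow dense union idea that this paragraph refers to can be sketched in plain Python as follows. This shows the general technique only: the type ids, the `children` mapping, and the `element` helper are hypothetical and do not reflect the actual attributes of `GeoColumn` or `GeoMeta`.

```python
# Illustrative only: a dense union keeps one child array per type, plus
# two parallel buffers with one entry per row of the union column.

POINT, MULTIPOINT = 0, 1  # hypothetical type ids

# Child arrays, one per concrete geometry type.
children = {
    POINT: [(0.0, 0.0), (5.0, 5.0)],
    MULTIPOINT: [[(1.0, 1.0), (2.0, 2.0)]],
}

# types buffer: which child array holds row i.
type_ids = [POINT, MULTIPOINT, POINT]
# offsets buffer: the position of row i inside that child array.
union_offsets = [0, 0, 1]


def element(i):
    """Look up row i of the union column through both buffers."""
    return children[type_ids[i]][union_offsets[i]]


rows = [element(i) for i in range(len(type_ids))]
```

Because each row stores only a type id and an offset, rows of different geometry types can share one column while each child array stays densely packed.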

```{note}
Currently, `GeoColumn` only implements four concrete array types: `points`,
`multipoints`, multilinestrings (called `lines`) and multipolygons (called
`polygons`). Linestrings and multilinestrings are stored uniformly as
multilinestrings in the `multilinestrings` array. Polygons and multipolygons are
Currently, `GeoColumn` implements four concrete array types: `points`,
`multipoints`, multilinestrings and multipolygons. Linestrings and
multilinestrings are stored uniformly as multilinestrings in the
`multilinestrings` array. Polygons and multipolygons are
stored uniformly as multipolygons in the `multipolygons` array.

Points and multipoints are stored separately in different arrays, because
storing points in a multipoints array requires 50% more storage overhead.
While this may also be true for linestrings and polygons, many uses of
cuSpatial involve more complex linestrings and polygons, where the
storage overhead of multigeometry indirection is lower compared to points.
```

`GeoSeries` and `GeoDataFrame` inherit from `cudf.Series` and
`cudf.DataFrame` respectively. `Series` and `DataFrame` are both generic