Skip to content

Commit

Permalink
Move experimental headers into main include/cuspatial directory (#…
Browse files Browse the repository at this point in the history
…1081)

Fixes #1083.

* Consolidates all headers into `include/cuspatial/`. This works without conflicts because header-only API headers have the `.cuh` extension while column-based API headers have the `.cpp` extension.
* Consolidates and reorganizes tests. Header-only and column-based tests are stored together.
* Updates documentation to remove references to `experimental` and also renames the `REFACTORING_GUIDE.md` to `HEADER_ONLY_API_GUIDE.md` and updates it.

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - H. Thomson Comer (https://github.com/thomcom)
  - Michael Wang (https://github.com/isVoid)

URL: #1081
  • Loading branch information
harrism authored Apr 26, 2023
1 parent 2d1f9f8 commit 5f91e68
Show file tree
Hide file tree
Showing 173 changed files with 520 additions and 489 deletions.
8 changes: 4 additions & 4 deletions cpp/benchmarks/pairwise_linestring_distance.cu
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,10 @@
#include <nvbench/nvbench.cuh>

#include <cuspatial/detail/iterator.hpp>
#include <cuspatial/experimental/iterator_factory.cuh>
#include <cuspatial/experimental/linestring_distance.cuh>
#include <cuspatial/experimental/ranges/multilinestring_range.cuh>
#include <cuspatial/vec_2d.hpp>
#include <cuspatial/geometry/vec_2d.hpp>
#include <cuspatial/iterator_factory.cuh>
#include <cuspatial/linestring_distance.cuh>
#include <cuspatial/range/multilinestring_range.cuh>

#include <rmm/device_vector.hpp>
#include <rmm/exec_policy.hpp>
Expand Down
4 changes: 2 additions & 2 deletions cpp/benchmarks/point_in_polygon.cu
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,8 @@
#include <benchmarks/fixture/rmm_pool_raii.hpp>
#include <nvbench/nvbench.cuh>

#include <cuspatial/experimental/point_in_polygon.cuh>
#include <cuspatial/vec_2d.hpp>
#include <cuspatial/geometry/vec_2d.hpp>
#include <cuspatial/point_in_polygon.cuh>

#include <rmm/device_vector.hpp>
#include <rmm/exec_policy.hpp>
Expand Down
4 changes: 2 additions & 2 deletions cpp/benchmarks/points_in_range.cu
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,8 @@

#include <cuspatial/detail/iterator.hpp>
#include <cuspatial/error.hpp>
#include <cuspatial/experimental/points_in_range.cuh>
#include <cuspatial/vec_2d.hpp>
#include <cuspatial/geometry/vec_2d.hpp>
#include <cuspatial/points_in_range.cuh>

#include <rmm/device_uvector.hpp>
#include <rmm/exec_policy.hpp>
Expand Down
4 changes: 2 additions & 2 deletions cpp/benchmarks/quadtree_on_points.cu
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,8 @@
* limitations under the License.
*/

#include <cuspatial/experimental/point_quadtree.cuh>
#include <cuspatial/vec_2d.hpp>
#include <cuspatial/geometry/vec_2d.hpp>
#include <cuspatial/point_quadtree.cuh>

#include <benchmarks/fixture/rmm_pool_raii.hpp>
#include <nvbench/nvbench.cuh>
Expand Down
4 changes: 2 additions & 2 deletions cpp/doxygen/Doxyfile
Original file line number Diff line number Diff line change
Expand Up @@ -819,7 +819,7 @@ INPUT = main_page.md \
developer_guide/DOCUMENTATION.md \
developer_guide/DEVELOPER_GUIDE.md \
developer_guide/TESTING.md \
developer_guide/REFACTORING_GUIDE.md \
developer_guide/HEADER_ONLY_API_GUIDE.md \
../include

# This tag can be used to specify the character encoding of the source files
Expand Down Expand Up @@ -1155,7 +1155,7 @@ HTML_FOOTER =
# obsolete.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_STYLESHEET =
HTML_STYLESHEET =

# The HTML_EXTRA_STYLESHEET tag can be used to specify additional user-defined
# cascading style sheets that are included after the standard style sheets
Expand Down
25 changes: 12 additions & 13 deletions cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@ refer to these additional files for further documentation of libcuspatial best p
* [Documentation Guide](DOCUMENTATION.md) for guidelines on documenting libcuspatial code.
* [Testing Guide](TESTING.md) for guidelines on writing unit tests.
* [Benchmarking Guide](BENCHMARKING.md) for guidelines on writing unit benchmarks.
* [Refactoring Guide](REFACTORING_GUIDE.md) for guidelines on refactoring legacy column-based APIs
into header-only APIs.
* [Header-only API Guide](HEADER_ONLY_API_GUIDE.md) for guidelines on the header-only API and
column-based API.

# Overview

Expand Down Expand Up @@ -38,27 +38,26 @@ TODO: add terms

External/public libcuspatial APIs are grouped based on functionality into an appropriately titled
header file in `cuspatial/cpp/include/cuspatial/`. For example,
`cuspatial/cpp/include/cuspatial/coordinate_transform.hpp` contains the declarations of public API
functions related to transforming coordinates. Note the `.hpp` file extension used to indicate a
C++ header file that can be included from a `.cpp` source file.
`cuspatial/cpp/include/cuspatial/projection.hpp` contains the declarations of public API
functions related to coordinate projection transforms. Note the `.hpp` file extension used to
indicate a C++ header file that can be included from a `.cpp` source file.

Header files should use the `#pragma once` include guard.

The naming of public column-based cuSpatial API headers should be consistent with the name of the
folder that contains the source files that implement the API. For example, the implementation of the
APIs found in `cuspatial/cpp/include/cuspatial/trajectory.hpp` are located in
`cuspatial/src/trajectory`. This rule obviously does not apply to the header-only API, since the
headers are the source files.
The folder that contains the source files that implement an API should be named consistently with
the name of the of the header for the API. For example, the implementation of the APIs found in
`cuspatial/cpp/include/cuspatial/trajectory.hpp` are located in `cuspatial/src/trajectory`. This
rule obviously does not apply to the header-only API, since the headers are the source files.

Likewise, unit tests reside in folders corresponding to the names of the API headers, e.g.
trajectory.hpp tests are in `cuspatial/tests/trajectory/`.

Internal API headers containing `detail` namespace definitions that are used across translation
units inside libcuspatial should be placed in `include/cuspatial/detail`.

Note that (currently) header-only API files are in `include/cuspatial/experimental`, and their tests
are in `tests/experimental`. When the header-only refactoring is complete these should be renamed or
split into a separate library.
Header-only API files and column-based API headers are stored together in `include/cuspatial`. The
former use the `.cuh` extension because they almost universally require CUDA compilation. The latter
use the `.hpp` extension because they can be compiled with a standard C++ compiler.

## File extensions

Expand Down
Original file line number Diff line number Diff line change
@@ -1,33 +1,34 @@
# cuSpatial C++ API Refactoring Guide
# cuSpatial C++ header-only API Guide

The original cuSpatial C++ API (libcuspatial) was designed to depend on RAPIDS libcudf and use
its core data types, especially `cudf::column`. For users who do not also use libcudf or other
RAPIDS APIS, depending on libcudf could be a big barrier to adoption of libcuspatial. libcudf is
its core data types, especially `cudf::column`. For users who do not also use libcudf or other
RAPIDS APIS, depending on libcudf could be a big barrier to adoption of libcuspatial. libcudf is
a very large library and building it takes a lot of time.

Therefore, we are developing a standalone libcuspatial C++ API that does not depend on libcudf. This
is a header-only template API with an iterator-based interface. This has a number of advantages
Therefore, the core of cuSpatial is now implemented in a standalone C++ API that does not depend on
libcudf. This is a header-only template API with an iterator- and range-based interface. This has a
number of advantages.

1. With a header-only API, users can include and build exactly what they use.
2. With a templated API, the API can be flexible to support a variety of basic data types, such
2. With a templated API, the API can be flexible to support a variety of basic data types, such
as float and double for positional data, and different integer sizes for indices.
3. By templating on iterator types, cuSpatial algorithms can be fused with transformations of the
input data, by using "fancy" iterators. Examples include transform iterators and counting
iterators.
4. Memory resources only need to be part of APIs that allocate temporary intermediate storage.
Output storage is allocated outside the API and an output iterator is passed as an argument.
Output storage is allocated outside the API and an output iterator is passed as an argument.

The main disadvantages of this type of API are

1. Header-only APIs can increase compilation time for code that depends on them.
2. Some users (especially our Python API) may prefer a cuDF-based API.
2. Some users (especially the cuSpatial Python API) may prefer a cuDF-based API.

The good news is that by maintaining the existing libcudf-based C++ API as a layer above the header-
only libcuspatial API, we can avoid problem 1 and problem 2 for users of the legacy API.
The good news is that maintaining the existing libcudf-based C++ API as a layer above the header-
only libcuspatial API avoids problem 1 and problem 2 for users of the column-based API.

## Example API

Following is an example iterator-based API for `cuspatial::haversine_distance`. (See below for
Following is an example iterator-based API for `cuspatial::haversine_distance`. (See below for
discussion of API documentation.)

```c++
Expand All @@ -47,7 +48,7 @@ OutputIt haversine_distance(LonLatItA a_lonlat_first,
There are a few key points to notice.
1. The API is very similar to STL algorithms such as `std::transform`.
2. All array inputs and outputs are iterator type templates.
2. All array inputs and outputs are iterator type templates.
3. Longitude/Latitude data is passed as array of structures, using the `cuspatial::vec_2d`
type (include/cuspatial/vec_2d.hpp). This is enforced using a `static_assert` in the function
body (discussed later).
Expand All @@ -60,7 +61,7 @@ There are a few key points to notice.
7. The size of the input and output ranges in the example API are equal, so the start and end of
only the A range is provided (`a_lonlat_first` and `a_lonlat_last`). This mirrors STL APIs.
8. This API returns an iterator to the element past the last element written to the output. This
is inspired by `std::transform`, even though as with `transform`, many uses of
is inspired by `std::transform`, even though as with `transform`, many uses of
`haversine_distance` will not need this returned iterator.
9. All APIs that run CUDA device code (including Thrust algorithms) or allocate memory take a CUDA
stream on which to execute the device code and allocate memory.
Expand Down Expand Up @@ -103,7 +104,7 @@ Following is the (Doxygen) documentation for the above `cuspatial::haversine_dis
* otherwise.
* @pre `b_lonlat_first` may equal `distance_first`, but the range `[b_lonlat_first, b_lonlat_last)`
* shall not overlap the range `[distance_first, distance_first + (b_lonlat_last - b_lonlat_last))
* otherwise.
* otherwise.
* @pre All iterators must have the same `Location` type, with the same underlying floating-point
* coordinate type (e.g. `cuspatial::vec_2d<float>`).
*
Expand All @@ -119,13 +120,13 @@ Key points:
2. All parameters and all template parameters are documented.
3. States the C++ standard iterator concepts that must be implemented, and that iterators must be
device-accessible.
4. Documents requirements as preconditions using `@pre`.
4. Documents requirements as preconditions using `@pre`.
5. Uses preconditions to explicitly document what input ranges are allowed to overlap.
6. Documents the units of any inputs or outputs that have them.
## cuSpatial libcudf-based C++ API (legacy API)
This is the existing API, unchanged by refactoring. Here is the existing
This is the existing API, unchanged by refactoring. Here is the existing
`cuspatial::haversine_distance`:
```c++
Expand All @@ -142,47 +143,44 @@ OutputIt haversine_distance(LonLatItA a_lonlat_first,
```

key points:
1. All input data are `cudf::column_view`. This is a type-erased container so determining the
1. All input data are `cudf::column_view`. This is a type-erased container so determining the
type of data must be done at run time.
2. All inputs are arrays of scalars. Longitude and latitude are separate.
2. All inputs are arrays of scalars. Longitude and latitude are separate.
3. The output is a returned `unique_ptr<cudf::column>`.
4. The output is allocated inside the function using the passed memory resource.
5. The public API does not take a stream. There is a `detail` version of the API that takes a
stream. This follows libcudf, and may change in the future.

## File Structure

For now, libcuspatial APIs should be defined in a header file in the
`cpp/include/cuspatial/experimental/` directory. Later, as we adopt the new API, we will rename
the `experimental` directory. The API header should be named after the API. In the example,
`haversine.hpp` defines the `cuspatial::haversine_distance` API.
libcuspatial APIs should be defined in a header file in the `cpp/include/cuspatial/` directory.
The API header should be named after the API. In the example, `haversine.hpp` defines the `cuspatial::haversine_distance` API.

The implementation must also be in a header, but should be in the `cuspatial/experimental/detail`
directory. The implementation should be included from the API definition file, at the end of the
file. Example:
The implementation must also be in a header, but should be in the `cuspatial/detail` directory. The
implementation should be included from the API definition file, at the end of the file. Example:

```c++
... // declaration of API above this point
#include <cuspatial/experimental/detail/haversine.hpp>
#include <cuspatial/detail/haversine.hpp>
```

## Namespaces

Public APIs are in the `cuspatial` namespace. Note that both the header-only API and the libcudf-
based API can live in the same namespace, because they are non-ambiguous (very different
based API can live in the same namespace, because they are non-ambiguous (very different
parameters).

Implementation of the header-only API should be in a `cuspatial::detail` namespace.

## Implementation

The main implementation should be in detail headers.
The main implementation should be in detail headers.

### Header-only API Implementation

Because it is a statically typed API, the header-only implementation can be much simpler than the
libcudf-based API, which requires run-time type dispatching. In the case of `haversine_distance`, it is
a simple matter of a few static asserts and dynamic expectation checks, followed by a call to
Because it is a statically typed API, the header-only implementation can be much simpler than the
libcudf-based API, which requires run-time type dispatching. In the case of `haversine_distance`,
it is a simple matter of a few static asserts and dynamic expectation checks, followed by a call to
`thrust::transform` with a custom transform functor.

```c++
Expand Down Expand Up @@ -216,17 +214,17 @@ OutputIt haversine_distance(LonLatItA a_lonlat_first,
```
Note that we `static_assert` that the types of the iterator inputs match documented expectations.
We also do a runtime check that the radius is positive. Finally we just call `thrust::transform`,
We also do a runtime check that the radius is positive. Finally we just call `thrust::transform`,
passing it an instance of `haversine_distance_functor`, which is a function of two `vec_2d<T>`
inputs that implements the Haversine distance formula.
### libcudf-based API Implementation
The substance of the refactoring is making the libcudf-based API a wrapper around the header-only
API. This mostly involves replacing business logic implementation in the type-dispatched functor
with a call to the header-only API. We also need to convert disjoint latitude and longitude inputs
The substance of the refactoring is making the libcudf-based API a wrapper around the header-only
API. This mostly involves replacing business logic implementation in the type-dispatched functor
with a call to the header-only API. We also need to convert disjoint latitude and longitude inputs
into `vec_2d<T>` structs. This is easily done using the `cuspatial::make_vec_2d_iterator` utility
provided in `type_utils.hpp`.
provided in `type_utils.hpp`.
So, to refactor the libcudf-based API, we remove the following code.
Expand Down Expand Up @@ -271,6 +269,6 @@ cuspatial::haversine_distance(lonlat_a,
Existing libcudf-based API tests can mostly be left alone. New tests should be added to exercise
the header-only API separately in case the libcudf-based API is removed.
Note that tests, like the header-only API, should not depend on libcudf or libcudf_test. The
Note that tests, like the header-only API, should not depend on libcudf or libcudf_test. The
cuDF-based API made the mistake of depending on libcudf_test, which results in breakages
of cuSpatial sometimes when libcudf_test changes.
33 changes: 18 additions & 15 deletions cpp/doxygen/developer_guide/TESTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,21 +26,24 @@ and the main iterator and container types supported by algorithms. Here are som

## Header-only and Column-based API tests

libcuspatial currently has two C++ APIs: the column-based API uses libcudf data structures as
input and output. These tests live in `cpp/tests/` and can use libcudf features for constructing
columns and tables. The header-only API does not depend on libcudf at all and so tests of these
APIs should not include any libcudf headers. These tests currently live in `cpp/tests/experimental`.

Generally, we test algorithms and business logic in the header-only API's unit tests.
Column-based API tests should only cover specifics of the column-based API, such as type
handling, input validation, and exceptions that are only thrown by that API.
libcuspatial currently has two C++ APIs: the column-based API uses libcudf data structures as
input and output. These tests can use libcudf features for constructing columns and tables. The
header-only API does not depend on libcudf at all and so tests of these APIs should not include any
libcudf headers. Header-only and column-based API tests are located together in `cuspatial/tests`,
however header-only API tests are `.cu` files and column-based API files are `.cpp` files.

Generally, we test algorithms and business logic in the header-only API's unit tests.
Column-based API tests should only cover specifics of the column-based API, such as type
handling, input validation, and exceptions that are only thrown by that API. Column-based API tests
typically also tests empty imputs, to ensure that empty column inputs result in empty column output
rather than throwing exceptions.

## Directory and File Naming

The naming of unit test directories and source files should be consistent with the feature being
tested. For example, the tests for APIs in `point_in_polygon.hpp` should live in
tested. For example, the tests for APIs in `point_in_polygon.hpp` should live in
`cuspatial/cpp/tests/point_in_polygon_test.cpp`. Each feature (or set of related features) should
have its own test source file named `<feature>_test.cu/cpp`.
have its own test source file named `<feature>_test.cu/cpp`.

In the interest of improving compile time, whenever possible, test source files should be `.cpp`
files because `nvcc` is slower than `gcc` in compiling host code. Note that `thrust::device_vector`
Expand All @@ -52,7 +55,7 @@ Testing header-only APIs requires CUDA compilation so should be done in `.cu` fi

## Base Fixture

All libcuspatial unit tests should make use of a GTest
All libcuspatial unit tests should make use of a GTest
["Test Fixture"](https://github.com/google/googletest/blob/master/docs/primer.md#test-fixtures-using-the-same-data-configuration-for-multiple-tests-same-data-multiple-tests).
Even if the fixture is empty, it should inherit from the base fixture `cuspatial::test::BaseFixture`
found in `cpp/tests/base_fixture.hpp`. This ensures that RMM is properly initialized and
Expand All @@ -65,7 +68,7 @@ Example:

## Typed Tests

In general, libcuspatial features must work across all supported types (for cuspatial this
In general, libcuspatial features must work across all supported types (for cuspatial this
typically just means `float` and `double`). In order to automate the process of running
the same tests across multiple types, we use GTest's
[Typed Tests](https://github.com/google/googletest/blob/master/docs/advanced.md#typed-tests).
Expand All @@ -91,10 +94,10 @@ list defined in `TestTypes` (`float, double`).
## Utilities
libcuspatial test utilities include `cuspatial::test::expect_vector_equivalent()` in
`cpp/tests/utility/vector_equality()`. This function compares two containers using Google Test's
`cpp/tests/utility/vector_equality()`. This function compares two containers using Google Test's
approximate matching for floating-point values. It can handle vectors of `cuspatial::vec_2d<T>`,
where `T` is `float` or `double`. It automatically copies data in device containers to host
containers before comparing, so you can pass it one host and one device vector, for example.
where `T` is `float` or `double`. It automatically copies data in device containers to host
containers before comparing, so you can pass it one host and one device vector, for example.
Example:
Expand Down
Loading

0 comments on commit 5f91e68

Please sign in to comment.