Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor contains_table with cuco::static_set #14064

Merged
merged 28 commits into from
Sep 26, 2023
Merged
Changes from 8 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
8f2294f
Refactor contains_table with cuco::static_set
PointKernel Sep 7, 2023
f2fd994
Refactor contains_table with cuco::static_set
PointKernel Sep 7, 2023
6c85572
Merge remote-tracking branch 'upstream/branch-23.10' into cuco-contai…
PointKernel Sep 8, 2023
40dfead
Merge remote-tracking branch 'upstream/branch-23.10' into cuco-contai…
PointKernel Sep 12, 2023
bec3cd6
Fix logic issues with hashset
PointKernel Sep 13, 2023
5ada788
Get rid of build_row_bitmask function
PointKernel Sep 13, 2023
204bd45
Minor cleanups: renaming
PointKernel Sep 13, 2023
6ae6f2a
Merge branch 'branch-23.10' into cuco-contains-table
PointKernel Sep 13, 2023
9c7f4f6
Use build_row_bitmmask instead of bitmask_and
PointKernel Sep 14, 2023
237fd70
Code formatting
PointKernel Sep 14, 2023
7283dd3
Merge branch 'branch-23.10' into cuco-contains-table
PointKernel Sep 18, 2023
0a8b678
Add comments back
PointKernel Sep 18, 2023
93ba686
Merge remote-tracking branch 'origin/cuco-contains-table' into cuco-c…
PointKernel Sep 18, 2023
eca017f
Rename contains benchmark file as contains_scalar
PointKernel Sep 18, 2023
fc0980f
Add contains_table benchmark
PointKernel Sep 18, 2023
b2934e5
Add peak memory usage in contains_table benchmark
PointKernel Sep 19, 2023
218ab4f
Minor doc cleanups
PointKernel Sep 19, 2023
6fcaa46
Merge remote-tracking branch 'upstream/branch-23.10' into cuco-contai…
PointKernel Sep 20, 2023
9553104
Distinguish probing scheme CG sizes between nested and flat types for…
PointKernel Sep 20, 2023
0b67f9b
Merge branch 'branch-23.10' into cuco-contains-table
PointKernel Sep 20, 2023
71a8793
Merge branch 'branch-23.10' into cuco-contains-table
PointKernel Sep 21, 2023
cd75057
Merge remote-tracking branch 'upstream/branch-23.10' into cuco-contai…
PointKernel Sep 22, 2023
e1125c3
Remove redundant docs
PointKernel Sep 22, 2023
37f7048
Throw if needles and haystack column types mismatch
PointKernel Sep 22, 2023
4f6af5d
Simplify nested column handling
PointKernel Sep 22, 2023
de99c48
Merge branch 'cuco-contains-table' of github.com:PointKernel/cudf int…
PointKernel Sep 22, 2023
cb9614d
Merge branch 'branch-23.10' into cuco-contains-table
PointKernel Sep 22, 2023
b848ada
Merge branch 'branch-23.10' into cuco-contains-table
PointKernel Sep 25, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
311 changes: 153 additions & 158 deletions cpp/src/search/contains_table.cu
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@

#include <thrust/iterator/counting_iterator.h>

#include <cuco/static_map.cuh>
#include <cuco/static_set.cuh>

#include <type_traits>

Expand All @@ -37,69 +37,59 @@ namespace {
using cudf::experimental::row::lhs_index_type;
using cudf::experimental::row::rhs_index_type;

using static_map = cuco::static_map<lhs_index_type,
size_type,
cuda::thread_scope_device,
rmm::mr::stream_allocator_adaptor<default_allocator<char>>>;

/**
* @brief Check if the given type `T` is a strong index type (i.e., `lhs_index_type` or
* `rhs_index_type`).
*
* @return A boolean value indicating if `T` is a strong index type
* @brief An hasher adapter wrapping both haystack hasher and needles hasher
*/
template <typename T>
constexpr auto is_strong_index_type()
{
return std::is_same_v<T, lhs_index_type> || std::is_same_v<T, rhs_index_type>;
}
template <typename HaystackHasher, typename NeedleHasher>
struct hasher_adapter {
hasher_adapter(HaystackHasher const& haystack_hasher, NeedleHasher const& needle_hasher)
: _haystack_hasher{haystack_hasher}, _needle_hasher{needle_hasher}
{
}

/**
* @brief An adapter functor to support strong index types for row hasher that must be operating on
* `cudf::size_type`.
*/
template <typename Hasher>
struct strong_index_hasher_adapter {
strong_index_hasher_adapter(Hasher const& hasher) : _hasher{hasher} {}
__device__ constexpr auto operator()(lhs_index_type idx) const noexcept
{
return _haystack_hasher(static_cast<size_type>(idx));
}

template <typename T, CUDF_ENABLE_IF(is_strong_index_type<T>())>
__device__ constexpr auto operator()(T const idx) const noexcept
__device__ constexpr auto operator()(rhs_index_type idx) const noexcept
{
return _hasher(static_cast<size_type>(idx));
return _needle_hasher(static_cast<size_type>(idx));
}

private:
Hasher const _hasher;
HaystackHasher const _haystack_hasher;
NeedleHasher const _needle_hasher;
};

/**
* @brief An adapter functor to support strong index type for table row comparator that must be
* operating on `cudf::size_type`.
* @brief An comparator adapter wrapping both self comparator and two table comparator
*/
template <typename Comparator>
struct strong_index_comparator_adapter {
strong_index_comparator_adapter(Comparator const& comparator) : _comparator{comparator} {}

template <typename T,
typename U,
CUDF_ENABLE_IF(is_strong_index_type<T>() && is_strong_index_type<U>())>
__device__ constexpr auto operator()(T const lhs_index, U const rhs_index) const noexcept
template <typename SelfEqual, typename TwoTableEqual>
struct comparator_adapter {
comparator_adapter(SelfEqual const& self_equal, TwoTableEqual const& two_table_equal)
: _self_equal{self_equal}, _two_table_equal{two_table_equal}
{
}

__device__ constexpr auto operator()(lhs_index_type lhs_index,
lhs_index_type rhs_index) const noexcept
{
auto const lhs = static_cast<size_type>(lhs_index);
auto const rhs = static_cast<size_type>(rhs_index);

if constexpr (std::is_same_v<T, U> || std::is_same_v<T, lhs_index_type>) {
return _comparator(lhs, rhs);
} else {
// Here we have T == rhs_index_type.
// This is when the indices are provided in wrong order for two table comparator, so we need
// to switch them back to the right order before calling the underlying comparator.
return _comparator(rhs, lhs);
}
return _self_equal(lhs, rhs);
}

__device__ constexpr auto operator()(lhs_index_type lhs_index,
rhs_index_type rhs_index) const noexcept
{
return _two_table_equal(lhs_index, rhs_index);
}

private:
Comparator const _comparator;
SelfEqual const _self_equal;
TwoTableEqual const _two_table_equal;
};

/**
Expand All @@ -112,6 +102,7 @@ struct strong_index_comparator_adapter {
* @param stream CUDA stream used for device memory operations and kernel launches
* @return A pair of pointer to the output bitmask and the buffer containing the bitmask
*/
/*
std::pair<rmm::device_buffer, bitmask_type const*> build_row_bitmask(table_view const& input,
rmm::cuda_stream_view stream)
{
Expand All @@ -132,24 +123,54 @@ std::pair<rmm::device_buffer, bitmask_type const*> build_row_bitmask(table_view

return std::pair(rmm::device_buffer{0, stream}, nullable_columns.front().null_mask());
}
// TODO: To doublecheck, under no situation but here we do nested checks with
// `get_nullable_columns`. This seems wrong
*/

/**
* @brief Invoke an `operator()` template with a row equality comparator based on the specified
* `compare_nans` parameter.
* @brief Invokes the given `func` with desired comparators based on the specified `compare_nans`
parameter.
*
* @tparam HaystackHasNested Flag indicating whether there are nested columns in haystack
* @tparam HasAnyNested Flag indicating whether there are nested columns in either haystack or
PointKernel marked this conversation as resolved.
Show resolved Hide resolved
needles
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @tparam HasAnyNested Flag indicating whether there are nested columns in either haystack or
needles
* @tparam HasAnyNested Flag indicating whether there are nested columns in either haystack or
needles

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it's worth the effort. I assume we shouldn't rely on any manual alignment for code formatting/linting.

Copy link
Contributor

@ttnghia ttnghia Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manual alignment is for dev reading, to enhance aesthetics/readability, not for the generated docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of alignment, I think all lines should start with a single space and an asterisk.
https://github.com/rapidsai/cudf/blob/branch-23.10/cpp/doxygen/developer_guide/DOCUMENTATION.md#block-comments
I think this should apply to blank lines within the block comment as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all lines should start with a single space and an asterisk.

Oh, good catch. Definitely yes, something went wrong with my IDE formatting. Fixed

* @tparam Func Type of the helper function doing `contains` check

* @tparam haystack_has_nulls Flag indicating whether haystack has nulls or not
*
* @param compare_nans The flag to specify whether NaNs should be compared equal or not
* @param compare_nulls Control whether nulls should be compared as equal or not
* @param compare_nans Control whether floating-point NaNs values should be compared as equal or not
* @param haystack_has_nulls Flag indicating whether haystack has nulls or not
* @param has_any_nulls Flag indicating whether there are nested nulls is either haystack or needles
* @param self_equal Self table comparator
* @param two_table_equal Two table comparator
* @param func The input functor to invoke
*/
template <typename Func>
void dispatch_nan_comparator(nan_equality compare_nans, Func&& func)
template <bool HaystackHasNested, bool HasAnyNested, typename Func>
void dispatch_nan_comparator(
null_equality compare_nulls,
nan_equality compare_nans,
bool haystack_has_nulls,
bool has_any_nulls,
cudf::experimental::row::equality::self_comparator self_equal,
cudf::experimental::row::equality::two_table_comparator two_table_equal,
Func&& func)
{
if (compare_nans == nan_equality::ALL_EQUAL) {
using nan_equal_comparator =
cudf::experimental::row::equality::nan_equal_physical_equality_comparator;
func(nan_equal_comparator{});
auto const d_self_equal = self_equal.equal_to<HaystackHasNested>(
nullate::DYNAMIC{haystack_has_nulls}, compare_nulls, nan_equal_comparator{});
auto const d_two_table_equal = two_table_equal.equal_to<HasAnyNested>(
nullate::DYNAMIC{has_any_nulls}, compare_nulls, nan_equal_comparator{});
func(d_self_equal, d_two_table_equal);
} else {
using nan_unequal_comparator = cudf::experimental::row::equality::physical_equality_comparator;
func(nan_unequal_comparator{});
auto const d_self_equal = self_equal.equal_to<HaystackHasNested>(
nullate::DYNAMIC{haystack_has_nulls}, compare_nulls, nan_unequal_comparator{});
auto const d_two_table_equal = two_table_equal.equal_to<HasAnyNested>(
nullate::DYNAMIC{has_any_nulls}, compare_nulls, nan_unequal_comparator{});
func(d_self_equal, d_two_table_equal);
}
}

Expand All @@ -173,124 +194,98 @@ rmm::device_uvector<bool> contains(table_view const& haystack,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr)
{
auto map = static_map(compute_hash_table_size(haystack.num_rows()),
cuco::empty_key{lhs_index_type{std::numeric_limits<size_type>::max()}},
cuco::empty_value{detail::JoinNoneValue},
detail::hash_table_allocator_type{default_allocator<char>{}, stream},
stream.value());

auto const haystack_has_nulls = has_nested_nulls(haystack);
auto const needles_has_nulls = has_nested_nulls(needles);
auto const has_any_nulls = haystack_has_nulls || needles_has_nulls;

auto const preprocessed_needles =
cudf::experimental::row::equality::preprocessed_table::create(needles, stream);
auto const preprocessed_haystack =
cudf::experimental::row::equality::preprocessed_table::create(haystack, stream);
// Insert row indices of the haystack table as map keys.
{
auto const haystack_it = cudf::detail::make_counting_transform_iterator(
size_type{0},
[] __device__(auto const idx) { return cuco::make_pair(lhs_index_type{idx}, 0); });

auto const hasher = cudf::experimental::row::hash::row_hasher(preprocessed_haystack);
auto const d_hasher =
strong_index_hasher_adapter{hasher.device_hasher(nullate::DYNAMIC{has_any_nulls})};

auto const comparator =
cudf::experimental::row::equality::self_comparator(preprocessed_haystack);
auto const haystack_hasher = cudf::experimental::row::hash::row_hasher(preprocessed_haystack);
auto const d_haystack_hasher = haystack_hasher.device_hasher(nullate::DYNAMIC{has_any_nulls});
auto const needle_hasher = cudf::experimental::row::hash::row_hasher(preprocessed_needles);
auto const d_needle_hasher = needle_hasher.device_hasher(nullate::DYNAMIC{has_any_nulls});
auto const d_hasher = hasher_adapter{d_haystack_hasher, d_needle_hasher};
using hasher_type = decltype(d_hasher);

// If the haystack table has nulls but they are compared unequal, don't insert them.
// Otherwise, it was known to cause performance issue:
// - https://github.com/rapidsai/cudf/pull/6943
// - https://github.com/rapidsai/cudf/pull/8277
if (haystack_has_nulls && compare_nulls == null_equality::UNEQUAL) {
auto const bitmask_buffer_and_ptr = build_row_bitmask(haystack, stream);
auto const row_bitmask_ptr = bitmask_buffer_and_ptr.second;

auto const insert_map = [&](auto const value_comp) {
if (cudf::detail::has_nested_columns(haystack)) {
auto const d_eqcomp = strong_index_comparator_adapter{comparator.equal_to<true>(
nullate::DYNAMIC{haystack_has_nulls}, compare_nulls, value_comp)};
map.insert_if(haystack_it,
haystack_it + haystack.num_rows(),
thrust::counting_iterator<size_type>(0), // stencil
row_is_valid{row_bitmask_ptr},
d_hasher,
d_eqcomp,
stream.value());
} else {
auto const d_eqcomp = strong_index_comparator_adapter{comparator.equal_to<false>(
nullate::DYNAMIC{haystack_has_nulls}, compare_nulls, value_comp)};
map.insert_if(haystack_it,
haystack_it + haystack.num_rows(),
thrust::counting_iterator<size_type>(0), // stencil
row_is_valid{row_bitmask_ptr},
d_hasher,
d_eqcomp,
stream.value());
}
};

// Insert only rows that do not have any null at any level.
dispatch_nan_comparator(compare_nans, insert_map);
} else { // haystack_doesn't_have_nulls || compare_nulls == null_equality::EQUAL
auto const insert_map = [&](auto const value_comp) {
if (cudf::detail::has_nested_columns(haystack)) {
auto const d_eqcomp = strong_index_comparator_adapter{comparator.equal_to<true>(
nullate::DYNAMIC{haystack_has_nulls}, compare_nulls, value_comp)};
map.insert(
haystack_it, haystack_it + haystack.num_rows(), d_hasher, d_eqcomp, stream.value());
} else {
auto const d_eqcomp = strong_index_comparator_adapter{comparator.equal_to<false>(
nullate::DYNAMIC{haystack_has_nulls}, compare_nulls, value_comp)};
map.insert(
haystack_it, haystack_it + haystack.num_rows(), d_hasher, d_eqcomp, stream.value());
}
};

dispatch_nan_comparator(compare_nans, insert_map);
}
}
auto const self_equal = cudf::experimental::row::equality::self_comparator(preprocessed_haystack);
auto const two_table_equal = cudf::experimental::row::equality::two_table_comparator(
preprocessed_haystack, preprocessed_needles);

// The output vector.
auto contained = rmm::device_uvector<bool>(needles.num_rows(), stream, mr);

auto const preprocessed_needles =
cudf::experimental::row::equality::preprocessed_table::create(needles, stream);
// Check existence for each row of the needles table in the haystack table.
{
auto const needles_it = cudf::detail::make_counting_transform_iterator(
size_type{0}, [] __device__(auto const idx) { return rhs_index_type{idx}; });

auto const hasher = cudf::experimental::row::hash::row_hasher(preprocessed_needles);
auto const d_hasher =
strong_index_hasher_adapter{hasher.device_hasher(nullate::DYNAMIC{has_any_nulls})};

auto const comparator = cudf::experimental::row::equality::two_table_comparator(
preprocessed_haystack, preprocessed_needles);

auto const check_contains = [&](auto const value_comp) {
if (cudf::detail::has_nested_columns(haystack) or cudf::detail::has_nested_columns(needles)) {
auto const d_eqcomp =
comparator.equal_to<true>(nullate::DYNAMIC{has_any_nulls}, compare_nulls, value_comp);
map.contains(needles_it,
needles_it + needles.num_rows(),
contained.begin(),
d_hasher,
d_eqcomp,
stream.value());
} else {
auto const d_eqcomp =
comparator.equal_to<false>(nullate::DYNAMIC{has_any_nulls}, compare_nulls, value_comp);
map.contains(needles_it,
needles_it + needles.num_rows(),
contained.begin(),
d_hasher,
d_eqcomp,
stream.value());
}
};

dispatch_nan_comparator(compare_nans, check_contains);
auto const haystack_iter = cudf::detail::make_counting_transform_iterator(
size_type{0}, [] __device__(auto idx) { return lhs_index_type{idx}; });
auto const needles_iter = cudf::detail::make_counting_transform_iterator(
size_type{0}, [] __device__(auto idx) { return rhs_index_type{idx}; });

auto const helper_func = [&](auto const& d_self_equal, auto const& d_two_table_equal) {
auto const d_equal = comparator_adapter{d_self_equal, d_two_table_equal};

auto set = cuco::experimental::static_set{
cuco::experimental::extent{compute_hash_table_size(haystack.num_rows())},
PointKernel marked this conversation as resolved.
Show resolved Hide resolved
cuco::empty_key{lhs_index_type{-1}},
d_equal,
cuco::experimental::linear_probing<1, hasher_type>{d_hasher},
detail::hash_table_allocator_type{default_allocator<lhs_index_type>{}, stream},
stream.value()};

if (haystack_has_nulls && compare_nulls == null_equality::UNEQUAL) {
auto const row_bitmask =
cudf::detail::bitmask_and(haystack, stream, rmm::mr::get_current_device_resource()).first;
PointKernel marked this conversation as resolved.
Show resolved Hide resolved
set.insert_if_async(haystack_iter,
PointKernel marked this conversation as resolved.
Show resolved Hide resolved
haystack_iter + haystack.num_rows(),
thrust::counting_iterator<size_type>(0), // stencil
row_is_valid{reinterpret_cast<bitmask_type const*>(row_bitmask.data())},
stream.value());
} else {
set.insert_async(haystack_iter, haystack_iter + haystack.num_rows(), stream.value());
}

if (needles_has_nulls && compare_nulls == null_equality::UNEQUAL) {
auto const row_bitmask =
cudf::detail::bitmask_and(needles, stream, rmm::mr::get_current_device_resource()).first;
set.contains_if_async(needles_iter,
needles_iter + needles.num_rows(),
thrust::counting_iterator<size_type>(0), // stencil
row_is_valid{reinterpret_cast<bitmask_type const*>(row_bitmask.data())},
contained.begin(),
stream.value());
} else {
set.contains_async(
needles_iter, needles_iter + needles.num_rows(), contained.begin(), stream.value());
}
};

if (cudf::detail::has_nested_columns(haystack)) {
dispatch_nan_comparator<true, true>(compare_nulls,
compare_nans,
haystack_has_nulls,
has_any_nulls,
self_equal,
two_table_equal,
helper_func);
} else {
if (cudf::detail::has_nested_columns(needles)) {
dispatch_nan_comparator<false, true>(compare_nulls,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the case I'm interested in investigating in the other thread. Is the <false, true> dispatch providing special functionality or performance advantages that can't be achieved with 2 dispatches (<true, true> or <false, false>)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see your point.

Ideally, users should always check nested against nested. Having the <false, true> dispatch allows users to check nested needles against a flat haystack (though I'm not sure either if it's a legit use case), and the haystack set insertion would be faster than the <true, true> dispatch.

For my education, in this particular case, I assume <true, true> would work fine but less efficiently since we do redundant nested-related checks for a flat column. But would <false, false> produce proper results?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But would <false, false> produce proper results?

Checked with @divyegala, this will cause runtime errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the <false, true> dispatch allows users to check nested needles against a flat haystack (though I'm not sure either if it's a legit use case)

Maybe I'm misunderstanding the problem -- but it seems like a nested needle could never be found in a haystack without nesting. Shouldn't that fail because the types of the needle and haystack don't match?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that fail because the types of the needle and haystack don't match?

I'm not sure. The current contains_table allows this combination and returns 0 in this case. I'm trying to align with the existing behavior. Maybe @ttnghia knows more about what is the expected behavior here.

Copy link
Contributor

@ttnghia ttnghia Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the thing: when hashing the column with nulls, the validity (true/false) of the column will be hashed into the output hash:

if (_check_nulls) {
          auto validity_it = detail::make_validity_iterator<true>(curr_col);
          hash             = detail::accumulate(..., hash,...);

When using hash table (map/set), we must ensure that the hash values for rows in both tables are consistent. That means, if one side table has nulls and hashed with nulls, the other side must also be hashed with the same has_nulls. Otherwise, one table is hashed differently than the other and you will not find the correct "contains" output. That's why you see the hashers are constructed using has_any_nulls.

And that is only relevant to hashing.

For equality comparator only (without hashing), we can use different has_nulls for tables.

Copy link
Member Author

@PointKernel PointKernel Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a different issue. Null handling is properly propagated to both self and two-table comparators.

The focus was whether it's legit to have mismatched column types between haystack and needles. After checking the public contains API, I think the answer is clearer now: I've updated the code so now it will throw if column types mismatch. The implementation is therefore simplified as @bdice suggested. Nice catch, thanks! 🙏

compare_nans,
haystack_has_nulls,
has_any_nulls,
self_equal,
two_table_equal,
helper_func);
} else {
dispatch_nan_comparator<false, false>(compare_nulls,
compare_nans,
haystack_has_nulls,
has_any_nulls,
self_equal,
two_table_equal,
helper_func);
}
}

return contained;
Expand Down
Loading