Support lists of structs in row lexicographic comparator #12953

Merged
merged 182 commits into from
May 3, 2023
Changes from 177 commits
ff1bc7e
Add tests
ttnghia Mar 2, 2023
b02abae
Complete tests
ttnghia Mar 3, 2023
3f6f2f3
Disable unsupported conditions
ttnghia Mar 3, 2023
3609edf
Reverse `row_operator.cu`
ttnghia Mar 3, 2023
7aede42
Revert "Reverse `row_operator.cu`"
ttnghia Mar 3, 2023
452f1b1
Update tests
ttnghia Mar 3, 2023
ce8f088
Fix offset
ttnghia Mar 3, 2023
799f2ae
Change return type to `unique_ptr`
ttnghia Mar 3, 2023
de7437e
Adapt with changes
ttnghia Mar 3, 2023
6cc4390
Update copyright year
ttnghia Mar 3, 2023
4092aae
Add more variable
ttnghia Mar 3, 2023
bc9ecc4
Implement flattening
ttnghia Mar 3, 2023
c9aaa79
Merge branch 'refactor_flatten_columns' into sort_nested_types
ttnghia Mar 3, 2023
284a79c
Complete implementation
ttnghia Mar 3, 2023
c11b6e5
Cleanup
ttnghia Mar 3, 2023
6d45f8f
Fix tests
ttnghia Mar 4, 2023
a242dff
Fix orders
ttnghia Mar 4, 2023
8e77216
Fix orders again
ttnghia Mar 4, 2023
1238853
Cleanup
ttnghia Mar 5, 2023
ec45e32
Update tests
ttnghia Mar 5, 2023
71fc858
Add variable storing auxiliary data
ttnghia Mar 5, 2023
c8b0634
Support lists of structs
ttnghia Mar 5, 2023
4d35317
Fix copyright year
ttnghia Mar 5, 2023
c61beef
Fix null order
ttnghia Mar 5, 2023
5fe136d
Merge branch 'branch-23.04' into refactor_flatten_columns
ttnghia Mar 6, 2023
37913be
include cleanup for cudf/detail/structs/utilities.hpp
karthikeyann Mar 6, 2023
5ce6f6f
Cleanup
ttnghia Mar 6, 2023
32085c8
Merge branch 'branch-23.04' into refactor_flatten_columns
ttnghia Mar 6, 2023
fdd9b23
Fix comments
ttnghia Mar 6, 2023
35b87ba
Support arbitrary nested input
ttnghia Mar 6, 2023
5438336
Cleanup and add docs
ttnghia Mar 6, 2023
cafd1ff
Fix doxygen
ttnghia Mar 7, 2023
36699db
Merge branch 'branch-23.04' into refactor_flatten_columns
ttnghia Mar 7, 2023
01827e4
Fix comments
ttnghia Mar 7, 2023
5f8ae83
Add comment
ttnghia Mar 7, 2023
02b1619
Cleanup
ttnghia Mar 7, 2023
3d91de1
Add structs-of-lists tests
ttnghia Mar 7, 2023
660fa3e
Complete unit tests
ttnghia Mar 7, 2023
54ffd34
Update copyright year
ttnghia Mar 7, 2023
8c9bedf
Reformat
ttnghia Mar 7, 2023
96e28b8
Use first rank instead of dense rank
ttnghia Mar 8, 2023
4e0cf5b
Cleanup more header
ttnghia Mar 8, 2023
0a39d35
Merge branch 'refactor_flatten_columns' into sort_nested_types
ttnghia Mar 8, 2023
cc58c8a
Merge branch 'refactor_flatten_columns' into sort_nested_types
ttnghia Mar 8, 2023
2b5febf
Update doxygen
ttnghia Mar 8, 2023
7390c21
Merge branch 'branch-23.04' into sort_nested_types
ttnghia Mar 9, 2023
74b036b
Pass `stream` parameter to `flatten_nested_columns`
ttnghia Mar 9, 2023
7e874f6
Fix docs
ttnghia Mar 9, 2023
9313b4c
Fix relevant unit tests
ttnghia Mar 9, 2023
bd65cf5
Merge branch 'branch-23.04' into sort_nested_types
ttnghia Mar 13, 2023
1cf20a2
Add sort lists of structs benchmark
ttnghia Mar 13, 2023
55682f3
Add benchmark code
ttnghia Mar 13, 2023
d445652
Revert "Add sort lists of structs benchmark"
ttnghia Mar 13, 2023
4e36b15
Fix spell
ttnghia Mar 13, 2023
dc9751b
Merge branch 'branch-23.04' into sort_nested_types
ttnghia Mar 14, 2023
1d0d7e0
Merge branch 'branch-23.04' into sort_nested_types
ttnghia Mar 15, 2023
1f8ea8f
Draft implementation
ttnghia Mar 15, 2023
e6c5dd2
Fix compile errors
ttnghia Mar 15, 2023
98ec7d2
Fix optional access
ttnghia Mar 15, 2023
66cd1bc
Minor cleanup
ttnghia Mar 15, 2023
e96427f
Add test
ttnghia Mar 18, 2023
11f856f
Fix bug in using rank method
ttnghia Mar 18, 2023
f6d70aa
Merge branch 'branch-23.04' into sort_nested_types
ttnghia Mar 18, 2023
87e7156
Merge remote-tracking branch 'nghia/sort_nested_types' into sort_nest…
ttnghia Mar 18, 2023
c6d1856
Rename variables and rewrite docs
ttnghia Mar 18, 2023
3003d7e
Only allow to use `sorting_physical_element_comparator` if there is l…
ttnghia Mar 18, 2023
7b21434
Fix condition
ttnghia Mar 18, 2023
66e278a
Remove `noexcept`
ttnghia Mar 19, 2023
032f271
Misc
ttnghia Mar 19, 2023
2549a07
Merge branch 'sort_nested_types' into two_tables_nested_types
ttnghia Mar 20, 2023
2fa2b69
Update docs
ttnghia Mar 20, 2023
6a69fa7
Rename variables
ttnghia Mar 20, 2023
b291b9e
Add a helper function
ttnghia Mar 20, 2023
50f29cd
Add lists-of-structs test
ttnghia Mar 20, 2023
2dc2d15
Complete tests
ttnghia Mar 21, 2023
e818922
Add a test with equal structs
ttnghia Mar 21, 2023
84813e4
Merge branch 'branch-23.04' into two_tables_nested_types
ttnghia Mar 21, 2023
833d48b
Merge branch 'branch-23.04' into sort_nested_types
ttnghia Mar 21, 2023
3e9136f
Merge branch 'branch-23.04' into two_tables_nested_types
ttnghia Mar 21, 2023
e5dc067
Remove type alias
ttnghia Mar 21, 2023
b449c72
Merge branch 'sort_nested_types' into two_tables_nested_types
ttnghia Mar 21, 2023
0818370
Complete implementation
ttnghia Mar 23, 2023
1667b27
Fix existing test
ttnghia Mar 23, 2023
e2cc0ab
Fix test utilities
ttnghia Mar 23, 2023
64c6605
Add tests
ttnghia Mar 23, 2023
f75f9c5
Simplify code
ttnghia Mar 24, 2023
0c8a62a
Remove redundant code
ttnghia Mar 24, 2023
01acdaf
Revert all changes in `row_operators.cuh`
ttnghia Mar 24, 2023
9394256
Fix typo
ttnghia Mar 24, 2023
0537225
Update docs
ttnghia Mar 24, 2023
6ff8ede
Add comments
ttnghia Mar 24, 2023
b806d14
Change docs
ttnghia Mar 24, 2023
9f9031c
Merge branch 'branch-23.04' into sort_structs_of_lists
ttnghia Mar 24, 2023
7c63726
Fix style
ttnghia Mar 24, 2023
422f93c
Fix docs
ttnghia Mar 24, 2023
70abc3c
Fix depth comparision
ttnghia Mar 24, 2023
02073e1
Merge branch 'branch-23.04' into sort_structs_of_lists
ttnghia Mar 24, 2023
4b03070
Merge branch 'branch-23.06' into sort_structs_of_lists
ttnghia Mar 31, 2023
6baefea
Merge branch 'branch-23.06' into sort_structs_of_lists
ttnghia Apr 3, 2023
bcfbaca
Merge branch 'sort_nested_types' into sort_list_of_structs
ttnghia Apr 4, 2023
5ab1b0f
Add `type_id` to check for element comparator
ttnghia Apr 4, 2023
71585bc
Merge branch 'branch-23.06' into sort_structs_of_lists
vyasr Apr 7, 2023
104aeba
Add comment
ttnghia Apr 7, 2023
1b8369a
Add tests with sliced input
ttnghia Apr 7, 2023
654a232
Update cpp/src/table/row_operators.cu
ttnghia Apr 7, 2023
9d7921d
Optimize condition, and remove `UNKNOWN_NULL_COUNT`
ttnghia Apr 7, 2023
c7c69db
Merge branch 'sort_structs_of_lists' into sort_lists_of_structs
ttnghia Apr 7, 2023
cc44621
Remove stale implementation
ttnghia Apr 8, 2023
292985b
Remove type id from element comparator
ttnghia Apr 8, 2023
d6991f4
Merge branch 'sort_lists_of_structs' into two_tables_nested_types
ttnghia Apr 8, 2023
befa5d0
TMP
ttnghia Apr 8, 2023
1795dfc
Revert "TMP"
ttnghia Apr 8, 2023
4eabe1d
Stop decomposing column at lists column
ttnghia Apr 8, 2023
589ee83
Add test
ttnghia Apr 9, 2023
2ca024c
Enable check for structs of lists
ttnghia Apr 9, 2023
a0151c0
Reverse changes to utilities.cpp
ttnghia Apr 9, 2023
9f929b2
Fix `decompose_structs`
ttnghia Apr 9, 2023
3046c83
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 10, 2023
e25fbfc
Improve condition for `safe_for_two_table_comparator`
ttnghia Apr 10, 2023
e6e087e
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 11, 2023
4f87848
Rename parameter
ttnghia Apr 11, 2023
2cfd999
Rename variable and rewrite docs
ttnghia Apr 11, 2023
cdcc25a
Change error message
ttnghia Apr 11, 2023
77f98c5
Fix docs for `decompose_structs`
ttnghia Apr 11, 2023
ecaa6c5
Rewrite check conditions
ttnghia Apr 11, 2023
5a77828
Add sliced input test
ttnghia Apr 12, 2023
787a6b6
Implement handling for sliced input
ttnghia Apr 12, 2023
f464985
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 12, 2023
d728b38
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 13, 2023
0d7fcaf
Rename variables and rewrite docs
ttnghia Apr 13, 2023
6139dc2
Rewrite `transform_lists_of_structs`
ttnghia Apr 13, 2023
4d24c2e
Rename variables and rewrite docs
ttnghia Apr 13, 2023
c5a0a98
Rewrite `has_floating_point_in_struct` and its docs
ttnghia Apr 13, 2023
5240f92
Rewrite `lists_of_structs_have_floating_point`
ttnghia Apr 13, 2023
e9c57c7
Add check
ttnghia Apr 13, 2023
3bbb1ee
Fix docs
ttnghia Apr 13, 2023
ee92dba
Add column order to rank computation
ttnghia Apr 13, 2023
d0c04b9
Rewrite docs
ttnghia Apr 13, 2023
17473b4
Add tests with different null orders
ttnghia Apr 13, 2023
c7178b7
Add sort tests
ttnghia Apr 14, 2023
a07e0a2
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 14, 2023
b8d012d
Implement `preprocessed_id`
ttnghia Apr 14, 2023
984317e
Update test
ttnghia Apr 14, 2023
457f593
Fix error with tables having 0 row
ttnghia Apr 14, 2023
dce8a72
Add comment
ttnghia Apr 14, 2023
e27f9c1
MISC
ttnghia Apr 14, 2023
e1b8c69
Reverse verbosity level
ttnghia Apr 14, 2023
2e79bed
Fix comment
ttnghia Apr 14, 2023
9709738
Update docs
ttnghia Apr 14, 2023
8cf9b96
Other doc fixes
ttnghia Apr 14, 2023
a936480
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 14, 2023
971dd15
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 17, 2023
fb17cc1
Fix style
ttnghia Apr 19, 2023
6ab28fa
Change variable type
ttnghia Apr 19, 2023
bc955f0
Fix comments
ttnghia Apr 19, 2023
a3017a4
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 19, 2023
406db2a
Add more comment
ttnghia Apr 19, 2023
dc0fe20
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 21, 2023
3e8e530
Remove `preprocessed_id` and fix docs
ttnghia Apr 21, 2023
f82b3a7
Fix docs
ttnghia Apr 21, 2023
1aed40d
Change comments
ttnghia Apr 21, 2023
f0f4b89
Rename variable
ttnghia Apr 21, 2023
2f406a3
Fix tests
ttnghia Apr 21, 2023
e07a54e
Rename function
ttnghia Apr 21, 2023
1c37a17
Fix spell
ttnghia Apr 21, 2023
dd94820
Fix spell
ttnghia Apr 21, 2023
895e2c5
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 24, 2023
8a4f1af
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 25, 2023
3e4b7b2
Rename `*ranked_children` into `*has_ranked_children`
ttnghia Apr 25, 2023
eb73c75
Use enum `decompose_lists_column` instead of boolean value
ttnghia Apr 26, 2023
c086579
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 26, 2023
3a12c2d
Add comment
ttnghia Apr 26, 2023
3f38b82
Fix docs
ttnghia Apr 26, 2023
766127c
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 27, 2023
364a312
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia Apr 28, 2023
b254361
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia May 1, 2023
b0ee15a
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia May 2, 2023
a44d75a
Move `constexpr` order
ttnghia May 3, 2023
4fd2ab5
Rename fuction and extract `check_physical_element_comparator`
ttnghia May 3, 2023
f0b713d
Change unit tests
ttnghia May 3, 2023
ed89fc3
Add a complex unit test
ttnghia May 3, 2023
5122a80
Merge branch 'branch-23.06' into two_tables_nested_types
ttnghia May 3, 2023
12 changes: 9 additions & 3 deletions cpp/benchmarks/sort/nested_types_common.hpp
@@ -28,17 +28,23 @@

 #include <random>

-inline std::unique_ptr<cudf::table> create_lists_data(nvbench::state& state)
+inline std::unique_ptr<cudf::table> create_lists_data(nvbench::state& state,
+                                                      cudf::size_type const num_columns = 1,
+                                                      cudf::size_type const min_val = 0,
+                                                      cudf::size_type const max_val = 5)
 {
   const size_t size_bytes(state.get_int64("size_bytes"));
   const cudf::size_type depth{static_cast<cudf::size_type>(state.get_int64("depth"))};
   auto const null_frequency{state.get_float64("null_frequency")};

   data_profile table_profile;
-  table_profile.set_distribution_params(cudf::type_id::LIST, distribution_id::UNIFORM, 0, 5);
+  table_profile.set_distribution_params(
+    cudf::type_id::LIST, distribution_id::UNIFORM, min_val, max_val);
   table_profile.set_list_depth(depth);
   table_profile.set_null_probability(null_frequency);
-  return create_random_table({cudf::type_id::LIST}, table_size_bytes{size_bytes}, table_profile);
+  return create_random_table(std::vector<cudf::type_id>(num_columns, cudf::type_id::LIST),
+                             table_size_bytes{size_bytes},
+                             table_profile);
 }

 inline std::unique_ptr<cudf::table> create_structs_data(nvbench::state& state,
70 changes: 67 additions & 3 deletions cpp/benchmarks/sort/sort_lists.cpp
@@ -20,18 +20,82 @@

 #include <nvbench/nvbench.cuh>

-void nvbench_sort_lists(nvbench::state& state)
+namespace {
+cudf::size_type constexpr min_val = 0;
+cudf::size_type constexpr max_val = 100;
+
+void sort_multiple_lists(nvbench::state& state)
+{
+  auto const num_columns = static_cast<cudf::size_type>(state.get_int64("num_columns"));
+  auto const input_table = create_lists_data(state, num_columns, min_val, max_val);
+  auto const stream = cudf::get_default_stream();
+
+  state.set_cuda_stream(nvbench::make_cuda_stream_view(stream.value()));
+  state.exec(nvbench::exec_tag::sync, [&](nvbench::launch& launch) {
+    cudf::detail::sorted_order(
+      *input_table, {}, {}, stream, rmm::mr::get_current_device_resource());
+  });
+}
+
+void sort_lists_of_structs(nvbench::state& state)
 {
-  auto const table = create_lists_data(state);
+  auto const num_columns = static_cast<cudf::size_type>(state.get_int64("num_columns"));
+  auto const lists_table = create_lists_data(state, num_columns, min_val, max_val);

+  // After having a table of (multiple) lists columns, convert those lists columns into lists of
+  // structs columns. The children of these structs columns are also children of the original lists
+  // columns.
+  // Such resulted lists-of-structs columns are very similar to the original lists-of-integers
+  // columns so their benchmarks can be somewhat comparable.
+  std::vector<cudf::column_view> lists_of_structs;
+  for (auto const& col : lists_table->view()) {
+    auto const child = col.child(cudf::lists_column_view::child_column_index);
+
+    // Put the child column under a struct column having the same null mask/null count.
+    auto const new_child = cudf::column_view{cudf::data_type{cudf::type_id::STRUCT},
+                                             child.size(),
+                                             nullptr,
+                                             child.null_mask(),
+                                             child.null_count(),
+                                             child.offset(),
+                                             {child}};
+    auto const converted_col =
+      cudf::column_view{cudf::data_type{cudf::type_id::LIST},
+                        col.size(),
+                        nullptr,
+                        col.null_mask(),
+                        col.null_count(),
+                        col.offset(),
+                        {col.child(cudf::lists_column_view::offsets_column_index), new_child}};
+    lists_of_structs.push_back(converted_col);
+  }
+
+  auto const input_table = cudf::table_view{lists_of_structs};
+  auto const stream = cudf::get_default_stream();
+
+  state.set_cuda_stream(nvbench::make_cuda_stream_view(stream.value()));
   state.exec(nvbench::exec_tag::sync, [&](nvbench::launch& launch) {
     rmm::cuda_stream_view stream_view{launch.get_stream()};
-    cudf::detail::sorted_order(*table, {}, {}, stream_view, rmm::mr::get_current_device_resource());
+    cudf::detail::sorted_order(input_table, {}, {}, stream, rmm::mr::get_current_device_resource());
   });
 }

+} // namespace
+
+void nvbench_sort_lists(nvbench::state& state)
+{
+  const auto has_lists_of_structs = state.get_int64("lists_of_structs") > 0;
+  if (has_lists_of_structs) {
+    sort_lists_of_structs(state);
+  } else {
+    sort_multiple_lists(state);
+  }
+}
+
 NVBENCH_BENCH(nvbench_sort_lists)
   .set_name("sort_list")
   .add_int64_power_of_two_axis("size_bytes", {10, 18, 24, 28})
   .add_int64_axis("depth", {1, 4})
+  .add_int64_axis("num_columns", {1})
+  .add_int64_axis("lists_of_structs", {0, 1})
   .add_float64_axis("null_frequency", {0, 0.2});
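
The comment in `sort_lists_of_structs` notes that wrapping each list's child in a struct gives columns that are "very similar" to the original lists of integers. A minimal host-side sketch of why the two sort the same way, assuming no nulls and using standard-library containers as stand-ins. The names `list_row`, `wrapped_int`, `host_sorted_order`, and `wrap_in_structs` are hypothetical and are not part of cudf, and this does not reflect how cudf's device comparator is implemented:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical host-side model of one row of a lists column.
template <typename Element>
using list_row = std::vector<Element>;

// Hypothetical single-field struct standing in for a STRUCT<int> element.
using wrapped_int = std::tuple<int>;

// Returns the permutation that sorts `rows` ascending under lexicographic comparison.
// std::vector and std::tuple both define operator< lexicographically, so lists of
// structs compare element by element, and each element compares field by field.
template <typename Element>
std::vector<std::size_t> host_sorted_order(std::vector<list_row<Element>> const& rows)
{
  std::vector<std::size_t> order(rows.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&rows](std::size_t lhs, std::size_t rhs) {
    return rows[lhs] < rows[rhs];
  });
  return order;
}

// Wraps every integer in a one-field struct, loosely mirroring the benchmark's
// conversion from lists of integers to lists of structs (null masks are ignored here).
std::vector<list_row<wrapped_int>> wrap_in_structs(std::vector<list_row<int>> const& rows)
{
  std::vector<list_row<wrapped_int>> result;
  result.reserve(rows.size());
  for (auto const& row : rows) {
    list_row<wrapped_int> wrapped;
    wrapped.reserve(row.size());
    for (int const value : row) { wrapped.emplace_back(value); }
    result.push_back(std::move(wrapped));
  }
  assert(result.size() == rows.size());  // one output row per input row
  return result;
}
```

Calling `host_sorted_order(rows)` and `host_sorted_order(wrap_in_structs(rows))` on the same input yields the same permutation in this model, which is the sense in which the two benchmark variants are comparable.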
17 changes: 16 additions & 1 deletion cpp/include/cudf/detail/sorting.hpp
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2019-2023, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -16,6 +16,7 @@

 #pragma once

+#include <cudf/sorting.hpp>
 #include <cudf/types.hpp>
 #include <cudf/utilities/default_stream.hpp>

@@ -61,6 +62,20 @@ std::unique_ptr<table> sort_by_key(table_view const& values,
                                    rmm::cuda_stream_view stream,
                                    rmm::mr::device_memory_resource* mr);

+/**
+ * @copydoc cudf::rank
+ *
+ * @param[in] stream CUDA stream used for device memory operations and kernel launches.
+ */
+std::unique_ptr<column> rank(column_view const& input,
+                             rank_method method,
+                             order column_order,
+                             null_policy null_handling,
+                             null_order null_precedence,
+                             bool percentage,
Contributor

Can we go ahead and make this an enum?

Contributor

If this is just exposing an API that already existed in a source file and changing this would affect other code paths I'm OK punting to a follow-up.

Contributor Author

This is just exposing the rank API in the `detail::` namespace. Changing this would be a breaking change, so let's do it in a follow-up PR.

+                             rmm::cuda_stream_view stream,
+                             rmm::mr::device_memory_resource* mr);
+
 /**
  * @copydoc cudf::stable_sort_by_key
  *
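
The review thread on `bool percentage` asks whether the flag could become an enum, and the author defers that to a follow-up. A minimal sketch of what such a change could look like at a call site, under stated assumptions: `rank_scaling` and `scale_ranks` are hypothetical names that do not exist in cudf, ranks are assumed to be 1-based, and dividing by the row count is only an illustrative normalization that may differ from what `cudf::rank` actually computes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical enum that could replace a `bool percentage` parameter.
// Not part of the cudf API.
enum class rank_scaling { ORDINAL, PERCENTAGE };

// Converts 1-based ranks to doubles, optionally scaling them into (0, 1].
// Assumption: every rank lies in [1, ranks.size()].
std::vector<double> scale_ranks(std::vector<std::size_t> const& ranks, rank_scaling scaling)
{
  std::vector<double> result;
  result.reserve(ranks.size());
  auto const count = static_cast<double>(ranks.size());
  for (auto const rank : ranks) {
    assert(rank >= 1 && rank <= ranks.size());
    auto const value = static_cast<double>(rank);
    result.push_back(scaling == rank_scaling::PERCENTAGE ? value / count : value);
  }
  return result;
}
```

With an enum, a call reads as `scale_ranks(ranks, rank_scaling::PERCENTAGE)` instead of `scale_ranks(ranks, true)`, so the intent is visible without looking up the parameter list.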