Fully support lexicographic comparison for nested types in table self-comparator #12879

ttnghia · 2023-03-03T23:09:21Z

This PR implements full support for lexicographic comparison for nested types, for table self-comparator (support for two-table comparator will come in a follow-up PR).

Warning: Supporting structs-of-lists is a temporary workaround, not an optimal long-term solution.

Algorithm:

If a column contains structs-of-lists: flatten it using cudf::structs::detail::flatten_nested_columns. While waiting for a long-term optimal solution, this workaround is fairly simple and efficient.
If a column contains lists-of-structs: follow the algorithm described in Proposal for lexicographic comparators for lists of structs #11222.
If the input doesn't contain any lists-of-structs or structs-of-lists, there will not be any change/transformation applied to it, and the performance will not be impacted at all.

Depends on:

Closes:

Tested by spark-rapids integration tests. All tests pass. Data generator used to test:

struct_of_array_gen = [StructGen([['child0', ArrayGen(int_gen, max_length=10)]]),
                       StructGen([['child0', byte_gen],
                                  ['child1', ArrayGen(int_gen, max_length=10)],
                                  ['child2', int_gen]]),
                       StructGen([['child0', ArrayGen(int_gen, max_length=10)],
                                  ['child1', int_gen]]),
                       StructGen([['child0', ArrayGen(int_gen, max_length=10)],
                                  ['child1', ArrayGen(int_gen, max_length=10)],
                                  ['child2', ArrayGen(int_gen, max_length=10)]])]
nested_array_of_struct_gen = [ArrayGen(sub_gen, max_length=10) for sub_gen in struct_of_array_gen]
deeply_nested_array_gen = [ArrayGen(sub_gen, max_length=10) for sub_gen in nested_array_of_struct_gen] + \
                          [ArrayGen(sub_gen, max_length=10) for sub_gen in struct_of_array_gen]
deeply_nested_struct_gen = [StructGen([['child0', sub_gen] for ind, sub_gen in enumerate(deeply_nested_array_gen)])]

# This is used to test:
deeply_nested_gen = struct_of_array_gen + nested_array_of_struct_gen + deeply_nested_array_gen + \
                    deeply_nested_struct_gen

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

This reverts commit 3609edf.

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

ttnghia · 2023-03-13T20:16:21Z

I ran a benchmark comparing sorting lists columns (nested lists-of-lists, or LoL) and sorting lists-of-structs columns (LoS). These LoS columns are generated by just putting the children of the LoL columns under a structs column. So under the hood, these LoL and LoS columns contain the same information to compare.

The benchmarks show very interesting results:

# sort_list

## [0] Quadro RTX 6000

|  size_bytes  |  depth  |  num_columns  |  null_frequency  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |             Diff |   %Diff |  Status  |
|--------------|---------|---------------|------------------|------------|-------------|------------|-------------|------------------|---------|----------|
|     2^10     |    1    |      10       |        0         |   4.075 us |     204.71% |   2.184 us |     291.37% |        -1.891 us | -46.40% |   PASS   |
|     2^18     |    1    |      10       |        0         |  32.663 ms |       3.13% |  38.994 ms |       2.32% |         6.331 ms |  19.38% |   FAIL   |
|     2^24     |    1    |      10       |        0         |   9.545 ms |       9.76% |  29.418 ms |       5.41% |        19.873 ms | 208.21% |   FAIL   |
|     2^10     |    2    |      10       |        0         |   6.744 us |      38.68% |   2.906 us |     207.01% |        -3.838 us | -56.91% |   FAIL   |
|     2^18     |    2    |      10       |        0         | 901.082 ms |       4.00% |  28.292 ms |       3.16% |   -872790.516 us | -96.86% |   FAIL   |
|     2^24     |    2    |      10       |        0         | 113.982 ms |       0.86% | 115.617 ms |       0.99% |         1.635 ms |   1.43% |   FAIL   |
|     2^10     |    3    |      10       |        0         |   9.589 us |      59.38% |   3.461 us |      85.14% |        -6.128 us | -63.91% |   FAIL   |
|     2^18     |    3    |      10       |        0         |   9.805 us |      42.68% |   3.259 us |      96.03% |        -6.546 us | -66.76% |   FAIL   |
|     2^24     |    3    |      10       |        0         |   55.376 s |        inf% | 154.392 ms |       2.17% | -55221986.520 us | -99.72% |   FAIL   |
|     2^10     |    4    |      10       |        0         |  12.853 us |      42.00% |   4.005 us |     154.35% |        -8.848 us | -68.84% |   FAIL   |
|     2^18     |    4    |      10       |        0         |  12.584 us |      39.01% |   3.753 us |      78.64% |        -8.831 us | -70.18% |   FAIL   |
|     2^24     |    4    |      10       |        0         |  13.187 us |      91.99% |   3.787 us |      85.74% |        -9.400 us | -71.28% |   PASS   |

In summary:

For shallow LoS, sorting them is much slower than sorting LoL. This is due to the overhead of computing ranks.
However, when the LoL columns have higher depth (just more than 1 depth level) then sorting them becomes slower than sorting LoS due to the iterative mechanism of the comparator when working with multiple levels of lists.

I've filed the issue to leverage this in the future: #12932.

This reverts commit 1cf20a2.

ttnghia · 2023-03-13T21:48:58Z

Before we resort to flattening to support structs of lists, could we try to resolve #11672 properly? We originally expected structs of lists to "just work" tm, and the only reason I didn't investigate further at the time was the urgency imposed on getting the comparator merged. I would like to have a sense for the feasibility of the long-term solution before we resort to a workaround.

@vyasr I think I found what's wrong with structs-of-lists: For comparing lists, we need to generate dremel data. However:

Dremel data is generated only for lists columns that are not children of any other column:

cudf/cpp/src/table/row_operators.cu

Lines 265 to 270 in 9c9dd54

    
           for (auto const& col : table) { 
        
             if (col.type().id() == type_id::LIST) { 
        
               dremel_data.push_back(detail::get_comparator_data(col, {}, false, stream)); 
        
               dremel_device_views.push_back(dremel_data.back()); 
        
             } 
        
           }

On device, dremel data is considered for comparison only in the same situation:

cudf/cpp/include/cudf/table/experimental/row_operators.cuh

Lines 559 to 567 in 9c9dd54

    
           auto [l_dremel_i, r_dremel_i] = [&]() { 
        
             if (_lhs.column(i).type().id() == type_id::LIST) { 
        
               auto idx = list_column_index++; 
        
               return std::make_tuple(optional_dremel_view(_l_dremel[idx]), 
        
                                      optional_dremel_view(_r_dremel[idx])); 
        
             } else { 
        
               return std::make_tuple(optional_dremel_view{}, optional_dremel_view{}); 
        
             } 
        
           }();

Even that, structs column cannot have lists children. Otherwise, we will encounter compiler stack overflow with device code due to type dispatcher. The current code only processes child structs iteratively while ignoring lists (dispatch_void_if_nested):

cudf/cpp/include/cudf/table/experimental/row_operators.cuh

Lines 409 to 413 in 9c9dd54

    
           return cudf::type_dispatcher<dispatch_void_if_nested>( 
        
             lcol.type(), 
        
             element_comparator{_check_nulls, lcol, rcol, _null_precedence, depth, _comparator}, 
        
             lhs_element_index, 
        
             rhs_element_index);

So in order to fix this completely, we need both to generate+consider dremel data for nested lists columns as well as fixing the issue with type dispatcher above. I'm not sure about the complexity of such work, so it should be done in a separate follow up PR.

Update: I've successfully fixed the dremel issue above. However, that may require changing to the function decompose_structs which is also used in equality comparators. Such changes may affect comparison performance. Overall, too much risky for a limited time budget. So the issue needs to be addressed separately and I'll post a PR for it after this one merged whenever I can.

bdice

Submitting this partial review. I will review the rest at a later time.

bdice · 2023-03-13T18:11:19Z

cpp/include/cudf/table/experimental/row_operators.cuh

+   * @param flattened_input_aux_data The data structure generated from
+   * `cudf::structs::detail::flatten_nested_columns` that contains additional information used for
+   * row comparisons.
+   * @param transformed_structs_columns The intermediate columns resulted from transforming child


Maybe we can name this in a way that indicates what the transformation does. For example, list_of_structs_ranked_columns.

I rename it to structs_ranked_columns.

cpp/include/cudf/table/experimental/row_operators.cuh

bdice · 2023-03-13T18:14:46Z

cpp/include/cudf/table/experimental/row_operators.cuh

+
+  // Auxiliary data generated from `cudf::structs::detail::flatten_nested_columns`
+  // that needs to be kept alive.
+  std::unique_ptr<cudf::structs::detail::flattened_table> _flattened_input_aux_data;


Why does this need to be a unique_ptr when all the other owned members are not? Should this be an optional? Or just a plain member?

This is as discussed in #12878 (comment). In particular, using smart pointer allows to forward-declare the class flattened_table which in turn avoids including the relevant header into this already-huge header.

bdice · 2023-03-13T18:15:12Z

cpp/include/cudf/table/experimental/row_operators.cuh

+
+  // Auxiliary data generated from transforming lists-of-structs into lists-of-integer
+  // that needs to be kept alive.
+  std::vector<std::unique_ptr<column>> _transformed_structs_aux_data;


Similar here -- why does this need to be a vector of unique_ptr?

This is to store the output of cudf::rank thus must be unique_ptr.

bdice · 2023-03-13T18:25:33Z

cpp/src/table/row_operators.cu

+ *
+ * @param input The input column to transform
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @return A pair of new column_view representing the transformed input and the generated rank


This should use make_lists_column and return the lists column directly, I think. I am guessing that the motivation to return this pair and not call make_lists_column is to avoid copying the input null mask and offsets, but that's not a great long term solution. For the "concat and rank" approach with two-table comparator, the mask/offsets cannot be reused and we will require a new list column to be constructed (the concatenation of the struct columns, which is then ranked and returned). I'd like have the self-comparator and two-table comparator act the same way, if possible, even though it may introduce an additional copy. The ranked lists of size_type are going to consume less memory than the input lists-of-structs, in most cases, so it's probably a memory cost we can afford to pay for a simplified implementation and easier lifetime management.

We actually don't need to call make_lists_column as that will copy the offsets column of the input lists. For the case of two-table comparator when we concate the child columns, we will split the output ranks into two ranks column and still apply the input null masks and offsets of the two input lists columns.

FYI, I've implemented the code for handling two-table comparator here: https://github.com/rapidsai/cudf/pull/12953/files#diff-c143ae2ecfe63796e604e5854f4a47cf6b831ebe5d68ce7c19a0e1533fceddc7R394-R413

cpp/src/table/row_operators.cu

bdice · 2023-03-13T18:37:46Z

cpp/src/table/row_operators.cu

+      // However, first ranks are computed faster and are good enough for ordering them. Structs
+      // compared equal always have consecutive rank values (in stable order) thus they are still
+      // sorted correctly by their ranks.
+      auto ranks             = cudf::detail::rank(child,


There are some details we need to pay attention to when performing the ranking. In particular, I want to be sure that NaN handling (via the physical element comparator) and nullability are handled correctly with respect to the target comparator. I'm not sure if this has complete/correct support for those yet?

This is what I was talking about in #11222:

Additionally, this places a new requirement on the self-comparator (two-table comparator) to know the nullability and physical element comparator (#10870) at construction so that the child ranks of list<struct<...>> data are correctly preprocessed. Currently these arguments are given when constructing the device row comparator from the self-comparator (two-table comparator).

See also:

cudf/cpp/include/cudf/table/experimental/row_operators.cuh

Lines 287 to 290 in 9313b4c

* @param null_precedence Optional, device array the same length as a row and indicates how null

* values compare to all other for every column. If `nullopt`, then null precedence would be

* `null_order::BEFORE` for all columns.

* @param comparator Physical element relational comparison functor.

This is really tricky. The rank API calls sort_order which in turn uses the less<> comparator with default template argument PhysicalElementComparator = sorting_physical_element_comparator. We cannot control that element comparator in this preprocessing step, as we have no way to pass in such element comparator into cudf::rank.

So the only way I can think of now is to enforce using that default element comparator later and nothing else (of course, only if cudf::rank has been called during preprocessing).

cpp/src/table/row_operators.cu

…ed_types

…ists-of-structs

divyegala · 2023-03-20T17:22:35Z

Update: I've successfully fixed the dremel issue above. However, that may require changing to the function decompose_structs which is also used in equality comparators. Such changes may affect comparison performance. Overall, too much risky for a limited time budget. So the issue needs to be addressed separately and I'll post a PR for it after this one merged whenever I can.

Can you create an issue documenting how to do this? Just to ensure we are tracking that there's possibly a better solution than flattening that we need to explore.

ttnghia · 2023-03-20T21:03:34Z

Can you create an issue documenting how to do this? Just to ensure we are tracking that there's possibly a better solution than flattening that we need to explore.

I'll update the original issue (#11672) with my finding.

ttnghia · 2023-04-04T17:57:34Z

Close this. I will separate it into two separate PRs for structs-of-lists and lists-of-structs.

ttnghia added 15 commits March 2, 2023 15:55

Add tests

ff1bc7e

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Complete tests

b02abae

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Disable unsupported conditions

3f6f2f3

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Reverse row_operator.cu

3609edf

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Revert "Reverse row_operator.cu"

7aede42

This reverts commit 3609edf.

Update tests

452f1b1

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Fix offset

ce8f088

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Change return type to unique_ptr

799f2ae

Adapt with changes

de7437e

Update copyright year

6cc4390

Add more variable

4092aae

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Implement flattening

bc9ecc4

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Merge branch 'refactor_flatten_columns' into sort_nested_types

c9aaa79

Complete implementation

284a79c

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Cleanup

c11b6e5

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

ttnghia added feature request New feature or request 2 - In Progress Currently a work in progress libcudf blocker libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Mar 3, 2023

ttnghia self-assigned this Mar 3, 2023

github-actions bot added the CMake CMake build issue label Mar 3, 2023

ttnghia changed the title ~~Fully support lexicographic comparison for nested types~~ Support lexicographic comparison for structs of lists Mar 4, 2023

ttnghia added 6 commits March 3, 2023 19:08

Fix tests

6d45f8f

Fix orders

a242dff

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Fix orders again

8e77216

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Cleanup

1238853

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Update tests

ec45e32

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

Add variable storing auxiliary data

71fc858

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

ttnghia added 2 commits March 13, 2023 13:17

Add benchmark code

55682f3

Revert "Add sort lists of structs benchmark"

d445652

This reverts commit 1cf20a2.

ttnghia mentioned this pull request Mar 13, 2023

[FEA] More optimization for lexicographic comparison of nested types #12932

Closed

Fix spell

4e36b15

ttnghia added 2 commits March 14, 2023 06:49

Merge branch 'branch-23.04' into sort_nested_types

dc9751b

Merge branch 'branch-23.04' into sort_nested_types

1d0d7e0

ttnghia mentioned this pull request Mar 15, 2023

Support lists of structs in row lexicographic comparator #12953

Merged

bdice reviewed Mar 16, 2023

View reviewed changes

ttnghia added 9 commits March 18, 2023 12:58

Add test

e96427f

Fix bug in using rank method

11f856f

Merge branch 'branch-23.04' into sort_nested_types

f6d70aa

Merge remote-tracking branch 'nghia/sort_nested_types' into sort_nest…

87e7156

…ed_types

Rename variables and rewrite docs

c6d1856

Only allow to use sorting_physical_element_comparator if there is l…

3003d7e

…ists-of-structs

Fix condition

7b21434

Remove noexcept

66e278a

Misc

032f271

Merge branch 'branch-23.04' into sort_nested_types

833d48b

GregoryKimball mentioned this pull request Mar 23, 2023

[FEA] Implement full support for nested types #11844

Closed

ttnghia marked this pull request as draft March 24, 2023 01:51

This was unlinked from issues Mar 24, 2023

Proposal for lexicographic comparators for lists of structs #11222

Closed

[FEA] Enable structs of lists in the new row lexicographic comparator #11672

Closed

ttnghia mentioned this pull request Mar 31, 2023

Proposal for lexicographic comparators for lists of structs #11222

Closed

ttnghia closed this Apr 4, 2023

ttnghia deleted the sort_nested_types branch May 1, 2023 22:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fully support lexicographic comparison for nested types in table self-comparator #12879

Fully support lexicographic comparison for nested types in table self-comparator #12879

ttnghia commented Mar 3, 2023 •

edited

Loading

ttnghia commented Mar 13, 2023 •

edited

Loading

ttnghia commented Mar 13, 2023 •

edited

Loading

bdice left a comment

bdice Mar 13, 2023

ttnghia Mar 18, 2023 •

edited

Loading

bdice Mar 13, 2023

ttnghia Mar 16, 2023

bdice Mar 13, 2023

ttnghia Mar 16, 2023 •

edited

Loading

bdice Mar 13, 2023

ttnghia Mar 16, 2023 •

edited

Loading

bdice Mar 13, 2023

ttnghia Mar 16, 2023

divyegala commented Mar 20, 2023

ttnghia commented Mar 20, 2023

ttnghia commented Apr 4, 2023

	* @param null_precedence Optional, device array the same length as a row and indicates how null
	* values compare to all other for every column. If `nullopt`, then null precedence would be
	* `null_order::BEFORE` for all columns.
	* @param comparator Physical element relational comparison functor.

Fully support lexicographic comparison for nested types in table self-comparator #12879

Fully support lexicographic comparison for nested types in table self-comparator #12879

Conversation

ttnghia commented Mar 3, 2023 • edited Loading

ttnghia commented Mar 13, 2023 • edited Loading

ttnghia commented Mar 13, 2023 • edited Loading

bdice left a comment

Choose a reason for hiding this comment

bdice Mar 13, 2023

Choose a reason for hiding this comment

ttnghia Mar 18, 2023 • edited Loading

Choose a reason for hiding this comment

bdice Mar 13, 2023

Choose a reason for hiding this comment

ttnghia Mar 16, 2023

Choose a reason for hiding this comment

bdice Mar 13, 2023

Choose a reason for hiding this comment

ttnghia Mar 16, 2023 • edited Loading

Choose a reason for hiding this comment

bdice Mar 13, 2023

Choose a reason for hiding this comment

ttnghia Mar 16, 2023 • edited Loading

Choose a reason for hiding this comment

bdice Mar 13, 2023

Choose a reason for hiding this comment

ttnghia Mar 16, 2023

Choose a reason for hiding this comment

divyegala commented Mar 20, 2023

ttnghia commented Mar 20, 2023

ttnghia commented Apr 4, 2023

ttnghia commented Mar 3, 2023 •

edited

Loading

ttnghia commented Mar 13, 2023 •

edited

Loading

ttnghia commented Mar 13, 2023 •

edited

Loading

ttnghia Mar 18, 2023 •

edited

Loading

ttnghia Mar 16, 2023 •

edited

Loading

ttnghia Mar 16, 2023 •

edited

Loading