Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for equi-join on struct #7720

Merged
merged 196 commits into from
Apr 1, 2021

Conversation

hyperbolic2346
Copy link
Contributor

@hyperbolic2346 hyperbolic2346 commented Mar 25, 2021

Adds support for equijoin on structs.

This PR is leveraging the struct PR and the rewrite for join API. It enables equijoin on structs by flattening the struct for the hash calculation.

closes #7543

@hyperbolic2346 hyperbolic2346 added 3 - Ready for Review Ready for review by team feature request New feature or request and removed 2 - In Progress Currently a work in progress labels Mar 30, 2021
@hyperbolic2346
Copy link
Contributor Author

This is now ready for review.

@harrism
Copy link
Member

harrism commented Mar 30, 2021

This PR is leveraging struct PR and rewrite for join API.

Not sure what you mean by this. Please clarify with readers of the changelog in mind, since this will get pasted into it.

@codecov
Copy link

codecov bot commented Mar 30, 2021

Codecov Report

Merging #7720 (2b0d7ee) into branch-0.19 (7871e7a) will increase coverage by 0.78%.
The diff coverage is n/a.

❗ Current head 2b0d7ee differs from pull request most recent head b946217. Consider uploading reports for the commit b946217 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #7720      +/-   ##
===============================================
+ Coverage        81.86%   82.65%   +0.78%     
===============================================
  Files              101      103       +2     
  Lines            16884    17524     +640     
===============================================
+ Hits             13822    14484     +662     
+ Misses            3062     3040      -22     
Impacted Files Coverage Δ
python/cudf/cudf/utils/dtypes.py 83.27% <0.00%> (-6.24%) ⬇️
python/cudf/cudf/utils/gpu_utils.py 53.65% <0.00%> (-4.88%) ⬇️
python/cudf/cudf/core/column/lists.py 87.32% <0.00%> (-4.08%) ⬇️
python/dask_cudf/dask_cudf/backends.py 87.50% <0.00%> (-2.13%) ⬇️
python/cudf/cudf/core/abc.py 87.23% <0.00%> (-1.14%) ⬇️
python/cudf/cudf/core/column/decimal.py 93.84% <0.00%> (-1.03%) ⬇️
python/cudf/cudf/core/column/column.py 87.53% <0.00%> (-0.23%) ⬇️
python/cudf/cudf/utils/utils.py 85.36% <0.00%> (-0.07%) ⬇️
python/cudf/cudf/io/feather.py 100.00% <0.00%> (ø)
python/cudf/cudf/utils/ioutils.py 78.71% <0.00%> (ø)
... and 49 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 684bb14...b946217. Read the comment docs.

@hyperbolic2346
Copy link
Contributor Author

Not sure what you mean by this. Please clarify with readers of the changelog in mind, since this will get pasted into it.

Am I correct in thinking that only the first line is used in the changelog? I have updated the comment to have a one-liner for what the change is on a high level and then detail for reviewers below that.

Copy link
Contributor

@nvdbaranec nvdbaranec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly simple. Does flatten_nested_columns throw an exception if it encounters a list column?

@hyperbolic2346
Copy link
Contributor Author

Surprisingly simple. Does flatten_nested_columns throw an exception if it encounters a list column?

It marches the table columns looking for structs. Doesn’t complain about lists.

if (child.type().id() == type_id::STRUCT)

@nvdbaranec
Copy link
Contributor

nvdbaranec commented Mar 30, 2021

Surprisingly simple. Does flatten_nested_columns throw an exception if it encounters a list column?

It marches the table columns looking for structs. Doesn’t complain about lists.

if (child.type().id() == type_id::STRUCT)

So if you had this:

struct<int, struct<list<struct<...>> , list, struct<int>>>>

it would flatten to

int, list, list, int ? That first list quietly contains a struct within it. Does that mess things up? Or is it the case that the mere presence of list columns in the flattened output will cause the join to throw later on?

cpp/src/join/join.cu Outdated Show resolved Hide resolved
@karthikeyann
Copy link
Contributor

@nvdbaranec I am working on templatized code for flatten_nested_columns. (still doesn't have list support, but throws error). Targeted for 0.20 release.

@hyperbolic2346
Copy link
Contributor Author

@nvdbaranec I just coded this up as a quick test and we get "C++ exception with description "cuDF failure at: ../../../../include/cudf/table/row_operators.cuh:363: Attempted to compare elements of uncomparable types." thrown. This is the same error as passing a list by itself to join.

@harrism
Copy link
Member

harrism commented Mar 30, 2021

Not sure what you mean by this. Please clarify with readers of the changelog in mind, since this will get pasted into it.

Am I correct in thinking that only the first line is used in the changelog? I have updated the comment to have a one-liner for what the change is on a high level and then detail for reviewers below that.

Nope, not correct. :)

Copy link
Contributor

@devavret devavret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such a neat little PR.

@hyperbolic2346
Copy link
Contributor Author

@gpucibot merge

@hyperbolic2346 hyperbolic2346 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Apr 1, 2021
@hyperbolic2346
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f4ab813 into rapidsai:branch-0.19 Apr 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add equi-join support for structs