-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support threading in joined column creation #2664
Conversation
src/join/composer.jl
Outdated
dfr_noon = joiner.dfr[right_ixs, Not(joiner.right_on)] | ||
|
||
if VERSION >= v"1.4" && Threads.nthreads() > 1 && length(left_ixs) >= 1_000_000 | ||
dfl_task = Threads.@spawn joiner.dfl[left_ixs, :] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since getindex
already uses threading this is only to ensure we are fast in case left and right table have very different processing times.
@@ -207,21 +215,35 @@ function compose_joined_table(joiner::DataFrameJoiner, kind::Symbol, makeunique: | |||
|
|||
@assert col_idx == ncol(joiner.dfl_on) + 1 | |||
|
|||
for col in eachcol(dfl_noon) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not add threading for on-columns creation as in practice I expect that on-columns are minority of created columns and I avoid adding too much boilerplate code this way.
Example run. Setup:
Execution on
Execution on this PR:
(using 8 threads in both cases) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! I just wish we could find a way to avoid putting if
checks everywhere in the long term.
src/join/composer.jl
Outdated
end | ||
@assert col_idx == ncol(joiner.dfl) + 1 | ||
for col in eachcol(dfr_noon) | ||
let cols_i = col_idx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use let
here but not above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right - let
is not needed, as just writing cols_i = col_idx
will ensure we get a new binding for cols_i
. I will change this.
src/join/composer.jl
Outdated
function _noon_compose_helper(cols, similar_col, cols_i, col, target_nrow, | ||
side_ixs, offset, sideonly_ixs, tocopy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add types to the signature? Maybe also add a !
at the end of the function name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - fixed
When we stop supporting Julia 1.0 it will be possible to significantly simplify the codes. |
Thank you! |
A minor PR to speed up column creation in joins using threading.
Its major benefit is for left, right and outerjoins having many extra columns (and producing over 1_000_000 rows)