Add fast reductions #2869

Merged (54 commits) on Oct 23, 2021
Changes from 37 commits
b6ca433
add fast reduction for sum
bkamins Sep 9, 2021
b58ced8
first approach to summation with missings
bkamins Sep 11, 2021
1e5cb69
improve error message
bkamins Sep 11, 2021
748e609
implement safer reduction
bkamins Sep 12, 2021
d6339c0
add conversion
bkamins Sep 12, 2021
d1d6354
Update src/abstractdataframe/selection.jl
bkamins Sep 15, 2021
0d90feb
Update src/abstractdataframe/selection.jl
bkamins Sep 15, 2021
8f64633
Update src/abstractdataframe/selection.jl
bkamins Sep 15, 2021
97a98b9
Update src/abstractdataframe/selection.jl
bkamins Sep 15, 2021
5b88db8
Update src/abstractdataframe/selection.jl
bkamins Sep 15, 2021
adeb553
better handle sumz
bkamins Sep 15, 2021
59aef38
Merge branch 'bk/fast_sum' of https://github.com/JuliaData/DataFrames…
bkamins Sep 15, 2021
b972658
changes after code review
bkamins Sep 15, 2021
9d11641
fix typos
bkamins Sep 15, 2021
7924736
refactor code
bkamins Sep 15, 2021
98c48f2
improve implementation to make it use dispatch
bkamins Sep 15, 2021
54735f6
fix typo
bkamins Sep 15, 2021
f1b631c
add length and length with skipmissing
bkamins Sep 17, 2021
4c55191
add mean
bkamins Sep 18, 2021
7f0b152
split fast path to a separate file
bkamins Sep 18, 2021
7390702
add minium, maximum, min and max
bkamins Sep 18, 2021
77a3424
Apply suggestions from code review
bkamins Sep 20, 2021
0dcf6f5
use Base.add_sum + fix @noinline
bkamins Sep 20, 2021
9f56c00
Merge branch 'main' into bk/fast_sum
bkamins Sep 28, 2021
9b91f72
Merge branch 'bk/fast_sum' of https://github.com/JuliaData/DataFrames…
bkamins Sep 28, 2021
618832b
fix implementations
bkamins Sep 28, 2021
36450bc
finished design
bkamins Oct 16, 2021
903e57f
add tests of positional reductions
bkamins Oct 16, 2021
286fca0
added length tests
bkamins Oct 16, 2021
0473cd2
done sum testing
bkamins Oct 16, 2021
e06bc32
additional sum tests
bkamins Oct 16, 2021
a62b73f
finish mean et al. tests
bkamins Oct 16, 2021
e4536c6
add minimum and maximum tests
bkamins Oct 16, 2021
083a07f
Merge branch 'main' into bk/fast_sum
bkamins Oct 16, 2021
ba0208e
remove @show
bkamins Oct 16, 2021
a39aefd
update tests and docstring
bkamins Oct 17, 2021
1c0022b
fixes of x86 arch and Julia 1.0 problems
bkamins Oct 17, 2021
277bb24
fix 32-bit Julia issue
bkamins Oct 18, 2021
a766a92
fix more Julia 1.0.5 errors
bkamins Oct 18, 2021
a649925
Apply suggestions from code review
bkamins Oct 18, 2021
e53a7a2
improve docs
bkamins Oct 18, 2021
cc086d7
Merge branch 'bk/fast_sum' of https://github.com/JuliaData/DataFrames…
bkamins Oct 18, 2021
5680902
fix typo
bkamins Oct 18, 2021
cd5acdf
Fix code and add tests for Int32
bkamins Oct 18, 2021
54eed61
additional tests
bkamins Oct 19, 2021
4c46bca
Update src/abstractdataframe/selectionfast.jl
bkamins Oct 19, 2021
39790da
Update docs/src/lib/internals.md
bkamins Oct 19, 2021
0bfbc4a
Merge branch 'main' into bk/fast_sum
bkamins Oct 19, 2021
99d459e
update tests
bkamins Oct 19, 2021
05e6031
add NEWS.md
bkamins Oct 20, 2021
bb59dd7
Apply suggestions from code review
bkamins Oct 22, 2021
2fa0e01
0-length selection corner cases handling
bkamins Oct 22, 2021
8c83f36
Merge branch 'bk/fast_sum' of https://github.com/JuliaData/DataFrames…
bkamins Oct 22, 2021
07e47a1
fix Julia 1.0 and nightly
bkamins Oct 22, 2021
1 change: 1 addition & 0 deletions docs/src/lib/functions.md
@@ -82,6 +82,7 @@ repeat
 repeat!
 select
 select!
+table_transformation
 transform
 transform!
 vcat
1 change: 1 addition & 0 deletions docs/src/lib/internals.md
@@ -16,4 +16,5 @@ getmaxwidths
 ourshow
 ourstrwidth
 @spawn_for_chunks
+default_table_transformation
 ```
13 changes: 11 additions & 2 deletions docs/src/man/split_apply_combine.md
@@ -79,11 +79,20 @@ as following arguments. The second type of signature is when a `Function` or a `
 is passed as the first argument and a `GroupedDataFrame` as the second argument
 (similar to `map`).

-As a special rule, with the `cols => function` and `cols => function =>
-target_cols` syntaxes, if `cols` is wrapped in an `AsTable`
+As a special rule, with the `cols => function` and
+`cols => function => target_cols` syntaxes, if `cols` is wrapped in an `AsTable`
 object then a `NamedTuple` containing columns selected by `cols` is passed to
 `function`.

+Note! When `AsTable` is used as a source column selector, the default
+processing performed by `function` can be overridden by adding a
+[`table_transformation`](@ref) method for this function. This is most useful
+for custom reductions over the columns of the `NamedTuple` created by `AsTable`,
+especially when the user expects very many columns (over 1000 as a rule of
+thumb) to be selected by the `AsTable` selector. In that case avoiding the
+creation of a `NamedTuple` object significantly reduces compilation time (which
+is often much longer than computation time in such cases).
+
 What is allowed for `function` to return is determined by the `target_cols` value:
 1. If both `cols` and `target_cols` are omitted (so only a `function` is passed),
 then returning a data frame, a matrix, a `NamedTuple`, or a `DataFrameRow` will
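
A minimal sketch of the dispatch pattern the documentation note describes, written in plain Julia without DataFrames.jl. `MiniTable`, `transformation`, `default_transformation`, and `rowsum` are hypothetical names invented for this illustration. The only thing taken from the diff is the idea that a generic fallback builds a `NamedTuple` of columns, while a method specialized on the function type can skip that step.

```julia
# Hypothetical stand-in for a selection of columns: names plus column vectors.
struct MiniTable
    names::Vector{Symbol}
    cols::Vector{Vector{Float64}}
end

# Default path: materialize a NamedTuple of columns and pass it to `fun`.
# Each distinct set of column names yields a distinct NamedTuple type.
default_transformation(t::MiniTable, fun) =
    fun(NamedTuple{Tuple(t.names)}(Tuple(t.cols)))

# Generic fallback: any function goes through the default path.
transformation(t::MiniTable, fun) = default_transformation(t, fun)

# A user-defined reduction over the columns of a NamedTuple.
rowsum(nt::NamedTuple) = reduce(+, values(nt))

# Specialized method for `rowsum`: fold over the column vectors directly,
# so no NamedTuple is constructed for this function.
transformation(t::MiniTable, ::typeof(rowsum)) = reduce(+, t.cols)

t = MiniTable([:a, :b], [[1.0, 2.0], [3.0, 4.0]])
fast = transformation(t, rowsum)            # specialized method
slow = default_transformation(t, rowsum)    # NamedTuple path
```

Both calls are meant to return the same values. The specialized method only changes how the columns reach the reduction.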
1 change: 1 addition & 0 deletions src/DataFrames.jl
@@ -129,6 +129,7 @@ include("groupeddataframe/utils.jl")
 include("other/broadcasting.jl")

 include("abstractdataframe/selection.jl")
+include("abstractdataframe/selectionfast.jl")
 include("abstractdataframe/subset.jl")
 include("abstractdataframe/iteration.jl")
 include("abstractdataframe/reshape.jl")
18 changes: 14 additions & 4 deletions src/abstractdataframe/selection.jl
@@ -400,11 +400,11 @@ _transformation_helper(df::AbstractDataFrame, col_idx::Int, (fun,)::Ref{Any}) =
 _empty_astable_helper(fun, len) = [fun(NamedTuple()) for _ in 1:len]

 function _transformation_helper(df::AbstractDataFrame, col_idx::AsTable, (fun,)::Ref{Any})
-    tbl = Tables.columntable(select(df, col_idx.cols, copycols=false))
-    if isempty(tbl) && fun isa ByRow
+    df_sel = select(df, col_idx.cols, copycols=false)
+    if ncol(df_sel) == 0 && fun isa ByRow
         return _empty_astable_helper(fun.fun, nrow(df))
     else
-        return fun(tbl)
+        return table_transformation(df_sel, fun)
     end
 end

@@ -415,7 +415,17 @@ function _transformation_helper(df::AbstractDataFrame, col_idx::AbstractVector{I
         return _empty_selector_helper(fun.fun, nrow(df))
     else
         cdf = eachcol(df)
-        return fun(map(c -> cdf[c], col_idx)...)
+        cols = map(c -> cdf[c], col_idx)
+        if (fun === +) || fun === ByRow(+)
+            isempty(cols) && return +() # to make sure we produce a consistent error
+            return reduce(+, cols)
+        elseif fun === ByRow(min)
+            return _minmax_row_fast(cols, min)
+        elseif fun === ByRow(max)
+            return _minmax_row_fast(cols, max)
+        else
+            return fun(cols...)
+        end
     end
 end
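
The `+` branch in the hunk above swaps splatting (`fun(cols...)`) for a fold over the columns. A small sketch in plain Julia, assuming the columns are ordinary vectors of equal length. `rowwise_minmax` is a hypothetical helper written for this illustration; it is not the PR's `_minmax_row_fast`, whose body is not part of the diff shown here.

```julia
# Hypothetical stand-ins for data frame columns: plain vectors of equal length.
cols = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]

# Splatting passes every column as a separate positional argument.
splatted = +(cols...)

# Folding adds the columns pairwise, two arguments at a time.
folded = reduce(+, cols)

# Illustrative row-wise min/max over columns (an assumption, not the PR's code).
function rowwise_minmax(cols::AbstractVector{<:AbstractVector},
                        f::Union{typeof(min), typeof(max)})
    isempty(cols) && throw(ArgumentError("at least one column is required"))
    res = copy(first(cols))
    for col in Iterators.drop(cols, 1)
        length(col) == length(res) ||
            throw(DimensionMismatch("columns must have equal length"))
        # Broadcasting allocates a new vector and lets element types promote.
        res = f.(res, col)
    end
    return res
end

row_min = rowwise_minmax(cols, min)
row_max = rowwise_minmax(cols, max)
```

With a handful of columns the two `+` forms behave the same. The fold matters when the number of columns is large, since it never calls a function with thousands of positional arguments.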
