Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BREAKING] deprecate DataFrame constructors #2464

Merged
merged 21 commits into from
Nov 6, 2020
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,8 @@
## Deprecated

* `DataFrame!` is now deprecated ([#2338](https://github.com/JuliaData/DataFrames.jl/pull/2338))
* several in-standard `DataFrame` constructors are now deprecated
([#2464](https://github.com/JuliaData/DataFrames.jl/pull/2464))
* all old deprecations now throw an error
([#2350](https://github.com/JuliaData/DataFrames.jl/pull/2350))

Expand Down
156 changes: 66 additions & 90 deletions src/dataframe/dataframe.jl
Original file line number Diff line number Diff line change
Expand Up @@ -12,19 +12,11 @@ DataFrame(columns::AbstractVector, names::AbstractVector{Symbol};
makeunique::Bool=false, copycols::Bool=true)
DataFrame(columns::AbstractVector, names::AbstractVector{<:AbstractString};
makeunique::Bool=false, copycols::Bool=true)
DataFrame(columns::NTuple{N,AbstractVector}, names::NTuple{N,Symbol};
makeunique::Bool=false, copycols::Bool=true)
DataFrame(columns::NTuple{N,AbstractVector}, names::NTuple{N,<:AbstractString};
makeunique::Bool=false, copycols::Bool=true)
DataFrame(columns::Matrix, names::AbstractVector{Symbol}; makeunique::Bool=false)
DataFrame(columns::Matrix, names::AbstractVector{<:AbstractString};
makeunique::Bool=false)
DataFrame(kwargs...)
DataFrame(pairs::Pair{Symbol,<:Any}...; makeunique::Bool=false, copycols::Bool=true)
DataFrame(pairs::Pair{<:AbstractString,<:Any}...; makeunique::Bool=false,
copycols::Bool=true)
DataFrame(pairs::AbstractVector{<:Pair}; makeunique::Bool=false, copycols::Bool=true)
DataFrame(pairs::NTuple{N, Pair}; makeunique::Bool=false, copycols::Bool=true) where {N}
DataFrame() # an empty DataFrame
DataFrame(column_eltypes::AbstractVector, names::AbstractVector{Symbol},
nrows::Integer=0; makeunique::Bool=false)
Expand All @@ -37,7 +29,7 @@ DataFrame(::GroupedDataFrame; keepkeys::Bool=true)
```

# Arguments
- `columns` : a Vector with each column as contents or a Matrix
- `columns` : a vector with each column as contents
- `names` : the column names
- `makeunique` : if `false` (the default), an error will be raised
if duplicates in `names` are found; if `true`, duplicate names will be suffixed
Expand All @@ -46,26 +38,24 @@ DataFrame(::GroupedDataFrame; keepkeys::Bool=true)
column contents; note that the `copycols` keyword argument indicates if
if vectors passed as columns should be copied so it is not possible to create
a column whose name is `:copycols` using this constructor
- `t` : elemental type of all columns
- `nrows`, `ncols` : number of rows and columns
- `nrows` : number of rows
- `column_eltypes` : element type of each column
- `categorical` : a vector of `Bool` indicating which columns should be converted
to `CategoricalVector`
- `ds` : `AbstractDict` of columns
- `table` : any type that implements the
[Tables.jl](https://github.com/JuliaData/Tables.jl) interface
- `copycols` : whether vectors passed as columns should be copied; if set
to `false` then the constructor will still copy the passed columns
if it is not possible to construct a `DataFrame` without materializing new columns.
- `keepkeys` : if the resulting `DataFrame` should contain the grouping columns
of a `GroupedDataFrame`

All columns in `columns` must be `AbstractVector`s and have the same length. An
exception are `DataFrame(kwargs...)`, `DataFrame(pairs::Pair...)`,
`DataFrame(pairs::AbstractVector{<:Pair})`, and `DataFrame(pairs::NTuple{N, Pair})` form
constructors which additionally allow a column to be of any other type that is
not an `AbstractArray`, in which case the passed value is automatically repeated
to fill a new vector of the appropriate length. As a particular rule values
stored in a `Ref` or a `0`-dimensional `AbstractArray` are unwrapped and treated
in the same way.
exception are `DataFrame(kwargs...)`, `DataFrame(pairs::Pair...)`, and
`DataFrame(pairs::AbstractVector{<:Pair})` form constructors which additionally
allow a column to be of any other type that is not an `AbstractArray`, in which
case the passed value is automatically repeated to fill a new vector of the
appropriate length. As a particular rule values stored in a `Ref` or a
`0`-dimensional `AbstractArray` are unwrapped and treated in the same way.

Additionally `DataFrame` can be used to collect a [`GroupedDataFrame`](@ref)
into a `DataFrame`. In this case the order of rows in the result follows the order
Expand All @@ -85,28 +75,53 @@ performance-critical code, do not index into a `DataFrame` inside of loops.

# Examples
```julia
df = DataFrame()
v = ["x","y","z"][rand(1:3, 10)]
df1 = DataFrame(Any[collect(1:10), v, rand(10)], [:A, :B, :C])
df2 = DataFrame(A = 1:10, B = v, C = rand(10))
summary(df1)
describe(df2)
first(df1, 10)
df1.B
df2[!, :C]
df1[:, :A]
df1[1:4, 1:2]
df1[Not(1:4), Not(1:2)]
df1[1:2, [:A,:C]]
df1[1:2, r"[AC]"]
df1[:, [:A,:C]]
df1[:, [1,3]]
df1[1:4, :]
df1[1:4, :C]
df1[1:4, :C] = 40. * df1[1:4, :C]
[df1; df2] # vcat
[df1 df2] # hcat
size(df1)
julia> DataFrame(a=1:2, b=0)
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 0 │
│ 2 │ 2 │ 0 │

julia> DataFrame("a" => 1:2, "b" => 0)
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 0 │
│ 2 │ 2 │ 0 │

julia> DataFrame([[1, 2], [0, 0]], [:a, :b])
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 0 │
│ 2 │ 2 │ 0 │

julia> DataFrame((a=[1, 2], b=[0, 0]))
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 0 │
│ 2 │ 2 │ 0 │

julia> DataFrame([(a=1, b=0), (a=2, b=0)])
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 0 │
│ 2 │ 2 │ 0 │

julia> DataFrame(Tables.table([1 0; 2 0], header=[:a, :b]))
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 0 │
│ 2 │ 2 │ 0 │
```
"""
struct DataFrame <: AbstractDataFrame
Expand Down Expand Up @@ -182,14 +197,15 @@ function DataFrame(pairs::Pair{<:AbstractString,<:Any}...; makeunique::Bool=fals
copycols=copycols)
end

# these two are needed as a workaround Tables.jl dispatch
DataFrame(pairs::AbstractVector{<:Pair}; makeunique::Bool=false,
copycols::Bool=true) =
DataFrame(pairs..., makeunique=makeunique, copycols=copycols)

DataFrame(pairs::NTuple{N, Pair}; makeunique::Bool=false,
copycols::Bool=true) where {N} =
DataFrame(pairs..., makeunique=makeunique, copycols=copycols)
# this is needed as a workaround for Tables.jl dispatch
function DataFrame(pairs::AbstractVector{<:Pair}; makeunique::Bool=false,
copycols::Bool=true)
if isempty(pairs)
return DataFrame()
else
return DataFrame(pairs..., makeunique=makeunique, copycols=copycols)
end
end

function DataFrame(d::AbstractDict; copycols::Bool=true)
if isa(d, Dict)
Expand Down Expand Up @@ -251,44 +267,6 @@ DataFrame(columns::AbstractVector{<:AbstractVector},
makeunique::Bool=false, copycols::Bool=true) =
DataFrame(columns, Symbol.(cnames); makeunique=makeunique, copycols=copycols)

DataFrame(columns::NTuple{N, AbstractVector}, cnames::NTuple{N, Symbol};
makeunique::Bool=false, copycols::Bool=true) where {N} =
DataFrame(collect(AbstractVector, columns), collect(Symbol, cnames),
makeunique=makeunique, copycols=copycols)

DataFrame(columns::NTuple{N, AbstractVector}, cnames::NTuple{N, AbstractString};
makeunique::Bool=false, copycols::Bool=true) where {N} =
DataFrame(columns, Symbol.(cnames); makeunique=makeunique, copycols=copycols)

DataFrame(columns::NTuple{N, AbstractVector}; copycols::Bool=true) where {N} =
DataFrame(collect(AbstractVector, columns), gennames(length(columns)),
copycols=copycols)

DataFrame(columns::AbstractMatrix,
cnames::AbstractVector{Symbol} = gennames(size(columns, 2));
makeunique::Bool=false) =
DataFrame(AbstractVector[columns[:, i] for i in 1:size(columns, 2)], cnames,
makeunique=makeunique, copycols=false)

DataFrame(columns::AbstractMatrix, cnames::AbstractVector{<:AbstractString};
makeunique::Bool=false) =
DataFrame(columns, Symbol.(cnames); makeunique=makeunique)

function DataFrame(column_eltypes::AbstractVector{T}, cnames::AbstractVector{Symbol},
nrows::Integer=0; makeunique::Bool=false)::DataFrame where T<:Type
columns = AbstractVector[elty >: Missing ?
fill!(Tables.allocatecolumn(elty, nrows), missing) :
Tables.allocatecolumn(elty, nrows)
for elty in column_eltypes]
return DataFrame(columns, Index(convert(Vector{Symbol}, cnames),
makeunique=makeunique), copycols=false)
end

DataFrame(column_eltypes::AbstractVector{<:Type},
cnames::AbstractVector{<:AbstractString},
nrows::Integer=0; makeunique::Bool=false) =
DataFrame(column_eltypes, Symbol.(cnames), nrows; makeunique=makeunique)

##############################################################################
##
## AbstractDataFrame interface
Expand Down Expand Up @@ -1177,8 +1155,6 @@ function Base.append!(df1::DataFrame, df2::AbstractDataFrame; cols::Symbol=:sete
return df1
end

Base.convert(::Type{DataFrame}, A::AbstractMatrix) = DataFrame(A)

Base.convert(::Type{DataFrame}, d::AbstractDict) = DataFrame(d, copycols=false)

function Base.push!(df::DataFrame, row::Union{AbstractDict, NamedTuple};
Expand Down
40 changes: 40 additions & 0 deletions src/deprecated.jl
Original file line number Diff line number Diff line change
Expand Up @@ -105,3 +105,43 @@ function categorical!(df::DataFrame, cols::Union{Type, Nothing}=nothing;
end
return transform!(df, names(df, cols) .=> (x -> categorical(x, compress=compress)), renamecols=false)
end

@deprecate DataFrame(pairs::NTuple{N, Pair}; makeunique::Bool=false,
copycols::Bool=true) where {N} DataFrame(pairs..., makeunique=makeunique, copycols=copycols)
@deprecate DataFrame(columns::NTuple{N, AbstractVector}, cnames::NTuple{N, Symbol}; makeunique::Bool=false,
copycols::Bool=true) where {N} DataFrame(collect(columns), collect(cnames);
makeunique=makeunique, copycols=copycols)
@deprecate DataFrame(columns::NTuple{N, AbstractVector}, cnames::NTuple{N, AbstractString}; makeunique::Bool=false,
copycols::Bool=true) where {N} DataFrame(collect(columns), [Symbol(c) for c in cnames];
makeunique=makeunique, copycols=copycols)
@deprecate DataFrame(columns::NTuple{N, AbstractVector};
copycols::Bool=true) where {N} DataFrame(collect(columns),
Symbol.(:x, 1:length(columns)), copycols=copycols)

# this is not a 100% correct deprecation as it does not support makeunique, but
# we leave it as is to show users a recommended way to create a DataFrame from a matrix
@deprecate DataFrame(columns::AbstractMatrix, cnames::AbstractVector{Symbol} = gennames(size(columns, 2));
makeunique::Bool=false) DataFrame(Tables.table(columns, header=cnames))
@deprecate DataFrame(columns::AbstractMatrix, cnames::AbstractVector{<:AbstractString};
makeunique::Bool=false) DataFrame(Tables.table(columns, header=cnames))

function DataFrame(column_eltypes::AbstractVector{T}, cnames::AbstractVector{Symbol},
nrows::Integer=0; makeunique::Bool=false)::DataFrame where T<:Type
Base.depwarn("`DataFrame` constructor with passed eltypes is deprecated. " *
"Pass explicitly created columns to a `DataFrame` constructor instead.",
:DataFrame)
columns = AbstractVector[elty >: Missing ?
fill!(Tables.allocatecolumn(elty, nrows), missing) :
Tables.allocatecolumn(elty, nrows)
for elty in column_eltypes]
return DataFrame(columns, Index(convert(Vector{Symbol}, cnames),
makeunique=makeunique), copycols=false)
end

DataFrame(column_eltypes::AbstractVector{<:Type},
cnames::AbstractVector{<:AbstractString},
nrows::Integer=0; makeunique::Bool=false) =
DataFrame(column_eltypes, Symbol.(cnames), nrows; makeunique=makeunique)

import Base: convert
@deprecate convert(::Type{DataFrame}, A::AbstractMatrix) DataFrame(Tables.table(A, header=Symbol.(:x, axes(A, 2))))
10 changes: 6 additions & 4 deletions src/other/tables.jl
Original file line number Diff line number Diff line change
Expand Up @@ -43,11 +43,13 @@ fromcolumns(x, names; copycols::Bool=true) =
copycols=copycols)

function DataFrame(x::T; copycols::Bool=true) where {T}
if !Tables.istable(x)
if x isa AbstractVector && all(col -> isa(col, AbstractVector), x)
if !Tables.istable(x) && x isa AbstractVector && !isempty(x)
# here we handle the cases that are accepted by standard DataFrame constructors
# but eltype(x) is more flexible than assumed in these methods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this comment. Do you mean that here we handle cases where eltype(x) doesn't match what other (more standard) constructors accept?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. What we handle is essentially Any[...] case (or some weird Union{...}[...]) as it will not be handled by standard constructors.

It was not very clear earlier and this is essentially what this code path is for only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I think what confused me is "that are accepted by standard DataFrame constructors", since these are not really accepted, or they wouldn't have to be dealt with in this method. :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you - as usual - propose a better wording (or what we have now is OK?). Thank you!

if all(col -> isa(col, AbstractVector), x)
return DataFrame(Vector{AbstractVector}(x), copycols=copycols)
elseif (x isa AbstractVector || x isa Tuple) &&
all(v -> v isa Pair{Symbol, <:AbstractVector}, x)
elseif all(v -> v isa Pair{Symbol, <:AbstractVector}, x) ||
all(v -> v isa Pair{<:AbstractString, <:AbstractVector}, x)
return DataFrame(AbstractVector[last(v) for v in x], [first(v) for v in x],
copycols=copycols)
end
Expand Down
Loading