Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster mapreduce for Broadcasted #31020

Merged
merged 12 commits into from
Apr 30, 2020
8 changes: 7 additions & 1 deletion base/broadcast.jl
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,7 @@ BroadcastStyle(a::AbstractArrayStyle{M}, ::DefaultArrayStyle{N}) where {M,N} =
# methods that instead specialize on `BroadcastStyle`,
# copyto!(dest::AbstractArray, bc::Broadcasted{MyStyle})

struct Broadcasted{Style<:Union{Nothing,BroadcastStyle}, Axes, F, Args<:Tuple}
struct Broadcasted{Style<:Union{Nothing,BroadcastStyle}, Axes, F, Args<:Tuple} <: Base.AbstractBroadcasted
f::F
args::Args
axes::Axes # the axes of the resulting object (may be bigger than implied by `args` if this is nested inside a larger `Broadcasted`)
Expand Down Expand Up @@ -219,6 +219,12 @@ argtype(bc::Broadcasted) = argtype(typeof(bc))
_eachindex(t::Tuple{Any}) = t[1]
_eachindex(t::Tuple) = CartesianIndices(t)

Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc))
Base.IndexStyle(::Type{<:Broadcasted{<:Any,Tuple{Axis}}}) where {Axis} = IndexStyle(Axis)
tkf marked this conversation as resolved.
Show resolved Hide resolved
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out my previous approach (combine IndexStyle of each argument) was wrong since it didn't work with, e.g., broadcasted(+, zeros(5), zeros(1, 4)). Each argument can be IndexLinear even though the broadcasted indexing is not (that's the main job of broadcasted).

So now the new approach is to return an equivalent of IndexStyle(bc.axes[1]) if length(bc.axes) == 1 and then return IndexCartesian() otherwise. As I need Axes type parameter now, this implementation requires the Broadcasted to be instantiated first. I guess this is OK since that's the usual broadcasting pipeline.

Is this implementation fine? Is IndexStyle(bc.axes[1]) == IndexLinear() && length(bc.axes) == 1 the only possible IndexLinear() case that is detectable in a type-stable manner? That is to say, I'm ignoring the case like broadcasted(+, zeros(1, 4), zeros(1, 4)) as I need to look at the value to check that it supports linear indexing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really require the Broadcasted to be instantiated to get an IndexStyle — it just means that if it's not instantiated that it falls back to a cartesian implementation. All Broadcasteds should support cartesian indexing; only one-dimensional ones will allow straight integers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's much more accurate way to put it. I wanted to say that "for the broadcasted to be dispatched to the optimized method", it has to be instantiated first.


Base.LinearIndices(bc::Broadcasted) = LinearIndices(eachindex(bc))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look into how indexing is defined for Broadcasted (especially for how it interacts with Axes) and if this IndexStyle is correct.

However, it seems to me that it'd be much simpler if we define IndexStyle(::Broadcasted) than IndexStyle(::Type{<:Broadcasted}) as it would let us use eachindex(::Broadcasted) in IndexStyle. Does it makes sense to define IndexStyle(::Broadcasted) directly than the type trait?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK IndexStyle needs to take a type rather than an instance so that you can know statically what kind of indices to expect (that's more or less the definition of a trait). At least that's how its documented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is based on the implementation detail in reduce.jl that IndexStyle is called on instance

_mapreduce(f, op, A::AbstractArray) = _mapreduce(f, op, IndexStyle(A), A)

which is valid since IndexStyle is callable on AbstractArray:

IndexStyle(A::AbstractArray) = IndexStyle(typeof(A))

However, to make IndexStyle a proper trait for Broadcasted, I defined methods for types and added the method for instance on top of them, just like how it's done for AbstractArray:

Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc))

Note that direct definition of IndexStyle(::Broadcasted) would be type-stable if eachindex is so. If eachindex is not type-stable we don't have the performance gain anyway. Initially, I wasn't aiming for adding public interface for Broadcasted in this PR so I thought relying on the implementation detail that IndexStyle is only called on instance was OK. However, I also understand that contaminating "signature space" of IndexStyle is not particularly a clean solution.

If we say that IndexStyle for Broadcasted has to be a proper trait, we need to note that IndexStyle must be properly defined whenever axes is customized since axes is a public interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but as long as you define IndexStyle it should be a proper trait, independent from how it's used internally. Otherwise, you'd better define a custom internal function.

Anyway, what's the problem with the current implementation in this PR? Just that's it's complicated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. So, reading the documentation and the code more, it looks like Base.axes(::Broadcasted{MyStyle}) is not an overloadable method? From the documentation, Base.axes(x) is overloadable if x participate in the broadcasting. This method is then called via combine_axes in axes(::Broadcasted). I was initially worried that IndexStyle(::Type{<:Broadcasted}) and Base.axes(::Broadcasted) can become incompatible if downstream package authors are not careful but it looks like I don't need to worry about it.

tkf marked this conversation as resolved.
Show resolved Hide resolved

Base.ndims(::Broadcasted{<:Any,<:NTuple{N,Any}}) where {N} = N
Base.ndims(::Type{<:Broadcasted{<:Any,<:NTuple{N,Any}}}) where {N} = N

Expand Down
20 changes: 12 additions & 8 deletions base/reduce.jl
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@ else
const SmallUnsigned = Union{UInt8,UInt16,UInt32}
end

abstract type AbstractBroadcasted end
const AbstractArrayOrBroadcasted = Union{AbstractArray, AbstractBroadcasted}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to define this AbstractBroadcasted as broadcast.jl is loaded later than reduce.jl. ATM it's just a hack to make this work, but I wonder if it makes sense to define

abstract type AbstractIndexable end
const Indexable = Union{AbstractArray, AbstractIndexable}

or even

abstract type Indexable{N} end
abstract type AbstractArray{T,N} <: Indexable{N} end

so that it's much easier for downstream projects to use indexing-based mapreduce implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an abstract type or a trait like Indexable would also be useful to support mapreduce over dimensions with skipmissing (#28027). It would avoid duplicating some methods.

Though a trait would probably be better than an abstract type, since it would be more flexible. In particular, an object returned by skipmissing could be marked as Indexable or not depending on whether it wraps an Indexable object (like an array) or another iterable. It would also allow objects that already inherit from a non-indexable abstract type to implement Indexable if they want, which wouldn't be possible with an abstract type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with trait direction, one option is to use IndexStyle. Something like this:

struct NonIndexable <: IndexStyle end
IndexStyle(::Type) = NonIndexable()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a clever idea. I think the union here is sensible for now — that trait could possibly come later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Differing the whole designing work for the trait makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash mentioned there may be another way around this bootstrapping issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the 1.4 feature freeze is approaching, I support merging the PR with the Union for now, and try to use a more general trait later.

FWIW, this PR is giving us incredibly good performance to be able to reuse the pairwise summation algorithm with weights at JuliaStats/StatsBase.jl#518. Great work!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for testing this PR in a real-world scenario!


"""
Base.add_sum(x, y)

Expand Down Expand Up @@ -152,7 +155,8 @@ foldr(op, itr; kw...) = mapfoldr(identity, op, itr; kw...)

# This is a generic implementation of `mapreduce_impl()`,
# certain `op` (e.g. `min` and `max`) may have their own specialized versions.
@noinline function mapreduce_impl(f, op, A::AbstractArray, ifirst::Integer, ilast::Integer, blksize::Int)
@noinline function mapreduce_impl(f, op, A::AbstractArrayOrBroadcasted,
ifirst::Integer, ilast::Integer, blksize::Int)
if ifirst == ilast
@inbounds a1 = A[ifirst]
return mapreduce_first(f, op, a1)
Expand All @@ -175,7 +179,7 @@ foldr(op, itr; kw...) = mapfoldr(identity, op, itr; kw...)
end
end

mapreduce_impl(f, op, A::AbstractArray, ifirst::Integer, ilast::Integer) =
mapreduce_impl(f, op, A::AbstractArrayOrBroadcasted, ifirst::Integer, ilast::Integer) =
mapreduce_impl(f, op, A, ifirst, ilast, pairwise_blocksize(f, op))

"""
Expand Down Expand Up @@ -296,13 +300,13 @@ The default is `reduce_first(op, f(x))`.
"""
mapreduce_first(f, op, x) = reduce_first(op, f(x))

_mapreduce(f, op, A::AbstractArray) = _mapreduce(f, op, IndexStyle(A), A)
_mapreduce(f, op, A::AbstractArrayOrBroadcasted) = _mapreduce(f, op, IndexStyle(A), A)

function _mapreduce(f, op, ::IndexLinear, A::AbstractArray{T}) where T
function _mapreduce(f, op, ::IndexLinear, A::AbstractArrayOrBroadcasted)
inds = LinearIndices(A)
n = length(inds)
if n == 0
return mapreduce_empty(f, op, T)
return mapreduce_empty_iter(f, op, A, IteratorEltype(A))
elseif n == 1
@inbounds a1 = A[first(inds)]
return mapreduce_first(f, op, a1)
Expand All @@ -323,7 +327,7 @@ end

mapreduce(f, op, a::Number) = mapreduce_first(f, op, a)

_mapreduce(f, op, ::IndexCartesian, A::AbstractArray) = mapfoldl(f, op, A)
_mapreduce(f, op, ::IndexCartesian, A::AbstractArrayOrBroadcasted) = mapfoldl(f, op, A)

"""
reduce(op, itr; [init])
Expand Down Expand Up @@ -473,7 +477,7 @@ isgoodzero(::typeof(max), x) = isbadzero(min, x)
isgoodzero(::typeof(min), x) = isbadzero(max, x)

function mapreduce_impl(f, op::Union{typeof(max), typeof(min)},
A::AbstractArray, first::Int, last::Int)
A::AbstractArrayOrBroadcasted, first::Int, last::Int)
a1 = @inbounds A[first]
v1 = mapreduce_first(f, op, a1)
v2 = v3 = v4 = v1
Expand Down Expand Up @@ -746,7 +750,7 @@ function count(pred, itr)
end
return n
end
function count(pred, a::AbstractArray)
function count(pred, a::AbstractArrayOrBroadcasted)
n = 0
for i in eachindex(a)
@inbounds n += pred(a[i])::Bool
Expand Down
13 changes: 8 additions & 5 deletions base/reducedim.jl
Original file line number Diff line number Diff line change
Expand Up @@ -301,16 +301,19 @@ julia> mapreduce(isodd, |, a, dims=1)
1 1 1 1
```
"""
mapreduce(f, op, A::AbstractArray; dims=:, kw...) = _mapreduce_dim(f, op, kw.data, A, dims)
mapreduce(f, op, A::AbstractArrayOrBroadcasted; dims=:, kw...) =
_mapreduce_dim(f, op, kw.data, A, dims)

_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArray, ::Colon) = mapfoldl(f, op, A; nt...)
_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArrayOrBroadcasted, ::Colon) =
mapfoldl(f, op, A; nt...)

_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArray, ::Colon) = _mapreduce(f, op, IndexStyle(A), A)
_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArrayOrBroadcasted, ::Colon) =
_mapreduce(f, op, IndexStyle(A), A)

_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArray, dims) =
_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArrayOrBroadcasted, dims) =
mapreducedim!(f, op, reducedim_initarray(A, dims, nt.init), A)

_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArray, dims) =
_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArrayOrBroadcasted, dims) =
mapreducedim!(f, op, reducedim_init(f, op, A, dims), A)

"""
Expand Down
2 changes: 2 additions & 0 deletions test/broadcast.jl
Original file line number Diff line number Diff line change
Expand Up @@ -793,6 +793,7 @@ end
# Broadcasted iterable/indexable APIs
let
bc = Broadcast.instantiate(Broadcast.broadcasted(+, zeros(5), 5))
@test IndexStyle(bc) == IndexLinear()
@test eachindex(bc) === Base.OneTo(5)
@test length(bc) === 5
@test ndims(bc) === 1
Expand All @@ -803,6 +804,7 @@ let
@test ndims(copy(bc)) == ndims([v for v in bc]) == ndims(collect(bc)) == ndims(bc)

bc = Broadcast.instantiate(Broadcast.broadcasted(+, zeros(5), 5*ones(1, 4)))
@test IndexStyle(bc) == IndexCartesian()
@test eachindex(bc) === CartesianIndices((Base.OneTo(5), Base.OneTo(4)))
@test length(bc) === 20
@test ndims(bc) === 2
Expand Down