Support Transpose and Adjoint in broadcast better #148

Merged: 9 commits, merged on Aug 28, 2018
Conversation

vchuravy
Member

@maleadt as discussed earlier today, I didn't quite finish it, so the CuArray changes are at https://github.com/JuliaGPU/CuArrays.jl/tree/vc/things

@SimonDanisch thoughts on the backend addition? The issue is that with wrapper types like Adjoint/Transpose/SubArray there is no easy dispatch target anymore, so something trait-like might work quite well.

@andreasnoack would it make sense to introduce Adjoint{T, A} <: Wrapper{T, A<:AbstractArray{T}} <: AbstractArray{T} into the Base hierarchy? Right now we need to enumerate and special-case all the wrapper types.
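
For concreteness, a minimal sketch of what such a hierarchy could look like (the names are hypothetical, and an N parameter is added because an abstract type cannot directly subtype the UnionAll AbstractArray{T}):

# hypothetical sketch, not actual Base code
abstract type Wrapper{T,N,A<:AbstractArray{T}} <: AbstractArray{T,N} end

# If Adjoint{T,A}, Transpose{T,A}, SubArray, ... subtyped Wrapper, a GPU package
# could dispatch once on "anything wrapping one of my arrays" instead of
# enumerating every wrapper type:
parenttype(::Type{<:Wrapper{T,N,A}}) where {T,N,A} = A   # hypothetical helper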

@ssz66666
Contributor

I wonder if this approach can be adapted to allow arbitrarily nested wrapper types. Currently it doesn't:

julia> using CuArrays, Adapt

julia> CuArrays.allowscalar(false);

julia> xs = Adapt.adapt(CuArray{Complex{Float32}},[1.0+0.0im]);

julia> ys = transpose(xs)
1×1 LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}:
 1.0 + 0.0im

julia> ys .= 1
[ Info: getfield(GPUArrays, Symbol("##19#20"))(), (LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(identity),Tuple{Int64}})
1×1 LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}:
 1.0 + 0.0im


julia> zs = adjoint(ys)
1×1 LinearAlgebra.Adjoint{Complex{Float64},LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}}:
Error showing value of type LinearAlgebra.Adjoint{Complex{Float64},LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}}:
ERROR: scalar getindex is disabled

julia> zs .= 1
ERROR: scalar setindex! is disabled
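
For illustration, a minimal CPU-only sketch (GPUish and fastpath are made-up stand-ins for GPUArray and the broadcast fast path) of why covering a single level of wrapping is not enough:

using LinearAlgebra

struct GPUish{T,N} <: AbstractArray{T,N}   # stand-in for GPUArray
    data::Array{T,N}
end
Base.size(A::GPUish) = size(A.data)
Base.getindex(A::GPUish, i::Int...) = A.data[i...]

# covers the bare array plus exactly one level of wrapping
fastpath(::Union{GPUish, Transpose{<:Any,<:GPUish}, Adjoint{<:Any,<:GPUish}}) = true
fastpath(::AbstractArray) = false   # generic fallback, i.e. scalar indexing on a real GPU array

A = GPUish(rand(ComplexF32, 2, 2))
nested = Adjoint(transpose(A))    # build the nested wrapper explicitly, like zs above
fastpath(transpose(A))            # true:  Transpose{_, GPUish} is in the Union
fastpath(nested)                  # false: Adjoint{_, Transpose{_, GPUish}} is not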

@vchuravy
Member Author

I am not sure we can handle the recursive case without injecting an abstraction into the type hierarchy. But everything gets fuzzy once you start thinking about transpose(DArray(CuArray)).

src/mapreduce.jl Outdated
@@ -5,7 +5,10 @@

Base.any(A::GPUArray{Bool}) = mapreduce(identity, |, false, A)
Base.all(A::GPUArray{Bool}) = mapreduce(identity, &, true, A)
Base.count(pred, A::GPUArray) = Int(mapreduce(pred, +, 0, A))

Base.any(f::Function, A::GPUArray) = mapreduce(f, |, false, A)
Contributor

Just noticed that Base.any and Base.all have slightly different semantics than the mapreduce-based implementation here while I was trying to incorporate the changes into #145:

  1. They short-circuit in Base, which changes the semantics when the predicate has side effects. Maybe we can just make it very clear in the documentation that the GPU implementation is not short-circuiting?
  2. The Base implementations allow missing values and handle them with three-valued logic. Should we follow the Base behaviour here? It might end up with something like:
Base.any(pred, A::GPUArray) = let
    t = mapreduce(x -> let v = pred(x)
                ismissing(v) ? (false, true) : (v && true, false)   # enforce v is `Bool` or `missing`
            end,
            ((v1, m1), (v2, m2)) -> (v1 || v2, m1 || m2),
            A; init = (false, false))
    t[1] ? true : (t[2] ? missing : false)
end
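
For symmetry, all under the same three-valued logic might end up along the same lines (a sketch only, assuming the same mapreduce signature as above; untested on a GPU):

Base.all(pred, A::GPUArray) = let
    t = mapreduce(x -> let v = pred(x)
                ismissing(v) ? (true, true) : (v && true, false)   # (running value, saw a `missing`?)
            end,
            ((v1, m1), (v2, m2)) -> (v1 && v2, m1 || m2),
            A; init = (true, false))
    t[1] ? (t[2] ? missing : true) : false
end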

@ssz66666
Contributor

ssz66666 commented Aug 25, 2018

I was thinking about marking wrapped arrays as GPU compatible with another wrapper struct, so that we get a dispatch target again.

Essentially this uses a concrete wrapper type (called OnGPU in the example below) to stand in for the proposed WrappedArray abstract type: it overrides all the methods of the AbstractArray interface, and hijacks the constructors of the wrappers we want to support so that every method is rewired to the child array. That way we can effectively inject an abstraction into the type hierarchy without modifying Base. In addition, OnGPU carries a phantom type parameter that remembers the GPUBackend type.

It will be necessary to re-implement all the Base methods that directly access the contents, as is already done for GPUArray. We can then hijack the methods that create wrappers (like LinearAlgebra.Transpose) over a GPUArray or an already wrapped GPU-compatible array, so that they unwrap the marker, apply the wrapper, and re-mark the result as GPU compatible.

We will need to special-case every wrapper we want vectorisation support for, but I think that is okay: only a few wrappers, like SubArray or Transpose, are really needed, and those are easily GPU-vectorisable without scalar indexing.

I've put together a minimal example below to demonstrate the idea. I'd really appreciate your opinion, as I think it would be very useful to get simple wrappers like SubArray, LinearAlgebra.Transpose or SkipMissing working with GPUArrays.

abstract type MyInterface{T} end # like AbstractArray

abstract type MyGPUStruct{T} <: MyInterface{T} end # like GPUArray

struct MyStruct{T} <: MyGPUStruct{T} # like CuArray 
    name::String
    x::T
end

struct MyWrapper{T,AT <: MyInterface{T}} <: MyInterface{T} # like LinearAlgebra.Transpose
    parent::AT
end

struct MyWrapper2{T,AT <: MyInterface{T}} <: MyInterface{T} # any Wrapper that we want it to fall back to base
    parent2::AT
end

# methods in Base to create wrappers, like view, transpose and adjoint
wrap1(x::MyInterface) = MyWrapper(x)
wrap2(x::MyInterface) = MyWrapper2(x)

# all methods in `Base` to be overridden,
# like what is already done with GPUArray itself
function get_name(::MyInterface) 
    error("not implemented")
end

# think of get_name as broadcast, fill!, copyto! or show, etc.
# we want to override it for OnGPU later

get_name(o::MyStruct) = begin println("MyStruct"); o.name end
get_name(o::MyWrapper) = begin println("Wrapper"); get_name(o.parent) end
get_name(o::MyWrapper2) = begin println("Wrapper2"); get_name(o.parent2) end

# methods that simply fall back to base
get_x(o::MyStruct) = o.x
get_x(o::MyWrapper) = get_x(o.parent)
get_x(o::MyWrapper2) = get_x(o.parent2)

# our marker wrapper, which marks a MyInterface-like thing as GPU compatible.
# It is only propagated to the next layer of wrapping if that layer is
# also GPU compatible.
# type param BE remembers the backend type, so we can dispatch to
# backend-specific GPU kernels
# type param AT reveals the type of the underlying array,
# so that you can also dispatch on that
struct OnGPU{T,BE<:MyGPUStruct{T},AT<:MyInterface{T}} <: MyInterface{T}
    content::AT
end

# need to override methods that create wrappers we care about,
# e.g. Base.view, transpose, etc.
wrap1(x::BE) where {T,BE <: MyGPUStruct{T}} = begin
    y = MyWrapper(x)
    OnGPU{T,BE,typeof(y)}(y)
end
wrap1(x::OnGPU{T,BE,AT}) where {T,BE,AT} = begin
    y = MyWrapper(x.content)
    OnGPU{T,BE,typeof(y)}(y)
end

# capture vector operation methods like fill!
# for broadcast to work we need to fix 
# BroadcastStyle(::Type{OnGPU{T,<:MyGPUStruct,<:MyInterface}})
get_name(x::OnGPU) = get_name_gpu(x.content)

# methods that fall back to base
get_x(x::OnGPU) = get_x(x.content)

# GPU methods which are generic enough to be handled in
# GPUArrays rather than in backend
# in this case, simply unwrap MyWrapper and pass it on to
# get_name_gpu(::MyGPUStruct)
get_name_gpu(x::MyWrapper) = get_name_gpu(x.parent)

# default behaviour should be falling back to base
get_name_gpu(x::MyGPUStruct) = error("fall back to base")

# hopefully implemented in GPU backend, don't know how difficult
# it would be to handle recursively nested wrappers that wrap
# a GPUArray(CuArray).
get_name_gpu(x::MyStruct) = get_name(x)

x = MyStruct("x",1) # uses gpu method
y = wrap1(x) # dispatch to gpu method
z = wrap1(y) # dispatch to gpu method
w = wrap2(z) # fall back to base for non-compatible wrappers 
v = wrap1(w) # fall back to base again

get_name(x)
get_name(y)
get_name(z)
get_name(w)
get_name(v)
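
To sketch the broadcast hook mentioned in the comments above (the use of Broadcast.ArrayStyle here is an assumption; the real GPUArrays broadcast style may differ):

import Base.Broadcast: BroadcastStyle, ArrayStyle

# route broadcasting over any marked wrapper to a style keyed on the backend
# type BE, so backend-specific copyto!/materialize! methods can take over
BroadcastStyle(::Type{<:OnGPU{T,BE}}) where {T,BE} = ArrayStyle{BE}()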

@maleadt
Member

maleadt commented Aug 28, 2018

@ssz66666 Interesting design! But wouldn't it break Base or user code that expects the wrapper types directly? E.g. foo(::MyWrapper) isn't callable with an OnGPU{..., ..., MyWrapper}. I'm not sure how common that is, though, or for that matter how common nested wrappers are (which is what necessitates this complexity).

For now, I've worked a little on this PR, at least improving the use of non-nested Transpose/Adjoint/SubArray. Let's merge this first and figure out a better wrapper design after that?
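
For reference, one way to cover a single level of wrapping is a dispatch union roughly like the sketch below (the alias name and exact members are illustrative, not necessarily what this PR ends up with):

using LinearAlgebra

# a bare GPU array, or one supported wrapper directly around a GPU array
const GPUDestArray = Union{GPUArray,
                           Transpose{<:Any,<:GPUArray},
                           Adjoint{<:Any,<:GPUArray},
                           SubArray{<:Any,<:Any,<:GPUArray}}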

@ssz66666
Contributor

ssz66666 commented Aug 28, 2018

Agreed with @maleadt, solving the whole wrapper-support design problem seems out of scope for this PR.

Also, my proposal does break functions that expect a wrapper directly; an obvious case is LinearAlgebra.transpose(::Transpose). I guess we need to find a way to expose this information to the compiler without messing too much with the type hierarchy, or perhaps just stick to a simpler design that supports the simplest yet most common cases, at least for now.
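
In terms of the toy types above, the breakage looks like this (untranspose is a made-up analogue of transpose(::Transpose), which in Base simply returns the parent):

untranspose(w::MyWrapper) = w.parent    # analogue of transpose(t::Transpose) = t.parent

plain  = MyWrapper(MyStruct("x", 1))    # what wrapper-expecting code sees today
marked = wrap1(MyStruct("x", 1))        # an OnGPU{...} hiding the MyWrapper inside

untranspose(plain)      # works
# untranspose(marked)   # MethodError: no method matching untranspose(::OnGPU{...})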

@vchuravy
Member Author

When we tag, we should bump the major version due to the change in API.
