
Can we get rid of auto-broadcasting of 0D arrays for activations? #608

Open
avik-pal opened this issue Sep 30, 2024 · 4 comments
Comments

@avik-pal
Member

(Ideally, I don't think it should be auto-broadcasting in the first place.) But if we just get rid of 0-D array broadcasting, that solves our problem over at EnzymeAD/Reactant.jl#54:

NNlib.jl/src/activations.jl

Lines 752 to 755 in ba29c90

# Define broadcasts for activation functions on arrays
for f in ACTIVATIONS
    @eval $(f)(x::AbstractArray, args...) = $(f).(x, args...)
end

Essentially, Reactant needs to treat scalars as 0-D TracedRArrays, but that causes a recursion loop, and as expected the IR contains an unreachable (EnzymeAD/Reactant.jl#54 (comment)). This means the only way we can support NNlib activations is to manually copy over all the code for the activation functions.

Now, I know there isn't a general way to "opt out" of the broadcasting for 0-D arrays, but we could just define the broadcasting for N = 1..10 and hope no one is using an 11+-dimensional tensor. A sketch of what that could look like is below.
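
A hypothetical sketch of that proposal (not actual NNlib code), mirroring the existing ACTIVATIONS loop but restricting it to 1- to 10-dimensional arrays:

# Define the broadcasting method only for N in 1:10, so 0-D arrays
# fall through to the scalar definition instead of recursing.
for f in ACTIVATIONS, N in 1:10
    @eval $(f)(x::AbstractArray{<:Any,$N}, args...) = $(f).(x, args...)
end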

@avik-pal avik-pal changed the title Can we get rid of forwarding broadcasting of 0D arrays for activations? Can we get rid of auto-broadcasting of 0D arrays for activations? Sep 30, 2024
@mcabbott
Member

I don't like the auto-broadcast either, but here we are.

The built-in opt-out is this function -- which perhaps Reactant needs to know how to handle anyway?

julia> Base.Broadcast.broadcast_preserving_zero_d(sin, fill(pi/2))
0-dimensional Array{Float64, 0}:
1.0

julia> Base.Broadcast.broadcast_preserving_zero_d(sin, [0, pi/2])
2-element Vector{Float64}:
 0.0
 1.0

@avik-pal
Member Author

This will still cause issues, right? I want the 0-D case to be forwarded to the original call without any broadcasting. For example, for relu I want relu(x) = max(x, 0) to be called instead of relu(x::AbstractArray) = relu.(x).

@mcabbott
Member

mcabbott commented Sep 30, 2024

But what this function is fed is some special fake 0-D array which Reactant invents? My hope is that it can also be made to understand that broadcast_preserving_zero_d(sin, x::ZeroDimTrackedArray) should return a ZeroDimTrackedArray, short-circuiting the present implementation used for Array{T,0}.
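
A toy, self-contained illustration of that kind of overload (ZeroDimTracked here is a stand-in wrapper type, not Reactant's actual TracedRArray):

# A minimal 0-D wrapper standing in for a tracked type (hypothetical).
struct ZeroDimTracked{T}
    val::T
end

# Short-circuit the 0-D broadcast machinery: apply `f` to the wrapped
# value directly and rewrap, so the result stays "0-D".
function Base.Broadcast.broadcast_preserving_zero_d(f, x::ZeroDimTracked)
    return ZeroDimTracked(f(x.val))
end

# Base.Broadcast.broadcast_preserving_zero_d(sin, ZeroDimTracked(pi/2))
# returns ZeroDimTracked{Float64}(1.0) without touching the Array{T,0} path.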

@avik-pal
Member Author

I found a solution that would do it:

# Bypass the auto-broadcasting method for 0-D TracedRArrays: `invoke`
# with Tuple{Any} dispatches straight to the untyped scalar definition.
for nnlib_op in setdiff(Tuple(NNlib.ACTIVATIONS), (:tanh_fast, :sigmoid_fast, :sigmoid))
    @eval function NNlib.$(nnlib_op)(x::TracedRArray{T,0}) where {T}
        return invoke(NNlib.$(nnlib_op), Tuple{Any}, x)
    end
end
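
For anyone unfamiliar with invoke: it calls the method matching a given signature directly, bypassing normal dispatch. A standalone toy example of the same trick (names are illustrative only, not from NNlib):

g(x) = "scalar method"        # generic definition, signature Tuple{Any}
g(x::AbstractArray) = g.(x)   # auto-broadcasting method, like NNlib's

x0 = fill(0.5)                # a 0-dimensional Array{Float64,0}
invoke(g, Tuple{Any}, x0)     # returns "scalar method", skipping g.(x0)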

> what this function is fed is some special fake 0-D array which Reactant invents?

Correct. Reactant doesn't have a "Number" type, so we treat 0-D arrays as scalars.
