Skip to content
This repository has been archived by the owner on Mar 12, 2021. It is now read-only.

really slow reduce over single dimension #362

Closed
jumerckx opened this issue Jul 1, 2019 · 5 comments
Closed

really slow reduce over single dimension #362

jumerckx opened this issue Jul 1, 2019 · 5 comments
Labels

Comments

@jumerckx
Copy link
Contributor

jumerckx commented Jul 1, 2019

reduce (as well as maximum, minimum but not sum...) is really slow on CuArrays when the reduction happens over a single dimension, a tuple of dimensions however, is a lot faster (multiple orders of magnitude).
When I set allowscalar to false , reduce, maximum and minium fail.

To Reproduce
The Minimal Working Example (MWE) for this bug:

using CuArrays
cuin = cu(rand(1000, 100, 100))

@time maximum(cuin, dims=2)
@time maximum(cuin, dims=(1,2))
# run twice ^

CuArrays.allowscalar(false)
maximum(cuin, dims=2)

On my machine, with dims=2 it takes 8.80s while it only takes 0.01s with dims=(2, 1)
After disallowing scalar indexing, the code above throws scalar getindex is disallowed

Environment details
Details on Julia:

Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "C:\Users\jules\AppData\Local\atom\app-1.38.2\atom.exe"  -a
  JULIA_NUM_THREADS = 4```

@jumerckx jumerckx added the bug label Jul 1, 2019
@tanhevg
Copy link

tanhevg commented Jul 13, 2019

A quick workaround for this would be something like:

import Base.minimum
function minimum(x::CuArray{T}; dims=1:ndims(x)) where {T}
    mx = fill!(similar(x, T, Base.reduced_indices(x, dims)), typemax(T))
    Base._mapreducedim!(identity, min, mx, x)
end

This reduces the time from 1s to 0.002s for the MWE on my machine. This works, because _mapreducedim! is specialised in CuArrays.

The scalar getindex is hit by mapfoldl which is called by the complex initialisation workflow in Base.reducedim.jl. I guess fixing this properly would entail writing a kernel for mapflodl, however I am not 100% clear why this complex initialisation for min/max is required in the first place.

@maleadt
Copy link
Member

maleadt commented Aug 23, 2019

Yeah sure that would be a good workaround for now.

tanhevg pushed a commit to tanhevg/CuArrays.jl that referenced this issue Aug 27, 2019
tanhevg pushed a commit to tanhevg/CuArrays.jl that referenced this issue Aug 28, 2019
maleadt added a commit that referenced this issue Aug 29, 2019
#362: specialise reduce, minimum and maximum for CuArrays
@maleadt maleadt closed this as completed Aug 29, 2019
@baggepinnen
Copy link
Contributor

Any ideas on how to backpropagate through maximum, as findmax uses scalar getindex and thus is very slow? I tried stuff like

Base.findmax(x::CuArray{T}; dims=:) where {T} = reduce(x, init=(typemin(T),1), dims=dims) do ((a,ai),(b,bi))
    a > b ? (a,ai) : (b,bi)
end

without any luck

@tanhevg
Copy link

tanhevg commented Sep 3, 2019

@baggepinnen: have a look at argmax, but be wary of #304

@baggepinnen
Copy link
Contributor

Thanks! In my context I managed to get around the problem by creating a binary mask, but am looking forward to seeing progress on #304 in general.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants