This repository has been archived by the owner on Mar 12, 2021. It is now read-only.

Merge #467
467: implement CuIterator for batching arrays to the GPU r=jrevels a=jrevels

I'm hitting what I believe is likely a common use case for CuArrays. My workload is essentially a map-reduce of the form:

```julia
some_reduction(f(batch::Tuple{Vararg{Array}})::Number for batch in batches)
```

Assuming `f` plays nicely with CuArrays, one way to get this running on a GPU is:

```julia
cubatches = (map(x -> adapt(CuArray, x), batch) for batch in batches)
some_reduction(f(cubatch) for cubatch in cubatches)
```

Unfortunately, this approach is a poor one when `batches` doesn't fit entirely in GPU memory. As the caller, I can assert that old iterations' batches don't need to be kept around, so ideally I'd have a mechanism that simply reuses old iterations' memory instead of allocating more. This PR implements such a mechanism: a `CuIterator` that maintains a memory pool and exploits the assumption that previous iterations' memory can be reused.
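With this PR, the generator example above can be rewritten so that only the current batch lives in GPU memory (a sketch; `f`, `some_reduction`, and `batches` are as defined above):

```julia
using CuArrays

# CuIterator adapts each batch to the GPU on demand and marks the previous
# batch's memory as freeable, instead of keeping every uploaded batch alive.
cubatches = CuIterator(batch for batch in batches)
some_reduction(f(cubatch) for cubatch in cubatches)
```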

This is a rough POC sketch right now, just to express what I'm trying to get at; it's probably dumb in ways I can't yet comprehend. AFAICT I'd like to do the following before merging:

- [x] instead of the current shape/eltype-matching approach, just allocate to a device buffer (~`CUDAdrv.UnifiedBuffer`? I have no idea what I'm doing 😅 EDIT: oh is this how CUDAdrv exposes UVM or is that separate?~), increasing the buffer size as necessary once larger batches are encountered
- [x] ~leverage `CUDAdrv.prefetch` to asynchronously move values to GPU memory so as to not artificially block the iterator's consumer~ EDIT: I'm not sure this is necessary anymore assuming the `copyto!` call we're using is asynchronous?
- [x] make sure to free the pool once iteration is finished
- [x] docs
- [x] tests

In a future PR, we could add a feature for `CuIterator` to utilize UVM if the caller is in an environment where that's supported.

cc @vchuravy (thanks for discussing this with me earlier!)



Co-authored-by: Jarrett Revels <jarrettrevels@gmail.com>
bors[bot] and jrevels authored Mar 12, 2020
2 parents 15f6b54 + 259ed9f commit ae239d7
Showing 4 changed files with 46 additions and 1 deletion.
3 changes: 2 additions & 1 deletion src/CuArrays.jl
@@ -4,7 +4,7 @@ using CUDAapi, CUDAdrv, CUDAnative
 
 using GPUArrays
 
-export CuArray, CuVector, CuMatrix, CuVecOrMat, cu
+export CuArray, CuVector, CuMatrix, CuVecOrMat, CuIterator, cu
 export CUBLAS, CUSPARSE, CUSOLVER, CUFFT, CURAND, CUDNN, CUTENSOR
 
 import LinearAlgebra
@@ -93,6 +93,7 @@ include("mapreduce.jl")
 include("accumulate.jl")
 include("linalg.jl")
 include("nnlib.jl")
+include("iterator.jl")
 
 include("deprecated.jl")
27 changes: 27 additions & 0 deletions src/iterator.jl
@@ -0,0 +1,27 @@
"""
CuIterator(batches)
Return a `CuIterator` that can iterate through the provided `batches` via `Base.iterate`.
Upon each iteration, the current `batch` is adapted to the GPU (via `map(x -> adapt(CuArray, x), batch)`)
and the previous iteration is marked as freeable from GPU memory (via `unsafe_free!`).
This abstraction is useful for batching data into GPU memory in a manner that
allows old iterations to potentially be freed (or marked as reusable) earlier
than they otherwise would via CuArray's internal polling mechanism.
"""
mutable struct CuIterator{B}
batches::B
previous::Any
CuIterator(batches) = new{typeof(batches)}(batches)
end

function Base.iterate(c::CuIterator, state...)
item = iterate(c.batches, state...)
isdefined(c, :previous) && foreach(unsafe_free!, c.previous)
item === nothing && return nothing
batch, next_state = item
cubatch = map(x -> adapt(CuArray, x), batch)
c.previous = cubatch
return cubatch, next_state
end
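A short sketch of the freeing semantics implemented above: each subsequent `iterate` call marks the previous iteration's arrays as freeable before uploading the next batch (the array sizes here are arbitrary):

```julia
using CuArrays

batches = [[rand(Float32, 16, 16)] for _ in 1:3]
it = CuIterator(batches)

cubatch1, state = iterate(it)        # uploads batch 1 to the GPU
cubatch2, state = iterate(it, state) # calls unsafe_free! on batch 1's arrays,
                                     # then uploads batch 2
```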
16 changes: 16 additions & 0 deletions test/iterator.jl
@@ -0,0 +1,16 @@
@testset "CuIterator" begin
batch_count = 10
max_batch_items = 3
max_ndims = 3
sizes = 20:50
rand_shape = () -> rand(sizes, rand(1:max_ndims))
batches = [[rand(Float32, rand_shape()...) for _ in 1:rand(1:max_batch_items)] for _ in 1:batch_count]
cubatches = CuIterator(batch for batch in batches) # ensure generators are accepted
previous_cubatch = missing
for (batch, cubatch) in zip(batches, cubatches)
@test ismissing(previous_cubatch) || all(x -> x.freed, previous_cubatch)
@test batch == Array.(cubatch)
@test all(x -> x isa CuArray, cubatch)
previous_cubatch = cubatch
end
end
1 change: 1 addition & 0 deletions test/runtests.jl
@@ -60,6 +60,7 @@ include("solver.jl")
 include("sparse_solver.jl")
 include("dnn.jl")
 include("tensor.jl")
+include("iterator.jl")
 
 include("forwarddiff.jl")
 include("nnlib.jl")
