This repository has been archived by the owner on May 27, 2021. It is now read-only.

Add support for pinned memory. #133

Merged: 25 commits from tb/pinned into master on Apr 12, 2019

Conversation

@maleadt (Member) commented Mar 25, 2019

Fix #122

Based on #123; this tries to make the API more idiomatic while protecting against erroneous conversions.

WIP, but let me know what you think @wsphillips. Also cc @vchuravy.

@maleadt (Member, Author) commented Mar 25, 2019

Stuff I'm unhappy with:

  • Mem.alloc(Mem.DeviceBuffer, ...) is too verbose (see the sketch after this list).
  • In Mem.upload(..., stream; async), the stream argument only makes sense with async=true, but having both upload and e.g. async_upload isn't nice either.
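
For reference, a rough sketch of the two call shapes being weighed in the first bullet. The shorter Mem.Device form is what the examples later in this thread use; treating it as an alias for Mem.DeviceBuffer is my reading of the branch, not something stated in this comment:

using CUDAdrv

bytes = 1024

# Verbose form the first bullet complains about: spell out the buffer type.
buf_long = Mem.alloc(Mem.DeviceBuffer, bytes)

# Shorter form, as used in the examples further down in this thread.
buf_short = Mem.alloc(Mem.Device, bytes)

Mem.free(buf_long)
Mem.free(buf_short)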

@vchuravy (Member) commented

CC: @lcw who has expressed interest in this.

@maleadt (Member, Author) commented Mar 26, 2019

Would it be sane to make the unsafe_copyto!(::HostBuffer) (aka. pinned memory) async by default, whereas copies to and from other kinds of memory would be synchronizing? Or do we still want an async flag?
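
One way to picture the first option is to pick the default from the buffer type via dispatch. This is purely a sketch of the design question, not code from the branch; Mem.Buffer and Mem.HostBuffer are assumed to be the abstract and pinned buffer types:

# Asynchronous by default only for pinned host memory; synchronizing otherwise.
default_async(::Mem.HostBuffer) = true
default_async(::Mem.Buffer)     = false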

@maleadt (Member, Author) commented Mar 26, 2019

Fixes across the stack (including a clean-up of some GPUArrays code):
https://github.com/JuliaGPU/CUDAnative.jl/compare/tb/pinned
https://github.com/JuliaGPU/GPUArrays.jl/compare/tb/unsafe_copyto
https://github.com/JuliaGPU/CuArrays.jl/compare/tb/pinned

Is anybody using these CUDAdrv.Mem APIs (apart from the above packages, which I intend to fix)? If not, this will be another breaking release.

@wsphillips commented Mar 26, 2019

Edit: Never mind, I just saw the changes across the stack. It will take me a minute to go through them and see how everything interacts across packages.

I am not at a workstation with a GPU right now; I will check out the changes and try testing them on Thursday at the earliest.

@maleadt (Member, Author) commented Mar 27, 2019

I'll be sure to update the CUDAdrv tests and add an example here to illustrate the envisioned usage.

@maleadt (Member, Author) commented Mar 27, 2019

This is how it currently looks:

using CUDAdrv, CuArrays

dims = (5,)
T = Int
bytes = prod(dims)*sizeof(T)


# pinned memory

buf = Mem.alloc(Mem.Host, bytes)

cpu_ptr = convert(Ptr{T}, buf)
cpu_obj = unsafe_wrap(Array, cpu_ptr, dims)

gpu_obj = CuArray{T}(undef, dims)
copyto!(gpu_obj, cpu_obj)


# pinned and mapped

buf = Mem.alloc(Mem.Host, bytes, Mem.HOSTALLOC_DEVICEMAP)

cpu_ptr = convert(Ptr{T}, buf)
cpu_obj = unsafe_wrap(Array, cpu_ptr, dims)

gpu_ptr = convert(CuPtr{T}, buf)
gpu_obj = unsafe_wrap(CuArray, gpu_ptr, dims)
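
As a quick, untested continuation of the mapped example, assuming the usual zero-copy semantics of device-mapped pinned memory (collect and Mem.free are used here as they appear elsewhere in this thread):

# cpu_obj and gpu_obj alias the same pinned allocation, so a write through the
# host mapping should be visible through the device mapping without an explicit copy.
cpu_obj .= 1:prod(dims)
@assert collect(gpu_obj) == cpu_obj

# Both wrappers alias buf; free the underlying allocation once, when done.
Mem.free(buf)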

Thoughts?

@wsphillips commented Mar 27, 2019

> Thoughts?

I think it looks good/hits the mark!

As mentioned I won't have access to a GPU until tomorrow, but I will try to look at the branch and see how it holds up to some intentional abuse in the morning 😃

@wsphillips commented

This is reallllllyyyy nice. Just had a chance to do some basic testing.

Only one comment, regarding calling Base.copyto! from CuArrays.

Right now it looks like the default Mem.copy! is what makes this work. Could a method be added there that accepts the extra flags (async + stream)?

e.g. copyto!(gpu_obj, cpu_obj, async=true, stream=someotherstream) would call into the Mem.copy! method that takes the async/stream flags.

> Would it be sane to make the unsafe_copyto!(::HostBuffer) (aka. pinned memory) async by default, whereas copies to and from other kinds of memory would be synchronizing? Or do we still want an async flag?

In my case I would almost always use async. But there are two reasons to pin: speed, and async/non-default-stream copies. I can think of situations where the implicit sync might be useful. I like the idea of having a simple default dispatch plus optional flags (as you've done with Mem.copy!).
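
A minimal sketch of the kind of method being asked for here, assuming a CuArrays-side wrapper that simply forwards the keywords to Mem.copy!. The name, the pointer handling, and the CuDefaultStream() default are illustrative, not the merged implementation:

using CUDAdrv, CuArrays

function copyto_with_flags!(dest::CuArray{T}, src::Array{T};
                            async::Bool=false, stream::CuStream=CuDefaultStream()) where {T}
    # Forward the flags to the driver-level copy; pointer(dest) is assumed to
    # yield the device pointer of the CuArray.
    Mem.copy!(pointer(dest), pointer(src), sizeof(src), async=async, stream=stream)
    return dest
end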

@lcw (Contributor) commented Apr 10, 2019

I just got a chance to test this out. It seems to work great!

I ran the following code

using CUDAdrv

"""
    empiricalbandwidth(nbytes=2*1024^3; devicenumber=0, numtests=10)

Compute the emperical bandwidth in GB/s of a CUDA device using `nbytes` of
memory.  The device to test can be slected with `devicenumber` and the
bandwidth is an average of `ntests`.
"""
function empiricalbandwidth(nbytes=1024^2; devicenumber=0, ntests=10, pinned=false)
    dev = CuDevice(devicenumber)
    ctx = CuContext(dev)
    stm = CuStream()

    d = rand(Char, nbytes)
    a = pinned ? Mem.alloc(Mem.Host, nbytes) : pointer(d)
    b = Mem.alloc(Mem.Device, nbytes)

    Mem.copy!(a, b, nbytes)
    Mem.copy!(b, a, nbytes)

    t0, t1 = CuEvent(), CuEvent()
    record(t0, stm)
    for n = 1:ntests
        Mem.copy!(a, b, nbytes, async=true, stream=stm)
    end
    record(t1, stm)
    synchronize(t1)
    t_dtoh = elapsed(t0, t1)
    bandwidth_dtoh = nbytes*ntests/(t_dtoh*1e9)

    t0, t1 = CuEvent(), CuEvent()
    record(t0, stm)
    for n = 1:ntests
        Mem.copy!(b, a, nbytes, async=true, stream=stm)
    end
    record(t1, stm)
    synchronize(t1)
    t_htod = elapsed(t0, t1)
    bandwidth_htod = nbytes*ntests/(t_htod*1e9)

    pinned && Mem.free(a)
    Mem.free(b)

    (bandwidth_dtoh, bandwidth_htod)
end

and got

❯ julia --project=.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.2.0-DEV.670 (2019-04-10)
 _/ |\__'_|_|_|\__'_|  |  Commit 20499752ac (0 days old master)
|__/                   |

julia> include("pinned.jl")
empiricalbandwidth

julia> empiricalbandwidth()
(5.266134908961078, 9.937828167245515)

julia> empiricalbandwidth(pinned=true)
(11.4790166633746, 11.016305722250417)

Let me know if there is anything I can do to help get this branch merged in.

maleadt marked this pull request as ready for review on April 12, 2019 08:20
maleadt merged commit d899dd5 into master on April 12, 2019
bors bot deleted the tb/pinned branch on April 12, 2019 08:21

Successfully merging this pull request may close these issues:

Support for cudaHostAlloc() cudaHostGetDevicePtr() cudaFreeHost() including flag options