Deadlock during memory free #685

Closed · maleadt opened this issue Apr 17, 2020 · 5 comments

maleadt commented Apr 17, 2020

We maintain a mapping of pointers to memory blocks, allocated, protected by a non-reentrant spinlock, allocated_lock. However, while we hold that lock, setting a value in the dict can trigger a GC collection, which might deadlock when it frees other GPU memory:

free at /global/u1/m/marius/work/s4/dev/CuArrays/src/memory/binned.jl:393
unknown function (ip: 0x2aac1ff19ad2)
_jl_invoke at /global/u1/m/marius/src/julia-1.4/src/gf.c:2144 [inlined]
jl_apply_generic at /global/u1/m/marius/src/julia-1.4/src/gf.c:2322
macro expansion at /global/homes/m/marius/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
macro expansion at /global/u1/m/marius/work/s4/dev/CuArrays/src/memory.jl:218 [inlined]
macro expansion at ./util.jl:234 [inlined]
free at /global/u1/m/marius/work/s4/dev/CuArrays/src/memory.jl:217 [inlined]
_unsafe_free! at /global/u1/m/marius/work/s4/dev/CuArrays/src/array.jl:51
unsafe_free! at /global/u1/m/marius/work/s4/dev/CuArrays/src/array.jl:40
_jl_invoke at /global/u1/m/marius/src/julia-1.4/src/gf.c:2144 [inlined]
jl_apply_generic at /global/u1/m/marius/src/julia-1.4/src/gf.c:2322
jl_apply at /global/u1/m/marius/src/julia-1.4/src/julia.h:1692 [inlined]
run_finalizer at /global/u1/m/marius/src/julia-1.4/src/gc.c:277
jl_gc_run_finalizers_in_list at /global/u1/m/marius/src/julia-1.4/src/gc.c:363
run_finalizers at /global/u1/m/marius/src/julia-1.4/src/gc.c:391 [inlined]
run_finalizers at /global/u1/m/marius/src/julia-1.4/src/gc.c:370
jl_gc_collect at /global/u1/m/marius/src/julia-1.4/src/gc.c:3124
maybe_collect at /global/u1/m/marius/src/julia-1.4/src/gc.c:827 [inlined]
jl_gc_pool_alloc at /global/u1/m/marius/src/julia-1.4/src/gc.c:1142
jl_gc_alloc_ at /global/u1/m/marius/src/julia-1.4/src/julia_internal.h:246 [inlined]
_new_array_ at /global/u1/m/marius/src/julia-1.4/src/array.c:106 [inlined]
_new_array at /global/u1/m/marius/src/julia-1.4/src/array.c:162 [inlined]
jl_alloc_array_1d at /global/u1/m/marius/src/julia-1.4/src/array.c:433
Array at ./boot.jl:405 [inlined]
rehash! at ./dict.jl:193
_setindex! at ./dict.jl:367 [inlined]
setindex! at ./dict.jl:388
macro expansion at /global/u1/m/marius/work/s4/dev/CuArrays/src/memory/binned.jl:384 [inlined]
macro expansion at ./lock.jl:183 [inlined]
alloc at /global/u1/m/marius/work/s4/dev/CuArrays/src/memory/binned.jl:383
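
For reference, a minimal sketch of the pattern (hypothetical names, not the actual binned.jl code): a non-reentrant SpinLock guards a Dict that is mutated both from regular code and from finalizers.

const allocated      = Dict{Ptr{Nothing},Int}()
const allocated_lock = Threads.SpinLock()

function track!(ptr, sz)
    lock(allocated_lock) do
        # Growing the Dict allocates, which can trigger a collection, which can
        # run finalizers that call untrack! below while this lock is still held.
        allocated[ptr] = sz
    end
end

function untrack!(ptr)
    lock(allocated_lock) do  # hangs forever if reached from a finalizer during track!
        delete!(allocated, ptr)
    end
end

That is exactly what the backtrace shows: setindex! in alloc (binned.jl:383) triggers jl_gc_collect, whose finalizer ends up in free (binned.jl:393), which then spins on the already-held allocated_lock.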
colinxs commented Apr 23, 2020

In reference to #691, changing allocated_lock in src/memory/binned.jl to a ReentrantLock appears to alleviate this issue, and it's not too much slower:

julia> l1=Threads.SpinLock();

julia> l2=ReentrantLock();

julia> @btime Base.@lock $l1 begin end
  20.569 ns (0 allocations: 0 bytes)

julia> @btime Base.@lock $l2 begin end
  37.242 ns (0 allocations: 0 bytes)

Perhaps changing all/some of the locks in CuArrays to a ReentrantLock could serve as a stopgap?

maleadt commented Apr 24, 2020

> Perhaps changing all/some of the locks in CuArrays to a ReentrantLock could serve as a stopgap?

No, the code does not support that. I'll have a look.

colinxs commented Apr 24, 2020

Ah, I thought they'd be interchangeable. Could you elaborate on why the code wouldn't support a ReentrantLock?

maleadt commented Apr 24, 2020

A reentrant lock can be re-acquired by the task that already holds it. But in this case, we're protecting against concurrent access to data structures that are used both in regular code and from finalizers. So while we're modifying the map and holding the lock, a GC run could trigger our finalizer which, potentially on the same task, would be able to acquire the lock and start modifying the same data structure too.
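
To illustrate the difference (REPL sketch, not from the package):

julia> l = ReentrantLock();

julia> lock(l); lock(l);   # same task may re-acquire; a Threads.SpinLock would spin forever here

julia> unlock(l); unlock(l);

So with a ReentrantLock the finalizer's free no longer hangs, but it runs while alloc is halfway through its dict update, which is exactly the concurrent access the lock is meant to prevent.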

colinxs commented Apr 24, 2020

I see. So there's no guarantee that finalizers are run on a separate task, and swapping allocated_lock to a ReentrantLock only masked the underlying problem. Thanks for clarifying! I tried to grok the code in gc.c/task.c etc., but there's a lot there :)
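
For example, this sketch (hypothetical names, and assuming the Probe instance really is collected by the explicit GC.gc() call) shows a finalizer running on the very task that triggered the collection:

mutable struct Probe end

function finalizer_task()
    seen = Ref{Any}(nothing)
    finalizer(_ -> seen[] = current_task(), Probe())
    GC.gc()   # typically collects the Probe and runs its finalizer right here
    return seen[] === current_task()
end

finalizer_task()   # expected to return true: the finalizer ran on the calling task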
