Error creating group #4

Closed
tbenst opened this issue Aug 17, 2021 · 5 comments

@tbenst
Contributor

tbenst commented Aug 17, 2021

Hello, thanks for this package!

I've encountered a very subtle bug that seems to be triggered specifically by H5Sparse.jl and not by HDF5.jl. I've created a minimal working example below. I have not been able to trigger the bug without the use of a Channel. It seems there is a race condition related to group creation...

using HDF5, H5Sparse, SparseArrays
import Base.Threads: @threads
function write_n_sparse_datasets(h5path, n, channel, blocker, type=:H5)
    h5 = h5open(h5path, "w")
    while n > 0
        @assert Threads.threadid() == 1 # we only access from the master thread
        dset_name, data = take!(channel)
        take!(blocker)
        println("take $n")
        if type == :H5
            h5[dset_name] = data
        elseif type == :H5Sparse
            H5SparseMatrixCSC(h5, dset_name, data)
        end
        n -= 1
    end
    h5
end

function test(type=:H5, n=10, N=512*512*300)
    to_write = Channel(Inf)
    blocker = Channel(5) # restrict to 5 in-flight producer tasks
    h5_path = tempname()*".h5"
    isfile(h5_path) && rm(h5_path)
    @threads for i in 1:n
        @async begin
            put!(blocker, i)
            println("start $i")
            data = collect(rand(N,1) .> 0.9)
            if type == :H5
                put!(to_write, "$i" => data)
            elseif type == :H5Sparse
                data = sparse(data)
                put!(to_write, "$i" => data)
            end
            println("put $i")
        end
    end
    println("call write")
    h5 = write_n_sparse_datasets(h5_path, n, to_write, blocker, type)

    @show keys(h5)
    close(h5)
    rm(h5_path)
end

test(:H5, 10) # always works
test(:H5, 50) # always works
test(:H5Sparse, 5) # usually works
test(:H5Sparse, 10) # sometimes works
test(:H5Sparse, 50) # always errors
Error output
start 2
start 3
start 5
start 7
start 4
call write
put 7
take 10
start 6
put 5
put 4
put 2
put 3
take 9
start 9HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
  #000: H5G.c line 388 in H5Gcreate2(): unable to create group
    major: Symbol table
    minor: Unable to initialize object
  #001: H5VLcallback.c line 4081 in H5VL_group_create(): group create failed
    major: Virtual Object Layer
    minor: Unable to create file
  #002: H5VLcallback.c line 4047 in H5VL__group_create(): group create failed
    major: Virtual Object Layer
    minor: Unable to create file
  #003: H5VLnative_group.c line 74 in H5VL__native_group_create(): unable to create group
    major: Symbol table
    minor: Unable to initialize object
  #004: H5Gint.c line 158 in H5G__create_named(): unable to create and link to group
    major: Symbol table
    minor: Unable to initialize object
  #005: H5L.c line 1804 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #006: H5L.c line 2045 in H5L__create_real(): can't insert link
    majHDF5-DIor: Links
    minor: Unable to insert object
AG: Error detected in   #012: H5Gtraverse.c line 855 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #013: H5Gtraverse.c line 585 in H5G__traverse_real(): can't look up component
    major: Symbol table
    minor:HDF5 ( Object not found
  #014: H5Gobj.c line 1125 in H5G__obj_lookup(): can't check for link info message
    major: Symbol table
    minor: Can't get value
  #015: H5Gobj.c line 326 in H5G__obj_get_linfo(): unable to read object header
    major: Symbol table
    minor: Can't get value
  #016: H5Omessage.c line 883 in H5O_msg_exists(): unable to protect object header
    ma1.12.0) threajor: dObject header
 0:
    minor:   #Unable to protect metadata
000:   #017: H5Oint.c line H5O.c line 1082 i1239 in H5Oclose()n : unable to close object
H5O_protect()    major: Object header
: unable to load object header
    major:     minor: Unable to release object
Object header
    minor: Unable to protect metadata
  #  #018: H5AC.c line 1312 in 001: H5AC_protect(): H5C_protect() failed
    major: Object cache
H5I.c line     minor: Unable to protect metadata
  #019: H5C.c line 2242 in H5C_protect(): 1422 in H5I_dec_app_ref(): can't decrement ID ref count
    major: Object atom
ring type mismatch occurred for cache entry
    major: Object cache
    minor: Internal error detected
    minor: Unable to decrement reference count
  #002: (null) line 353 in (null)()
    major: Dataset
    minor: Close failed
  #4294967279: (null) line 2639 in (null)()
    major: Virtual Object Layer
    minor: Can't reset object
  #4294967280: (null) line 2320 in (null)()
    major: Virtual Object Layer
    minor: Bad value
  #4294967281: (null) line 388 in (null)()
    major: Symbol table
    minor: Unable to initialize object
  #4294967282: (null) line 4081 in (null)()
    major: Virtual Object Layer
    minor: Unable to create file
  #4294967283: (null) line 4047 in (null)()
    major: Virtual Object Layer
    minor: Unable to create file
  #4294967284: (null) line 74 in (null)()
    major: Symbol table
    minor: Unable to initialize object
  #4294967285: (null) line 158 in (null)()
    major: Symbol table
    minor: Unable to initialize object
  #4294967286: (null) line 1804 in (null)()
    major: Links
    minor: Unable to initialize object
  #4294967287: (null) line 2045 in (null)()
    major: Links
    minor: Unable to insert object
  #4294967288: (null) line 855 in (null)()
    major: Symbol table
    minor: Object not found
  #4294967289: (null) line 585 in (null)()
    major: Symbol table
    minor: Object not found
  #4294967290: (null) line 1125 in (null)()
    major: Symbol table
    minor: Can't get value
  #4294967291: (null) line 326 in (null)()
    major: Symbol table
    minor: Can't get value
  #4294967292: (null) line 883 in (null)()
    major: Object header
    minor: Unable to protect metadata
  #4294967293: (null) line 1082 in (null)()
    major: Object header
    minor: Unable to protect metadata
  #4294967294: (null) line 1312 in (null)()
    major: Object cache
    minor: Unable to protect metadata
  #4294967295: (null) line 2242 in (null)()
    major: Object cache
    minor: Internal error detected
error in running finalizer: ErrorException("Error closing object")
ERROR: 
error at ./error.jl:33
h5o_close at /home/tyler/.julia/packages/HDF5/0iEnL/src/api.jl:893 [inlined]
close at /home/tyler/.julia/packages/HDF5/0iEnL/src/HDF5.jl:586
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
run_finalizer at /buildworker/worker/package_linux64/build/src/gc.c:278
jl_gc_run_finalizers_in_list at /buildworker/worker/package_linux64/build/src/gc.c:365
LoadError: run_finalizers at /buildworker/worker/package_linux64/build/src/gc.c:394
jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:3260
maybe_collect at /buildworker/worker/package_linux64/build/src/gc.c:880 [inlined]
jl_gc_pool_alloc at /buildworker/worker/package_linux64/build/src/gc.c:1204
jl_gc_alloc_ at /buildworker/worker/package_linux64/build/src/julia_internal.h:285 [inlined]
_new_array_ at /buildworker/worker/package_linux64/build/src/array.c:132 [inlined]
_new_array at /buildworker/worker/package_linux64/build/src/array.c:188 [inlined]
jl_alloc_array_2d at /buildworker/worker/package_linux64/build/src/array.c:466
Array at ./boot.jl:450 [inlined]
Array at ./boot.jl:458 [inlined]
Array at ./boot.jl:465 [inlined]
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:288 [inlined]
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:289
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:277 [inlined]
macro expansion at /home/tyler/code/lensman/notebooks/debug.jl:30 [inlined]
#5 at ./task.jl:411
unknown function (ip: 0x7fa87413fd9c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Error creating group //5
Stacktrace:
  [1] error(::String, ::String, ::String, ::String)
    @ Base ./error.jl:42
  [2] h5g_create
    @ ~/.julia/packages/HDF5/0iEnL/src/api.jl:647 [inlined]
  [3] create_group(parent::HDF5.File, path::String, lcpl::HDF5.Properties, gcpl::HDF5.Properties)
    @ HDF5 ~/.julia/packages/HDF5/0iEnL/src/HDF5.jl:724
  [4] create_group
    @ ~/.julia/packages/HDF5/0iEnL/src/HDF5.jl:723 [inlined]
  [5] h5writecsc(fid::HDF5.File, name::String, m::Int64, n::Int64, colptr::Vector{Int64}, rowval::Vector{Int64}, nzval::Vector{Bool}; overwrite::Bool, chunk::String, blosc::Int64, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ H5Sparse ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:238
  [6] h5writecsc
    @ ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:231 [inlined]
  [7] #h5writecsc#3
    @ ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:227 [inlined]
  [8] h5writecsc
    @ ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:226 [inlined]
  [9] H5SparseMatrixCSC(fid::HDF5.File, name::String, B::SparseMatrixCSC{Bool, Int64}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ H5Sparse ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:85
 [10] H5SparseMatrixCSC(fid::HDF5.File, name::String, B::SparseMatrixCSC{Bool, Int64})
    @ H5Sparse ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:85
 [11] write_n_sparse_datasets(h5path::String, n::Int64, dset_size::Vector{Int64}, channel::Channel{Any}, blocker::Channel{Any}, type::Symbol)
    @ Main ~/code/lensman/notebooks/debug.jl:13
 [12] test(type::Symbol, n::Int64)
    @ Main ~/code/lensman/notebooks/debug.jl:42
 [13] top-level scope
    @ ~/code/lensman/notebooks/debug.jl:52
in expression starting at /home/tyler/code/lensman/notebooks/debug.jl:52
put 6
@severinson
Owner

Glad you like it :)
The underlying HDF5 library isn't thread-safe by default; see, e.g., https://support.hdfgroup.org/HDF5/faq/threadsafe.html
As a result, neither is H5Sparse.jl. However, HDF5 can be compiled to be thread-safe, although I think it just serializes all operations using locks. Since performance is limited by disk speed, I don't think writing several datasets concurrently will give any performance benefits.
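
To illustrate what "serializes all operations using locks" amounts to, here is a minimal Julia-level sketch. It is not how the HDF5 C library implements its thread-safe build; `HDF5_LOCK`, `with_hdf5_lock`, and `write_all` are hypothetical names, and the `push!` only stands in for an HDF5 call.

```julia
# Hypothetical global lock; every call into the shared resource goes through it.
const HDF5_LOCK = ReentrantLock()

# Run `f` while holding the lock (`lock(f, l)` releases it even if `f` throws).
with_hdf5_lock(f) = lock(f, HDF5_LOCK)

function write_all(n)
    written = String[]  # stand-in for the file; not an HDF5 object
    Threads.@threads for i in 1:n
        with_hdf5_lock() do
            push!(written, "dataset $i")  # stand-in for an HDF5 write
        end
    end
    return written
end

written = write_all(8)
```

Because every caller waits on the same lock, the writes run one at a time regardless of how many threads are active, so the threads gain nothing on the write path itself.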

@tbenst
Contributor Author

tbenst commented Aug 17, 2021

Thanks for the thought--yes, this code accesses HDF5 only from the master thread, which runs write_n_sparse_datasets. All other threads send messages through a Channel.

In my case, rather than data = collect(rand(N,1) .> 0.9) I have a computationally expensive expression that takes up a ton of RAM. I want to parallelize the computation for a speed advantage, but need to write to HDF5 once each computation finishes to free up memory. Hence the use of multiple producers and a single-threaded consumer that writes to HDF5.

Edit: to make this more clear, I added an assertion to the code: @assert Threads.threadid() == 1
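
The multiple-producer, single-consumer pattern described above can be sketched with Base alone. This is a simplified stand-in rather than the code from the issue: `expensive_compute`, `store!`, and `produce_and_write` are hypothetical names, and a `Dict` takes the place of the HDF5 file.

```julia
# Hypothetical stand-ins for the real computation and for the HDF5 write.
expensive_compute(i) = fill(i, 4)
store!(sink, name, data) = (sink[name] = data)

function produce_and_write(n; max_in_flight=5)
    # Bounded channel: put! blocks once `max_in_flight` results are queued,
    # which caps how much finished data waits in memory.
    results = Channel{Pair{String,Vector{Int}}}(max_in_flight)
    sink = Dict{String,Vector{Int}}()

    # Producers may run on any available thread.
    producers = map(1:n) do i
        Threads.@spawn put!(results, string(i) => expensive_compute(i))
    end

    # Single consumer: only the calling task ever touches `sink`.
    for _ in 1:n
        name, data = take!(results)
        store!(sink, name, data)
    end

    foreach(wait, producers)
    return sink
end

sink = produce_and_write(10)
```

One caveat, going by the first trace in this thread: `close` and `h5o_close` are reached from `run_finalizer` during a garbage collection triggered inside a producer's `rand` call, so the `threadid` assertion in the writer may not cover every path into HDF5.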

@severinson
Owner

Ah, sorry. My mistake.
I wonder if it could have something to do with caching. Could you please try adding flush(g) just before the return statement of h5writecsc to see if that resolves the issue?

@tbenst
Contributor Author

tbenst commented Aug 17, 2021

Thanks for the thought. No resolution, I'm afraid. What's frustrating is that the error is different each time it runs. Clearly related to threading, though--works fine with julia -t1.

unable to flush file
signal (11): Segmentation fault
in expression starting at /home/tyler/code/lensman/notebooks/debug.jl:89
HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
  #000: H5F.c line 849 in H5Fflush(): unable to flush file
    major: File accessibility
    minor: Unable to flush data from cache
  #001: H5VLcallback.c line 3764 in H5VL_file_specific(): can't reset VOL wrapper info
    major: Virtual Object Layer
    minor: Can't reset object
  #002: H5FL_fac_free at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5VLint.c line 2320 in H5VL_reset_vol_wrapper(): no VOL object wrap context?
    major: Virtual Object Layer
    minor: Bad value
  #003: H5VLcallback.c line 3755 in H5VL_file_specific(): file specific failed
    major: Virtual Object Layer
    minor: Can't operate on object
  #004: H5VLcallback.c line 3684 in H5VL__file_specific(): file specific failed
    major: Virtual Object Layer
    minor: Can't operate on object
  #005: H5VLnative_file.c line 330 in H5VL__native_file_specific(): unable to flush mounted file hierarchy
    major: File accessibility
    minor: Unable to flush data from cache
  #006: H5Fmount.c line 699 in H5F_flush_mounts(): unable to flush mounted file hierarchy
    major: File accessibility
    minor: Unable to flush data from cache
  #007: H5Fmount.c line 660 in H5F_flush_mounts_recurse(): unable to flush file's cached information
    major: File accessibility
    minor: Unable to flush data from cache
H5SL_release_common at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5SL_close at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5P_close at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5I_dec_ref at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5I_dec_app_ref at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5Pclose at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
h5p_close at /home/tyler/.julia/packages/HDF5/0iEnL/src/api.jl:958 [inlined]
close at /home/tyler/.julia/packages/HDF5/0iEnL/src/HDF5.jl:626
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
run_finalizer at /buildworker/worker/package_linux64/build/src/gc.c:278
jl_gc_run_finalizers_in_list at /buildworker/worker/package_linux64/build/src/gc.c:365
run_finalizers at /buildworker/worker/package_linux64/build/src/gc.c:394
jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:3260
maybe_collect at /buildworker/worker/package_linux64/build/src/gc.c:880 [inlined]
jl_gc_pool_alloc at /buildworker/worker/package_linux64/build/src/gc.c:1204
jl_gc_alloc_ at /buildworker/worker/package_linux64/build/src/julia_internal.h:285 [inlined]
_new_array_ at /buildworker/worker/package_linux64/build/src/array.c:132 [inlined]
_new_array at /buildworker/worker/package_linux64/build/src/array.c:188 [inlined]
jl_alloc_array_2d at /buildworker/worker/package_linux64/build/src/array.c:466
Array at ./boot.jl:450 [inlined]
Array at ./boot.jl:458 [inlined]
Array at ./boot.jl:465 [inlined]
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:288 [inlined]
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:289
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:277 [inlined]
macro expansion at /home/tyler/code/lensman/notebooks/debug.jl:53 [inlined]
#1 at ./task.jl:411
unknown function (ip: 0x7fe96c06aaec)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Allocations: 9944520 (Pool: 9939075; Big: 5445); GC: 11
[1]    3962610 segmentation fault (core dumped)  julia -t36 notebooks/debug.jl
Segmentation fault
signal (11): Segmentation fault
in expression starting at /home/tyler/code/lensman/notebooks/debug.jl:89

signal (11): Segmentation fault
in expression starting at /home/tyler/code/lensman/notebooks/debug.jl:89
H5C_protect at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5AC_protect at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5B_insert at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5C_unprotect at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5AC_unprotect at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5O_unprotect at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5O_msg_read at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5G__obj_get_linfo at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5G__obj_info at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5D__btree_idx_insert at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5VL__native_group_get at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5VL_group_get at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5Gget_num_objs at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
h5g_get_num_objs at /home/tyler/.julia/packages/HDF5/0iEnL/src/api.jl:679 [inlined]
5D__chunk_flush_entry.constprop.22 at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
h5g_get_num_objs at /home/tyler/.julia/packages/HDF5/0iEnL/src/api_helpers.jl:139 [inlined]
length at /home/tyler/.julia/packages/HDF5/0iEnL/src/HDF5.jl:932 [inlined]
keys at /home/tyler/.julia/packages/HDF5/0iEnL/src/HDF5.jl:944
#h5writecsc#4 at /home/tyler/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:231
h5writecsc at /home/tyler/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:231 [inlined]
#h5writecsc#3 at /home/tyler/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:228 [inlined]
H5D__chunk_flush at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
h5writecsc at /home/tyler/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:227 [inlined]
#H5SparseMatrixCSC#2 at /home/tyler/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:86
H5SparseMatrixCSC at /home/tyler/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:ce954eca/lib/libhdf5.so (unknown line)
H5SparseMatrixCSC at /home/tyler/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:86
unknown function (ip: 0x7f24f0096ea9)
H5D_close at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5VL__native_dataset_close at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
H5VL_dataset_close at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
H5D__close_cb at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
write_n_sparse_datasets at /home/tyler/code/lensman/notebooks/debug.jl:29
test at /home/tyler/code/lensman/notebooks/debug.jl:75
H5I_dec_ref at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
test at /home/tyler/code/lensman/notebooks/debug.jl:45
unknown function (ip: 0x7f24f00867c6)
H5I_dec_app_ref at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
H5Oclose at /home/tyler/.julia/artifacts/997813d46a8a06e6e9871a2a01483f91ce954eca/lib/libhdf5.so (unknown line)
h5o_close at /home/tyler/.julia/packages/HDF5/0iEnL/src/api.jl:892 [inlined]
close at /home/tyler/.julia/packages/HDF5/0iEnL/src/HDF5.jl:586
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:115
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:204
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:155 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:562
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
run_finalizer at /buildworker/worker/package_linux64/build/src/gc.c:278
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:670
jl_gc_run_finalizers_in_list at /buildworker/worker/package_linux64/build/src/gc.c:365
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:877
run_finalizers at /buildworker/worker/package_linux64/build/src/gc.c:394
jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:3260
maybe_collect at /buildworker/worker/package_linux64/build/src/gc.c:880 [inlined]
jl_gc_pool_alloc at /buildworker/worker/package_linux64/build/src/gc.c:1204
jl_gc_alloc_ at /buildworker/worker/package_linux64/build/src/julia_internal.h:285 [inlined]
_new_array_ at /buildworker/worker/package_linux64/build/src/array.c:132 [inlined]
_new_array at /buildworker/worker/package_linux64/build/src/array.c:188 [inlined]
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:825
Array at ./boot.jl:450 [inlined]
Ar_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:929
Array at ./boot.jl:458 [inlined]
Array at ./boot.jl:465 [inlined]
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:288 [inlined]
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:289
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:277 [inlined]
macro expansion at /home/tyler/code/lensman/notebooks/debug.jl:53 [inlined]
#1 at ./task.jl:411
unknown function (ip: 0x7f24f008daec)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Allocations: 13027255 (Pool: 13019209; Big: 8046); GC: 12
unable to get link info
put 95HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
  #000: H5L.c line 961 in H5Lexists(): unable to get link info
    major: Links
    minorHDF5-DI: Can't get value
  #006: H5VLcallback.c line 5207 in AG: Error detected in H5VL_link_specific()HDF5 (1.12.0) thread 0:
: can't reset VOL wrapper info
    major: Virtual Object Layer
    minor: Can't reset object
  #  #000: H5O.c line 1239 in H5Oclose(): unable to close object
    major: Object header
    minor: Unable to release object
007: H5VLint.c line 2320 in H5VL_reset_vol_wrapper(): no VOL object wrap context?
    major: Virtual Object Layer
    minor: Bad value
  #001: H5I.c line 1422 in H5I_dec_app_ref(): can't decrement ID ref count
    major: Object atom
    minor: Unable to decrement reference count
  #002: H5Dint.c line 353 in H5D__close_cb(): unable to close dataset
    major: Dataset
    minor: Close failed
  #003: H5VLcallback.c line 2639 in H5VL_dataset_close(): can't reset VOL wrapper info
    major: Virtual Object Layer
    minor: Can't reset object
  #004: H5VLint.c line 2320 in H5VL_reset_vol_wrapper(): no VOL object wrap context?
    major: Virtual Object Layer
    minor: Bad value
  #005: H5L.c line 961 in H5Lexists(): unable to get link info
    major: Links
    minor: Can't get value
  #006: H5VLcallback.c line 5207 in H5VL_link_specific(): can't reset VOL wrapper info
    major: Virtual Object Layer
    minor: Can't reset object
  #007: H5VLint.c line 2320 in H5VL_reset_vol_wrapper(): no VOL object wrap context?
    major: Virtual Object Layer
    minor: Bad value

The distributed version runs fine, however. Very odd.

using HDF5, H5Sparse, SparseArrays
import Base.Threads: @threads
using Distributed
import Distributed: @distributed


addprocs(36)
@everywhere begin
    import Pkg
    Pkg.activate(".")
    using SparseArrays
end


function write_n_sparse_datasets(h5path, n, channel, blocker, type=:H5)
    h5 = h5open(h5path, "w")
    while n > 0
        @assert Threads.threadid() == 1 # we only access from the master thread
        dset_name, data = take!(channel)
        take!(blocker)
        println("take $n")
        if type == :H5
            h5[dset_name] = data
        elseif type == :H5Sparse
            H5SparseMatrixCSC(h5, dset_name, data)
        end
        n -= 1
    end
    h5
end

function test(type=:H5, n=10, N=512*512*300)
    to_write = RemoteChannel(()->Channel(Inf))
    blocker = RemoteChannel(()->Channel(5))
    h5_path = tempname()*".h5"
    isfile(h5_path) && rm(h5_path)
    @distributed for i in 1:n
        @async begin
            put!(blocker, i)
            println("start $i")
            data = collect(rand(N,1) .> 0.9)
            if type == :H5
                put!(to_write, "$i" => data)
            elseif type == :H5Sparse
                data = sparse(data)
                put!(to_write, "$i" => data)
            end
            println("put $i")
        end
    end
    println("call write")
    h5 = write_n_sparse_datasets(h5_path, n, to_write, blocker, type)

    @show keys(h5)
    close(h5)
    rm(h5_path)
end

test(:H5Sparse, 100) # no errors

@tbenst
Contributor Author

tbenst commented Aug 17, 2021

I don't think this is a problem with H5Sparse at this point, so closing

tbenst closed this as completed Aug 17, 2021