Migrate to an operator based implementation #11
Conversation
Currently there is an issue with LazyArrays combined with CuArrays (introduced in SciML/LinearSolve.jl#484):

```julia
julia> ∂A_lazy
(-).((16-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) .* (1×16 transpose(::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) with eltype Float32)):
-123.729 327.617 -62.0831 -90.923 -39.8437 182.524 123.199 -399.96 223.462 400.244 -239.278 -270.178 -280.733 598.985 63.9631 -172.205
87.6835 -232.174 43.9968 64.4349 28.2363 -129.35 -87.3081 283.441 -158.362 -283.643 169.57 191.469 198.949 -424.486 -45.329 122.037
114.568 -303.362 57.4868 84.1915 36.8939 -169.011 -114.078 370.348 -206.918 -370.612 221.562 250.175 259.949 -554.639 -59.2275 159.455
-84.9289 224.88 -42.6146 -62.4107 -27.3492 125.286 84.5653 -274.537 153.387 274.732 -164.243 -185.454 -192.699 411.151 43.905 -118.203
-30.3105 80.258 -15.2088 -22.2739 -9.76072 44.7138 30.1807 -97.9801 54.7426 98.0499 -58.617 -66.1869 -68.7726 146.736 15.6694 -42.1858
-92.1372 243.967 -46.2315 -67.7077 -29.6705 135.92 91.7427 -297.838 166.406 298.05 -178.183 -201.194 -209.054 446.047 47.6314 -128.236
86.1454 -228.102 43.225 63.3046 27.741 -127.081 -85.7766 278.47 -155.584 -278.668 166.596 188.11 195.459 -417.04 -44.5339 119.896
107.607 -284.928 53.9935 79.0755 34.652 -158.74 -107.146 347.844 -194.344 -348.091 208.099 234.973 244.153 -520.936 -55.6285 149.766
-1140.79 3020.66 -572.412 -838.319 -367.363 1682.89 1135.91 -3687.67 2060.34 3690.29 -2206.16 -2491.07 -2588.38 5522.7 589.745 -1587.74
-1369.77 3626.96 -687.306 -1006.58 -441.099 2020.67 1363.9 -4427.85 2473.89 4431.0 -2648.98 -2991.07 -3107.92 6631.21 708.118 -1906.43
-719.098 1904.07 -360.82 -528.434 -231.567 1060.81 716.019 -2324.52 1298.74 2326.17 -1390.65 -1570.25 -1631.59 3481.24 371.746 -1000.83
337.419 -893.44 169.306 247.955 108.657 -497.758 -335.974 1090.72 -609.399 -1091.5 652.53 736.799 765.582 -1633.48 -174.433 469.616
287.084 -760.159 144.049 210.966 92.4481 -423.504 -285.855 928.013 -518.491 -928.674 555.188 626.885 651.375 -1389.81 -148.411 399.561
450.972 -1194.11 226.283 331.4 145.224 -665.27 -449.041 1457.79 -814.483 -1458.83 872.128 984.756 1023.23 -2183.21 -233.135 627.658
2247.77 -5951.79 1127.86 1651.79 723.837 -3315.89 -2238.15 7266.03 -4059.61 -7271.2 4346.93 4908.3 5100.05 -10881.7 -1162.01 3128.43
-408.027 1080.4 -204.735 -299.842 -131.395 601.918 406.28 -1318.97 736.922 1319.91 -789.078 -890.981 -925.788 1975.31 210.934 -567.888
```

This is what the adjoint code for the linear solve returns, but I can't seem to figure out a way to materialize this array:

```julia
julia> CuMatrix(∂A_lazy)
ERROR: This object is not a GPU array
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] backend(::Type)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:225
[3] backend(x::Matrix{Float32})
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:226
[4] _copyto!
@ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:56 [inlined]
[5] materialize!
@ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:32 [inlined]
[6] materialize!
@ ./broadcast.jl:911 [inlined]
[7] _copyto!(::ArrayLayouts.DenseColumnMajor, ::LazyArrays.BroadcastLayout{typeof(-)}, dest::Matrix{Float32}, bc::LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{…}, Transpose{…}}}}})
@ LazyArrays ~/.julia/packages/LazyArrays/8MqhP/src/lazybroadcasting.jl:24
[8] _copyto!
@ ~/.julia/packages/ArrayLayouts/sP5Ce/src/ArrayLayouts.jl:259 [inlined]
[9] copyto!
@ ~/.julia/packages/ArrayLayouts/sP5Ce/src/ArrayLayouts.jl:261 [inlined]
[10] copyto_axcheck!
@ ./abstractarray.jl:1177 [inlined]
[11] Matrix{Float32}(x::LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}})
@ Base ./array.jl:673
[12] Array
@ ./boot.jl:501 [inlined]
[13] convert
@ ./array.jl:665 [inlined]
[14] CuArray
@ ~/.julia/packages/CUDA/htRwP/src/array.jl:419 [inlined]
[15] CuArray
@ ~/.julia/packages/CUDA/htRwP/src/array.jl:423 [inlined]
[16] (CuArray{T, 2} where T)(x::LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}})
@ CUDA ~/.julia/packages/CUDA/htRwP/src/array.jl:431
[17] top-level scope
@ REPL[246]:1
[18] top-level scope
@ ~/.julia/packages/Infiltrator/TNlCu/src/Infiltrator.jl:798
[19] top-level scope
@ ~/.julia/packages/CUDA/htRwP/src/initialization.jl:206
Some type information was truncated. Use `show(err)` to see complete types.
```

Broadcasting into a pre-allocated matrix fails as well:

```julia
julia> ∂A .= ∂A_lazy
ERROR: GPU compilation of MethodInstance for (::GPUArrays.var"#broadcast_kernel#38")(::CUDA.CuKernelContext, ::CuDeviceMatrix{Float32, 1}, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{…}, Base.OneTo{…}}, typeof(identity), Tuple{Base.Broadcast.Extruded{…}}}, ::Int64) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, which is not isbits:
.args is of type Tuple{Base.Broadcast.Extruded{LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}} which is not isbits.
.1 is of type Base.Broadcast.Extruded{LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
.x is of type LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}} which is not isbits.
.args is of type Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}} which is not isbits.
.1 is of type LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}} which is not isbits.
.args is of type Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}} which is not isbits.
.1 is of type CuArray{Float32, 1, CUDA.Mem.DeviceBuffer} which is not isbits.
.data is of type GPUArrays.DataRef{CUDA.Mem.DeviceBuffer} which is not isbits.
.rc is of type GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer} which is not isbits.
.obj is of type CUDA.Mem.DeviceBuffer which is not isbits.
.finalizer is of type Any which is not isbits.
.count is of type Base.Threads.Atomic{Int64} which is not isbits.
.2 is of type Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}} which is not isbits.
.parent is of type CuArray{Float32, 1, CUDA.Mem.DeviceBuffer} which is not isbits.
.data is of type GPUArrays.DataRef{CUDA.Mem.DeviceBuffer} which is not isbits.
.rc is of type GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer} which is not isbits.
```

I don't see any GPU CI on LazyArrays, so it is unclear to me whether it is properly supported in the first place. @mohamed82008 or @ChrisRackauckas, do you know how to proceed here?
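For reference, a minimal sketch of how a lazy array with this shape might be constructed directly, assuming `u` and `v` stand in for the adjoint and primal vectors (both names, and the use of `BroadcastArray` here, are illustrative, not taken from this PR):

```julia
using CUDA, LinearAlgebra, LazyArrays

u = CUDA.rand(Float32, 16)   # stand-in for the adjoint vector
v = CUDA.rand(Float32, 16)   # stand-in for the primal solution vector

# A lazily negated outer product: BroadcastArray records the operations
# but performs no GPU computation, producing the same nested
# BroadcastMatrix{Float32, typeof(-), ...} type seen in the stacktraces.
∂A_lazy = BroadcastArray(-, BroadcastArray(*, u, transpose(v)))

# Both materialization paths above then fail: `CuMatrix(∂A_lazy)` falls
# back to a CPU `Matrix` constructor, and `∂A .= ∂A_lazy` passes the
# non-isbits wrapper into the GPU broadcast kernel.
```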
Try

Alternatively, pre-allocate the result matrix and use

```julia
a = CuMatrix(..)   # pre-allocated result of the right size
copy!(a, b)
# or
a .= b
```

assuming `b` is the lazy array.
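If the pre-allocated route still hits the broadcast kernel issue, one hedged alternative, reusing the illustrative `u` and `v` from the sketch above, is to rebuild the broadcast eagerly from the underlying CuArrays rather than going through the LazyArrays wrapper:

```julia
∂A = CuMatrix{Float32}(undef, 16, 16)  # pre-allocated result
∂A .= .-(u .* transpose(v))            # eager, fused GPU broadcast; no lazy wrapper
```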
Tried that; see the broadcast compile error above.
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
+ Coverage   64.22%   71.44%   +7.22%
==========================================
  Files          13       18       +5
  Lines         763      872     +109
==========================================
+ Hits          490      623     +133
+ Misses        273      249      -24
```

☔ View full report in Codecov by Sentry.
Fixes #10

TODOs:
- [ ] `init_cacheval` functions from LinearSolve
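As a sketch of the operator-based idea in the title (purely illustrative; none of these names come from this repository or from LinearSolve): instead of materializing the adjoint as a dense or lazy matrix like `∂A_lazy` above, the rank-1 structure can be kept as an operator whose action is computed on demand, which sidesteps the materialization problem entirely.

```julia
using LinearAlgebra

# Hypothetical rank-1 operator representing -(u * v') without ever
# forming the dense matrix.
struct NegOuterProduct{V<:AbstractVector}
    u::V
    v::V
end

# Action on a vector: (-(u * v')) * x == u .* (-dot(v, x)),
# i.e. one reduction and one scaled copy, both GPU-friendly.
apply(op::NegOuterProduct, x::AbstractVector) = op.u .* (-dot(op.v, x))
```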