This repository has been archived by the owner on Nov 1, 2024. It is now read-only.

Migrate to an operator based implementation #11

Merged: 6 commits merged into main from ap/operator on Apr 1, 2024

Conversation

@avik-pal (Member) commented Mar 29, 2024

Fixes #10

TODOs

  • Overload the `init_cacheval` functions from LinearSolve (see the sketch after this list)
  • Fix the tests
  • Test with ManifoldNeuralODEs to verify there is no regression
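
A rough sketch of what the first TODO involves, assuming LinearSolve's internal `init_cacheval` hook keeps its usual `(alg, A, b, u, Pl, Pr, maxiters, abstol, reltol, verbose, assumptions)` argument list; `batched_factorize` is a hypothetical helper name, and the real overloads live in `ext/BatchedRoutinesLinearSolveExt.jl`:

```julia
using LinearSolve

# Hypothetical sketch, not the PR's actual code: cache a batched LU
# factorization whenever `A` is a UniformBlockDiagonalMatrix, so that
# repeated solves reuse the per-block factors.
function LinearSolve.init_cacheval(
        alg::LinearSolve.LUFactorization, A::UniformBlockDiagonalMatrix, b, u,
        Pl, Pr, maxiters::Int, abstol, reltol, verbose::Bool,
        assumptions::LinearSolve.OperatorAssumptions)
    return batched_factorize(alg, A)  # hypothetical batched LU over the blocks
end
```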

@avik-pal (Member, Author) commented Mar 30, 2024

Currently there is an issue with LazyArrays combined with CuArrays (introduced in SciML/LinearSolve.jl#484).
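
A minimal reconstruction of the offending value (hypothetical, inferred from the types in the stacktraces below; in the PR it comes out of the linear-solve adjoint):

```julia
using CUDA, LazyArrays

# Stand-ins for the adjoint intermediates
u = CUDA.rand(Float32, 16)
v = CUDA.rand(Float32, 16)
∂A = CuMatrix{Float32}(undef, 16, 16)  # pre-allocated destination used below

# Lazy -(u .* v') wrapper; nothing has been computed on the GPU yet
∂A_lazy = BroadcastArray(-, BroadcastArray(*, u, transpose(v)))
```

The printed value: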

```julia
julia> ∂A_lazy
(-).((16-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) .* (1×16 transpose(::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) with eltype Float32)):
  -123.729     327.617   -62.0831    -90.923    -39.8437     182.524     123.199    -399.96      223.462     400.244    -239.278   -270.178    -280.733      598.985     63.9631   -172.205
    87.6835   -232.174    43.9968     64.4349    28.2363    -129.35      -87.3081    283.441    -158.362    -283.643     169.57     191.469     198.949     -424.486    -45.329     122.037
   114.568    -303.362    57.4868     84.1915    36.8939    -169.011    -114.078     370.348    -206.918    -370.612     221.562    250.175     259.949     -554.639    -59.2275    159.455
   -84.9289    224.88    -42.6146    -62.4107   -27.3492     125.286      84.5653   -274.537     153.387     274.732    -164.243   -185.454    -192.699      411.151     43.905    -118.203
   -30.3105     80.258   -15.2088    -22.2739    -9.76072     44.7138     30.1807    -97.9801     54.7426     98.0499    -58.617    -66.1869    -68.7726     146.736     15.6694    -42.1858
   -92.1372    243.967   -46.2315    -67.7077   -29.6705     135.92       91.7427   -297.838     166.406     298.05     -178.183   -201.194    -209.054      446.047     47.6314   -128.236
    86.1454   -228.102    43.225      63.3046    27.741     -127.081     -85.7766    278.47     -155.584    -278.668     166.596    188.11      195.459     -417.04     -44.5339    119.896
   107.607    -284.928    53.9935     79.0755    34.652     -158.74     -107.146     347.844    -194.344    -348.091     208.099    234.973     244.153     -520.936    -55.6285    149.766
 -1140.79     3020.66   -572.412    -838.319   -367.363     1682.89     1135.91    -3687.67     2060.34     3690.29    -2206.16   -2491.07    -2588.38      5522.7      589.745   -1587.74
 -1369.77     3626.96   -687.306   -1006.58    -441.099     2020.67     1363.9     -4427.85     2473.89     4431.0     -2648.98   -2991.07    -3107.92      6631.21     708.118   -1906.43
  -719.098    1904.07   -360.82     -528.434   -231.567     1060.81      716.019   -2324.52     1298.74     2326.17    -1390.65   -1570.25    -1631.59      3481.24     371.746   -1000.83
   337.419    -893.44    169.306     247.955    108.657     -497.758    -335.974    1090.72     -609.399   -1091.5       652.53     736.799     765.582    -1633.48    -174.433     469.616
   287.084    -760.159   144.049     210.966     92.4481    -423.504    -285.855     928.013    -518.491    -928.674     555.188    626.885     651.375    -1389.81    -148.411     399.561
   450.972   -1194.11    226.283     331.4      145.224     -665.27     -449.041    1457.79     -814.483   -1458.83      872.128    984.756    1023.23     -2183.21    -233.135     627.658
  2247.77    -5951.79   1127.86     1651.79     723.837    -3315.89    -2238.15     7266.03    -4059.61    -7271.2      4346.93    4908.3      5100.05    -10881.7    -1162.01     3128.43
  -408.027    1080.4    -204.735    -299.842   -131.395      601.918     406.28    -1318.97      736.922    1319.91     -789.078   -890.981    -925.788     1975.31     210.934    -567.888
```

which is returned by the adjoint code for linear solve, but I can't seem to figure out a way to materialize this array.

```julia
julia> CuMatrix(∂A_lazy)
ERROR: This object is not a GPU array
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] backend(::Type)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:225
  [3] backend(x::Matrix{Float32})
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:226
  [4] _copyto!
    @ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:56 [inlined]
  [5] materialize!
    @ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:32 [inlined]
  [6] materialize!
    @ ./broadcast.jl:911 [inlined]
  [7] _copyto!(::ArrayLayouts.DenseColumnMajor, ::LazyArrays.BroadcastLayout{typeof(-)}, dest::Matrix{Float32}, bc::LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{}, Transpose{}}}}})
    @ LazyArrays ~/.julia/packages/LazyArrays/8MqhP/src/lazybroadcasting.jl:24
  [8] _copyto!
    @ ~/.julia/packages/ArrayLayouts/sP5Ce/src/ArrayLayouts.jl:259 [inlined]
  [9] copyto!
    @ ~/.julia/packages/ArrayLayouts/sP5Ce/src/ArrayLayouts.jl:261 [inlined]
 [10] copyto_axcheck!
    @ ./abstractarray.jl:1177 [inlined]
 [11] Matrix{Float32}(x::LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}})
    @ Base ./array.jl:673
 [12] Array
    @ ./boot.jl:501 [inlined]
 [13] convert
    @ ./array.jl:665 [inlined]
 [14] CuArray
    @ ~/.julia/packages/CUDA/htRwP/src/array.jl:419 [inlined]
 [15] CuArray
    @ ~/.julia/packages/CUDA/htRwP/src/array.jl:423 [inlined]
 [16] (CuArray{T, 2} where T)(x::LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}})
    @ CUDA ~/.julia/packages/CUDA/htRwP/src/array.jl:431
 [17] top-level scope
    @ REPL[246]:1
 [18] top-level scope
    @ ~/.julia/packages/Infiltrator/TNlCu/src/Infiltrator.jl:798
 [19] top-level scope
    @ ~/.julia/packages/CUDA/htRwP/src/initialization.jl:206
Some type information was truncated. Use `show(err)` to see complete types.
julia> ∂A .= ∂A_lazy
ERROR: GPU compilation of MethodInstance for (::GPUArrays.var"#broadcast_kernel#38")(::CUDA.CuKernelContext, ::CuDeviceMatrix{Float32, 1}, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{…}, Base.OneTo{…}}, typeof(identity), Tuple{Base.Broadcast.Extruded{…}}}, ::Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}} which is not isbits.
    .1 is of type Base.Broadcast.Extruded{LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
      .x is of type LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}} which is not isbits.
        .args is of type Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}} which is not isbits.
          .1 is of type LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}} which is not isbits.
            .args is of type Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}} which is not isbits.
              .1 is of type CuArray{Float32, 1, CUDA.Mem.DeviceBuffer} which is not isbits.
                .data is of type GPUArrays.DataRef{CUDA.Mem.DeviceBuffer} which is not isbits.
                  .rc is of type GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer} which is not isbits.
                    .obj is of type CUDA.Mem.DeviceBuffer which is not isbits.
                    .finalizer is of type Any which is not isbits.
                    .count is of type Base.Threads.Atomic{Int64} which is not isbits.
              .2 is of type Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}} which is not isbits.
                .parent is of type CuArray{Float32, 1, CUDA.Mem.DeviceBuffer} which is not isbits.
                  .data is of type GPUArrays.DataRef{CUDA.Mem.DeviceBuffer} which is not isbits.
                    .rc is of type GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer} which is not isbits.
```
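
For context, my reading of this failure (not stated in the thread): GPU kernels require isbits arguments, and plain `CuArray`s only survive broadcasting because Adapt.jl swaps them for isbits `CuDeviceArray`s at kernel launch. `LazyArrays.BroadcastMatrix` carries its `CuArray` arguments in ordinary fields with no such conversion rule, so the whole non-isbits wrapper reaches the GPU compiler. A hypothetical, untested workaround along those lines:

```julia
using Adapt, LazyArrays

# Hypothetical and untested: convert the lazy wrapper's arguments for the
# device the same way a bare CuArray would be converted, so the wrapper
# that reaches the kernel contains only isbits device arrays.
Adapt.adapt_structure(to, A::LazyArrays.BroadcastArray) =
    LazyArrays.BroadcastArray(A.f, map(x -> Adapt.adapt(to, x), A.args)...)
```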

I don't see any GPU CI on LazyArrays, so it is unclear to me whether it is supported properly in the first place.

@mohamed82008 or @ChrisRackauckas do you know how to proceed here?

@avik-pal force-pushed the ap/operator branch 2 times, most recently from 4c2420c to de88368 on March 30, 2024 at 01:14.
@mohamed82008 commented:

Try `collect`. Didn't test it.

@mohamed82008 commented Mar 30, 2024

Alternatively, pre-allocate the result matrix and use `copy!` or broadcasting:

```julia
# Pre-allocate the destination with a concrete eltype and the lazy array's size
a = CuMatrix{Float32}(undef, size(b))
copy!(a, b)
# or
a .= b
```

assuming `b` is the lazy array.

@avik-pal (Member, Author) commented:

Tried that; see the broadcast compile error above.

@mohamed82008 commented Mar 30, 2024

If `a` is a `BroadcastArray`, try `a.f.(a.args...)`.
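
Expanding on that suggestion, a recursive version so that nested wrappers (like the `(-)` of `(*)` nesting above) are also evaluated before any GPU broadcast runs; `materialize_lazy` is a made-up name, not an existing API:

```julia
using LazyArrays

# Plain arrays (including CuArrays) pass through untouched
materialize_lazy(x) = x

# Eagerly evaluate a BroadcastArray: materialize the arguments first, then
# apply the broadcast, so GPU kernels only ever see real CuArrays.
materialize_lazy(A::LazyArrays.BroadcastArray) =
    A.f.(map(materialize_lazy, A.args)...)
```

With this, `∂A .= materialize_lazy(∂A_lazy)` should avoid the non-isbits kernel argument above, since the lazy wrappers never reach the GPU compiler.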

@codecov bot commented Apr 1, 2024

Codecov Report

Attention: Patch coverage is 71.03275%, with 115 lines in your changes missing coverage. Please review.

Project coverage is 71.44%. Comparing base (d0ce078) to head (256f0cc).

| Files | Patch % | Lines |
|---|---|---|
| src/operator.jl | 64.42% | 53 Missing ⚠️ |
| ext/BatchedRoutinesLinearSolveExt.jl | 78.88% | 19 Missing ⚠️ |
| src/factorization.jl | 69.09% | 17 Missing ⚠️ |
| src/chainrules.jl | 68.18% | 14 Missing ⚠️ |
| ext/BatchedRoutinesCUDAExt/factorization.jl | 66.66% | 7 Missing ⚠️ |
| ...xt/BatchedRoutinesComponentArraysForwardDiffExt.jl | 0.00% | 3 Missing ⚠️ |
| src/BatchedRoutines.jl | 66.66% | 1 Missing ⚠️ |
| src/api.jl | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
+ Coverage   64.22%   71.44%   +7.22%
==========================================
  Files          13       18       +5
  Lines         763      872     +109
==========================================
+ Hits          490      623     +133
+ Misses        273      249      -24
```


@avik-pal merged commit 54b894e into main on Apr 1, 2024 (8 of 9 checks passed).
@avik-pal deleted the ap/operator branch on April 1, 2024 at 21:29.
Development

Successfully merging this pull request may close these issues:

  • Use an Operator Implementation for UniformBlockDiagonalMatrix