This repository has been archived by the owner on Nov 1, 2024. It is now read-only.

Migrate to an operator based implementation #11

Merged: 6 commits merged into main from ap/operator on Apr 1, 2024

Conversation

@avik-pal (Member) commented Mar 29, 2024

Fixes #10

TODOs

  • Overload the `init_cacheval` functions from LinearSolve (see the sketch after this list)
  • Fix the tests
  • Test with ManifoldNeuralODEs to verify there is no regression
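
A rough sketch of what the first TODO involves, assuming LinearSolve's internal `init_cacheval` hook keeps its usual `(alg, A, b, u, Pl, Pr, maxiters, abstol, reltol, verbose, assumptions)` argument list; `batched_factorize` is a hypothetical helper name, and the real overloads live in `ext/BatchedRoutinesLinearSolveExt.jl`:

```julia
using LinearSolve

# Hypothetical sketch, not the PR's actual code: cache a batched LU
# factorization whenever `A` is a UniformBlockDiagonalMatrix, so that
# repeated solves reuse the per-block factors.
function LinearSolve.init_cacheval(
        alg::LinearSolve.LUFactorization, A::UniformBlockDiagonalMatrix, b, u,
        Pl, Pr, maxiters::Int, abstol, reltol, verbose::Bool,
        assumptions::LinearSolve.OperatorAssumptions)
    return batched_factorize(alg, A)  # hypothetical batched LU over the blocks
end
```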

@avik-pal (Member, Author) commented Mar 30, 2024

Currently there is an issue with LazyArrays combined with CuArrays (introduced in SciML/LinearSolve.jl#484).
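
A minimal reconstruction of the offending value (hypothetical, inferred from the types in the stacktraces below; in the PR it comes out of the linear-solve adjoint):

```julia
using CUDA, LazyArrays

# Stand-ins for the adjoint intermediates
u = CUDA.rand(Float32, 16)
v = CUDA.rand(Float32, 16)
∂A = CuMatrix{Float32}(undef, 16, 16)  # pre-allocated destination used below

# Lazy -(u .* v') wrapper; nothing has been computed on the GPU yet
∂A_lazy = BroadcastArray(-, BroadcastArray(*, u, transpose(v)))
```

The printed value: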

```julia
julia> ∂A_lazy
(-).((16-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) .* (1×16 transpose(::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) with eltype Float32)):
  -123.729     327.617   -62.0831    -90.923    -39.8437     182.524     123.199    -399.96      223.462     400.244    -239.278   -270.178    -280.733      598.985     63.9631   -172.205
    87.6835   -232.174    43.9968     64.4349    28.2363    -129.35      -87.3081    283.441    -158.362    -283.643     169.57     191.469     198.949     -424.486    -45.329     122.037
   114.568    -303.362    57.4868     84.1915    36.8939    -169.011    -114.078     370.348    -206.918    -370.612     221.562    250.175     259.949     -554.639    -59.2275    159.455
   -84.9289    224.88    -42.6146    -62.4107   -27.3492     125.286      84.5653   -274.537     153.387     274.732    -164.243   -185.454    -192.699      411.151     43.905    -118.203
   -30.3105     80.258   -15.2088    -22.2739    -9.76072     44.7138     30.1807    -97.9801     54.7426     98.0499    -58.617    -66.1869    -68.7726     146.736     15.6694    -42.1858
   -92.1372    243.967   -46.2315    -67.7077   -29.6705     135.92       91.7427   -297.838     166.406     298.05     -178.183   -201.194    -209.054      446.047     47.6314   -128.236
    86.1454   -228.102    43.225      63.3046    27.741     -127.081     -85.7766    278.47     -155.584    -278.668     166.596    188.11      195.459     -417.04     -44.5339    119.896
   107.607    -284.928    53.9935     79.0755    34.652     -158.74     -107.146     347.844    -194.344    -348.091     208.099    234.973     244.153     -520.936    -55.6285    149.766
 -1140.79     3020.66   -572.412    -838.319   -367.363     1682.89     1135.91    -3687.67     2060.34     3690.29    -2206.16   -2491.07    -2588.38      5522.7      589.745   -1587.74
 -1369.77     3626.96   -687.306   -1006.58    -441.099     2020.67     1363.9     -4427.85     2473.89     4431.0     -2648.98   -2991.07    -3107.92      6631.21     708.118   -1906.43
  -719.098    1904.07   -360.82     -528.434   -231.567     1060.81      716.019   -2324.52     1298.74     2326.17    -1390.65   -1570.25    -1631.59      3481.24     371.746   -1000.83
   337.419    -893.44    169.306     247.955    108.657     -497.758    -335.974    1090.72     -609.399   -1091.5       652.53     736.799     765.582    -1633.48    -174.433     469.616
   287.084    -760.159   144.049     210.966     92.4481    -423.504    -285.855     928.013    -518.491    -928.674     555.188    626.885     651.375    -1389.81    -148.411     399.561
   450.972   -1194.11    226.283     331.4      145.224     -665.27     -449.041    1457.79     -814.483   -1458.83      872.128    984.756    1023.23     -2183.21    -233.135     627.658
  2247.77    -5951.79   1127.86     1651.79     723.837    -3315.89    -2238.15     7266.03    -4059.61    -7271.2      4346.93    4908.3      5100.05    -10881.7    -1162.01     3128.43
  -408.027    1080.4    -204.735    -299.842   -131.395      601.918     406.28    -1318.97      736.922    1319.91     -789.078   -890.981    -925.788     1975.31     210.934    -567.888
```

which is returned by the adjoint code for linear solve, but I can't seem to figure out a way to materialize this array.

```julia
julia> CuMatrix(∂A_lazy)
ERROR: This object is not a GPU array
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] backend(::Type)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:225
  [3] backend(x::Matrix{Float32})
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:226
  [4] _copyto!
    @ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:56 [inlined]
  [5] materialize!
    @ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:32 [inlined]
  [6] materialize!
    @ ./broadcast.jl:911 [inlined]
  [7] _copyto!(::ArrayLayouts.DenseColumnMajor, ::LazyArrays.BroadcastLayout{typeof(-)}, dest::Matrix{Float32}, bc::LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{}, Transpose{}}}}})
    @ LazyArrays ~/.julia/packages/LazyArrays/8MqhP/src/lazybroadcasting.jl:24
  [8] _copyto!
    @ ~/.julia/packages/ArrayLayouts/sP5Ce/src/ArrayLayouts.jl:259 [inlined]
  [9] copyto!
    @ ~/.julia/packages/ArrayLayouts/sP5Ce/src/ArrayLayouts.jl:261 [inlined]
 [10] copyto_axcheck!
    @ ./abstractarray.jl:1177 [inlined]
 [11] Matrix{Float32}(x::LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}})
    @ Base ./array.jl:673
 [12] Array
    @ ./boot.jl:501 [inlined]
 [13] convert
    @ ./array.jl:665 [inlined]
 [14] CuArray
    @ ~/.julia/packages/CUDA/htRwP/src/array.jl:419 [inlined]
 [15] CuArray
    @ ~/.julia/packages/CUDA/htRwP/src/array.jl:423 [inlined]
 [16] (CuArray{T, 2} where T)(x::LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}})
    @ CUDA ~/.julia/packages/CUDA/htRwP/src/array.jl:431
 [17] top-level scope
    @ REPL[246]:1
 [18] top-level scope
    @ ~/.julia/packages/Infiltrator/TNlCu/src/Infiltrator.jl:798
 [19] top-level scope
    @ ~/.julia/packages/CUDA/htRwP/src/initialization.jl:206
Some type information was truncated. Use `show(err)` to see complete types.
julia> ∂A .= ∂A_lazy
ERROR: GPU compilation of MethodInstance for (::GPUArrays.var"#broadcast_kernel#38")(::CUDA.CuKernelContext, ::CuDeviceMatrix{Float32, 1}, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{…}, Base.OneTo{…}}, typeof(identity), Tuple{Base.Broadcast.Extruded{…}}}, ::Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}} which is not isbits.
    .1 is of type Base.Broadcast.Extruded{LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
      .x is of type LazyArrays.BroadcastMatrix{Float32, typeof(-), Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}} which is not isbits.
        .args is of type Tuple{LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}} which is not isbits.
          .1 is of type LazyArrays.BroadcastMatrix{Float32, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}} which is not isbits.
            .args is of type Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}} which is not isbits.
              .1 is of type CuArray{Float32, 1, CUDA.Mem.DeviceBuffer} which is not isbits.
                .data is of type GPUArrays.DataRef{CUDA.Mem.DeviceBuffer} which is not isbits.
                  .rc is of type GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer} which is not isbits.
                    .obj is of type CUDA.Mem.DeviceBuffer which is not isbits.
                    .finalizer is of type Any which is not isbits.
                    .count is of type Base.Threads.Atomic{Int64} which is not isbits.
              .2 is of type Transpose{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}} which is not isbits.
                .parent is of type CuArray{Float32, 1, CUDA.Mem.DeviceBuffer} which is not isbits.
                  .data is of type GPUArrays.DataRef{CUDA.Mem.DeviceBuffer} which is not isbits.
                    .rc is of type GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer} which is not isbits.
```
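
For context, my reading of this failure (not stated in the thread): GPU kernels require isbits arguments, and plain `CuArray`s only survive broadcasting because Adapt.jl swaps them for isbits `CuDeviceArray`s at kernel launch. `LazyArrays.BroadcastMatrix` carries its `CuArray` arguments in ordinary fields with no such conversion rule, so the whole non-isbits wrapper reaches the GPU compiler. A hypothetical, untested workaround along those lines:

```julia
using Adapt, LazyArrays

# Hypothetical and untested: convert the lazy wrapper's arguments for the
# device the same way a bare CuArray would be converted, so the wrapper
# that reaches the kernel contains only isbits device arrays.
Adapt.adapt_structure(to, A::LazyArrays.BroadcastArray) =
    LazyArrays.BroadcastArray(A.f, map(x -> Adapt.adapt(to, x), A.args)...)
```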

I don't see any GPU CI on LazyArrays, so it is unclear to me whether it is supported properly in the first place.

@mohamed82008 or @ChrisRackauckas do you know how to proceed here?

@avik-pal force-pushed the ap/operator branch 2 times, most recently from 4c2420c to de88368 on March 30, 2024 at 01:14.
@mohamed82008 commented:

Try `collect`. Didn't test it.

@mohamed82008 commented Mar 30, 2024

Alternatively, pre-allocate the result matrix and use `copy!` or broadcasting:

```julia
# Pre-allocate the destination with a concrete eltype and the lazy array's size
a = CuMatrix{Float32}(undef, size(b))
copy!(a, b)
# or
a .= b
```

assuming `b` is the lazy array.

@avik-pal (Member, Author) commented:

Tried that; see the broadcast compile error above.

@mohamed82008 commented Mar 30, 2024

If `a` is a `BroadcastArray`, try `a.f.(a.args...)`.
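
Expanding on that suggestion, a recursive version so that nested wrappers (like the `(-)` of `(*)` nesting above) are also evaluated before any GPU broadcast runs; `materialize_lazy` is a made-up name, not an existing API:

```julia
using LazyArrays

# Plain arrays (including CuArrays) pass through untouched
materialize_lazy(x) = x

# Eagerly evaluate a BroadcastArray: materialize the arguments first, then
# apply the broadcast, so GPU kernels only ever see real CuArrays.
materialize_lazy(A::LazyArrays.BroadcastArray) =
    A.f.(map(materialize_lazy, A.args)...)
```

With this, `∂A .= materialize_lazy(∂A_lazy)` should avoid the non-isbits kernel argument above, since the lazy wrappers never reach the GPU compiler.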

@codecov bot commented Apr 1, 2024

Codecov Report

Attention: Patch coverage is 71.03275%, with 115 lines in your changes missing coverage. Please review.

Project coverage is 71.44%. Comparing base (d0ce078) to head (256f0cc).

| Files | Patch % | Lines |
|---|---|---|
| src/operator.jl | 64.42% | 53 Missing ⚠️ |
| ext/BatchedRoutinesLinearSolveExt.jl | 78.88% | 19 Missing ⚠️ |
| src/factorization.jl | 69.09% | 17 Missing ⚠️ |
| src/chainrules.jl | 68.18% | 14 Missing ⚠️ |
| ext/BatchedRoutinesCUDAExt/factorization.jl | 66.66% | 7 Missing ⚠️ |
| ...xt/BatchedRoutinesComponentArraysForwardDiffExt.jl | 0.00% | 3 Missing ⚠️ |
| src/BatchedRoutines.jl | 66.66% | 1 Missing ⚠️ |
| src/api.jl | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
+ Coverage   64.22%   71.44%   +7.22%
==========================================
  Files          13       18       +5
  Lines         763      872     +109
==========================================
+ Hits          490      623     +133
+ Misses        273      249      -24
```


@avik-pal merged commit 54b894e into main on Apr 1, 2024 (8 of 9 checks passed).
@avik-pal deleted the ap/operator branch on April 1, 2024 at 21:29.
Development

Successfully merging this pull request may close these issues:

  • Use an Operator Implementation for UniformBlockDiagonalMatrix