
No Cache #1849

Merged
merged 31 commits into SciML:master from chriselrod:horriblehackextract
Feb 8, 2023

Conversation

chriselrod
Contributor

See commit message for more details.

Setting the precompilation preferences to false, I get this on the latest release:

julia> using OrdinaryDiffEq

julia> function lorenz(du, u, p, t)
         du[1] = 10.0(u[2] - u[1])
         du[2] = u[1] * (28.0 - u[3]) - u[2]
         du[3] = u[1] * u[2] - (8 / 3) * u[3]
       end
lorenz (generic function with 1 method)

julia> function lorenz_oop(u, p, t)
         [10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
       end
lorenz_oop (generic function with 1 method)

julia> solver = Vern9();

julia> prob = ODEProblem(lorenz, [1.0; 0.0; 0.0], (0.0, 1.0));

julia> @time @eval solve(prob, solver);
  5.120141 seconds (5.44 M allocations: 345.486 MiB, 6.31% gc time, 100.00% compilation time)

julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  12.554 μs …   2.393 ms  ┊ GC (min … max): 0.00% … 95.20%
 Time  (median):     13.646 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.343 μs ± 59.249 μs  ┊ GC (mean ± σ):  9.86% ±  2.54%

            ▂▃▄▆▆▇███▇▇▅▄▂▁                                    
  ▁▁▁▁▂▃▄▅▇████████████████▇▆▅▅▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  12.6 μs         Histogram: frequency by time        16.1 μs <

 Memory estimate: 28.34 KiB, allocs estimate: 259.

Vs this on this branch:

julia> using OrdinaryDiffEq

julia> function lorenz(du, u, p, t)
         du[1] = 10.0(u[2] - u[1])
         du[2] = u[1] * (28.0 - u[3]) - u[2]
         du[3] = u[1] * u[2] - (8 / 3) * u[3]
       end
lorenz (generic function with 1 method)

julia> function lorenz_oop(u, p, t)
         [10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
       end
lorenz_oop (generic function with 1 method)

julia> solver = Vern9();

julia> prob = ODEProblem(lorenz, [1.0; 0.0; 0.0], (0.0, 1.0));

julia> @time @eval solve(prob, solver);
  2.909396 seconds (6.35 M allocations: 407.016 MiB, 12.35% gc time, 99.99% compilation time)

julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  10.631 μs …   2.528 ms  ┊ GC (min … max): 0.00% … 98.38%
 Time  (median):     11.649 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.996 μs ± 54.907 μs  ┊ GC (mean ± σ):  9.32% ±  2.19%

         ▂▁▂▃▅▅█▇▇█▇▆▄▂                                        
  ▁▁▂▃▅▆█████████████████▆▅▄▄▃▃▃▃▃▃▃▂▃▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  10.6 μs         Histogram: frequency by time        14.4 μs <

 Memory estimate: 22.09 KiB, allocs estimate: 259.

Note the compile-time improvement (which will more generally translate to shorter precompile times), plus the runtime improvement owing to fewer unnecessary copies.
We have the same number of allocations, but those allocations are smaller due to the decreased cache size.

This PR implements the optimization for both Vern7 and Vern9.

Julia structs are semantically pass-by-value. While large structs are often
passed by reference as an optimization, in many cases they were passed by
value instead, causing a great deal of code to be emitted in order to copy
the memory around on the stack.
By inlining the caches where they are used instead, we eliminate
the copies.
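The effect can be sketched with a toy example (all names here are hypothetical, standing in for the real Vern7/Vern9 tableau caches):

```julia
# A large immutable struct of constants, analogous to a tableau cache.
struct ToyTableau
    a::NTuple{32,Float64}
end

# When such a struct is passed by value, the compiler may emit code
# copying all 32 Floats around on the stack at call boundaries.
step_with_cache(u, tab::ToyTableau) = muladd(tab.a[1], u, tab.a[2])

# Inlining the constants where they are used avoids the struct (and
# its copies) entirely:
step_inlined(u) = muladd(0.5, u, 0.25)

tab = ToyTableau(ntuple(i -> i == 1 ? 0.5 : 0.25, 32))
step_with_cache(2.0, tab) == step_inlined(2.0)  # same result, fewer copies
```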

To minimize the amount of code we need to actually change, this commit
implements a macro that makes use of the old code with minimal
optimizations in order to define the associated variables. This
perhaps isn't the solution we want long-term, but it has the benefit
of allowing us to still take advantage of multiple dispatch in
defining the constants, so we can (for example) have `Float64` vs
`BigFloat` specializations.
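A minimal sketch of what such dispatch-based constant definitions look like (the function name and coefficient value are illustrative, not the actual code in this PR):

```julia
# Element-type-specific tableau coefficients via multiple dispatch:
# Float64 gets a plain literal, while BigFloat (and any other number
# type) converts from a full-precision big-literal parse.
vern_c2(::Type{Float64}) = 0.03462
vern_c2(::Type{T}) where {T<:Number} = convert(T, big"0.03462")
```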
Project.toml Outdated
@@ -48,7 +48,7 @@ ArrayInterfaceCore = "0.1.22"
 ArrayInterfaceGPUArrays = "0.1, 0.2"
 ArrayInterfaceStaticArrays = "0.1"
 ArrayInterfaceStaticArraysCore = "0.1.2"
-DataStructures = "0.18"
+DataStructures = "0.18, 0.19"
Member

was this tested? Odd this hasn't come up in a CompatHelper?

Contributor Author

I was trying to dev DataStructures. This should be reverted. I eventually just changed the deved DataStructures version back to 0.18 instead.

Comment on lines +64 to +68
+    saveat = integrator.opts.saveat
     while !isempty(saveat) && first(saveat) <= tdir_t # Perform saveat
         integrator.saveiter += 1
         saved = true
-        curt = integrator.tdir * pop!(integrator.opts.saveat)
+        curt = integrator.tdir * pop!(saveat)
Member

I assume this doesn't make a major difference?

Contributor Author

Yeah, I just did that while looking at where all the copies came from.
I thought, maybe it's thinking that stuff possibly gets mutated on each iter and is thus reloading on each call.
But I didn't notice any difference.

@chriselrod
Contributor Author

Vern9 sees a big benefit for both ip and oop; latest release:

julia> using StaticArrays, OrdinaryDiffEq
WARNING: using StaticArrays.pop in module Main conflicts with an existing identifier.

julia> function lorenz_oop(u, p, t)
         SA[10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
       end
lorenz_oop (generic function with 1 method)

julia> prob = ODEProblem(lorenz_oop, SA[1.0; 0.0; 0.0], (0.0, 1.0));

julia> solver = Vern9();

julia> @time @eval solve(prob, solver);
  2.181060 seconds (3.48 M allocations: 217.965 MiB, 11.79% gc time, 99.77% compilation time)

julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.484 μs … 208.429 μs  ┊ GC (min … max): 0.00% … 93.22%
 Time  (median):     5.893 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.317 μs ±   7.779 μs  ┊ GC (mean ± σ):  5.10% ±  4.03%

       ▁▅▆▇▇██▆▃                                               
  ▁▁▁▃▆█████████▇▅▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  5.48 μs         Histogram: frequency by time         7.8 μs <

 Memory estimate: 15.22 KiB, allocs estimate: 38.

PR:

julia> using StaticArrays, OrdinaryDiffEq
WARNING: using StaticArrays.pop in module Main conflicts with an existing identifier.

julia> function lorenz_oop(u, p, t)
         SA[10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
       end
lorenz_oop (generic function with 1 method)

julia> prob = ODEProblem(lorenz_oop, SA[1.0; 0.0; 0.0], (0.0, 1.0));

julia> solver = Vern9();

julia> @time @eval solve(prob, solver);
  1.870612 seconds (4.03 M allocations: 255.161 MiB, 15.23% gc time, 99.74% compilation time)

julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.833 μs … 135.757 μs  ┊ GC (min … max): 0.00% … 93.96%
 Time  (median):     3.990 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.199 μs ±   4.522 μs  ┊ GC (mean ± σ):  3.87% ±  3.49%

      ▁▅▆██▇▇▅▅▄▄▃▁                                            
  ▁▂▄▅██████████████▆▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁ ▃
  3.83 μs         Histogram: frequency by time        4.63 μs <

 Memory estimate: 9.00 KiB, allocs estimate: 38.

@chriselrod
Contributor Author

chriselrod commented Jan 28, 2023

I should actually time precompilation, but it is notably faster on this PR.
Latest release

julia> info_cachefile("OrdinaryDiffEq")
Contents of /home/chriselrod/.julia/compiled/v1.10/OrdinaryDiffEq/DlSvy_lNMJ5.so:
  modules: Any[OrdinaryDiffEq]
  2871 external methods
  17484 new specializations of external methods (Base 52.9%, UnPack 15.2%, Base.Broadcast 14.2%, ...)
  844 external methods with new roots
  28776 external targets
  29269 edges
  file size:   143549408 (136.899 MiB)
  Segment sizes (bytes):
  system:      27307504 ( 23.69%)
  isbits:      84767781 ( 73.52%)
  symbols:       136012 (  0.12%)
  tags:          383679 (  0.33%)
  relocations:  2619334 (  2.27%)
  gvars:          22576 (  0.02%)
  fptrs:          56576 (  0.05%)

PR:

julia> info_cachefile("OrdinaryDiffEq")
Contents of /home/chriselrod/.julia/compiled/v1.10/OrdinaryDiffEq/DlSvy_hyQim.so:
  modules: Any[OrdinaryDiffEq]
  2871 external methods
  16375 new specializations of external methods (Base 53.6%, Base.Broadcast 15.2%, UnPack 12.3%, ...)
  844 external methods with new roots
  27667 external targets
  28637 edges
  file size:   140992328 (134.461 MiB)
  Segment sizes (bytes):
  system:      26685688 ( 23.48%)
  isbits:      83864397 ( 73.78%)
  symbols:       135365 (  0.12%)
  tags:          373715 (  0.33%)
  relocations:  2536109 (  2.23%)
  gvars:          22472 (  0.02%)
  fptrs:          56576 (  0.05%)

Note UnPack's decrease, and the smaller image size (2.4 MiB smaller).
Applying this to more methods should help further, particularly the ones being precompiled.

@chriselrod
Contributor Author

chriselrod commented Jan 28, 2023

Naively, this failure looks real? https://github.com/SciML/OrdinaryDiffEq.jl/actions/runs/4031769906/jobs/6931258704
EDIT: It passed for me locally.

@unpack r11, r12, r13, r14, r22, r23, r24, r32, r33, r34, r42, r43, r44, r52, r53, r54, r62, r63, r64, r72, r73, r74 = cache
end end
@def tsit5unpack begin
var"#T#" = recursive_unitless_bottom_eltype(y₁)
Member

I think the types are all missing the constvalue part which is crucial for dropping units, ADness, and complex-ness

@ChrisRackauckas
Member

@chriselrod
Contributor Author

chriselrod commented Jan 29, 2023

https://github.com/SciML/OrdinaryDiffEq.jl/actions/runs/4031769902/jobs/6931259103#step:6:734 that looks like a real error

Yes, that's from abusing @generated.

If we only care about Julia 1.8 or newer, we could use Base.@assume_effects :foldable instead of @generated.

Aside from dropping support for older Julia versions or letting older versions be slow, we could direct more methods to the non-@generated version, e.g. direct all isbits objects there, as we may assume the compiler can constant fold those on its own.
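A version-gated sketch of that idea (`toy_tableau` is a hypothetical stand-in; the real tableau functions return many more coefficients):

```julia
@static if VERSION >= v"1.8"
    # Tell the compiler the call is pure enough to constant-fold,
    # instead of abusing @generated for memoization.
    Base.@assume_effects :foldable function toy_tableau(::Type{T}) where {T}
        (c2 = convert(T, 0.03462), c3 = convert(T, 0.09702435))
    end
else
    # Pre-1.8 fallback: compute the tuple once per type T at
    # generated-function expansion time and splice it in as a literal.
    @generated function toy_tableau(::Type{T}) where {T}
        tab = (c2 = convert(T, 0.03462), c3 = convert(T, 0.09702435))
        :($tab)
    end
end
```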

@oscardssmith
Contributor

@ChrisRackauckas can you explain the theory behind this PR? I feel very out of the loop here. What is Base.Experimental.silence!, and why so many @generated functions?

@chriselrod
Contributor Author

chriselrod commented Jan 29, 2023

Hmm, I guess these record history?

julia> Base.isbitstype(Tracker.TrackedReal{Float32})
false

If they're properly mutable, then we do actually need to allocate them fresh.
But if we were caching them before, then caching them must be correct? I guess they only store their own history, and aren't mutated.

@chriselrod
Contributor Author

chriselrod commented Jan 29, 2023

can you explain the theory behind this PR?

@oscardssmith, LLVM spent all its time compiling the copying of constants like tableaus for Vern9.
So instead of caching the creation of structs holding these constants and copying them around, we just inline the constants to where they are used.
@generated was to cache the tableau results.

Base.silence! is because I accidentally built this off the wrong branch. I meant to build this PR off of master.
It's taking advantage of a feature added in this Julia branch: https://github.com/JuliaLang/julia/tree/jb/silentparams
which doesn't have an associated PR yet.

@ChrisRackauckas
Member

What is Base.Experimental.silence!

It's Jeff's new experimental stack trace silencing feature. It should probably be split to a separate PR, but it's fine because it's not even in a Base PR yet.

@ChrisRackauckas
Member

If they're properly mutable, then we do actually need to allocate them fresh.

None of the coefficients should be tracked values though, that should be handled by the constvalue function which should call value. The issue here is likely that https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/tracker.jl#L15-L18 needs to be made into an extension in DiffEqBase instead.

@chriselrod
Contributor Author

chriselrod commented Jan 29, 2023

If they're properly mutable, then we do actually need to allocate them fresh.

None of the coefficients should be tracked values though, that should be handled by the constvalue function which should call value. The issue here is likely that https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/tracker.jl#L15-L18 needs to be made into an extension in DiffEqBase instead.

Okay, so these tableau functions should never be called with TrackedReal.

We should still probably switch to Base.@assume_effects :foldable in Julia >= 1.8, only using some more hacky thing for older Julia versions.

@ChrisRackauckas
Member

SciMLSensitivity.jl in general should be loaded any time an AD package is loaded, but the circular dependency issue makes that hard. That's what I posted here:

https://discourse.julialang.org/t/package-extensions-for-julia-1-9/93397/12?u=chrisrackauckas

Making the following:

https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/tracker.jl
https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/zygote.jl
https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/reversediff.jl

into DiffEqBase extension packages would probably be a quick fix for now.

@chriselrod
Contributor Author

All the AD backends are fairly intermingled, e.g. automatic_sensealg_choice or inplace_vjp.

@ChrisRackauckas
Member

That's not what I meant 😅 . I meant loading the whole SciMLSensitivity if any reverse mode AD is loaded.

@chriselrod
Contributor Author

Oh, sorry, that makes way more sense.

I guess that requires a major version bump on DiffEqBase to avoid overwriting the methods?

If it weren't for this PR, what I'd have done is leave the code in SciMLSensitivity (while copying it into DiffEqBase), but only load it if !isdefined(Base, :get_extension).

This way in Julia versions without get_extension, nothing changes.
In Julia >= 1.9, it'd be an extension package.
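That feature-detection could look roughly like this (a sketch; the included path is hypothetical and commented out):

```julia
# Feature-detect Pkg extension support; true on Julia >= 1.9.
const HAS_EXTENSIONS = isdefined(Base, :get_extension)

if !HAS_EXTENSIONS
    # On older Julia, eagerly include the glue code (or guard it
    # behind Requires.jl), so nothing changes for those versions.
    # On >= 1.9 the same file would instead be declared under
    # [extensions] in Project.toml and loaded lazily by Pkg.
    # include("../ext/DiffEqBaseTrackerExt.jl")  # hypothetical path
end
```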

Of course, that doesn't fix this particular issue, as this issue is only in Julia <= 1.7, where Base.@constprop isn't defined.

@ChrisRackauckas
Member

I guess that requires a major version bump on DiffEqBase to avoid overwriting the methods?

If they are the same methods it's not breaking, you just get a warning, so I think it could be fine?

@chriselrod
Contributor Author

Unfortunately, that isn't currently legal.
I.e., we can't have OrdinaryDiffEq/ext/SciMLSensitivityExt.jl do using SciMLSensitivity in response to one of SciMLSensitivityExt's dependencies getting loaded.
Pkg doesn't know how to handle the cycle: ReverseDiff -> load SciMLSensitivityExt -> load SciMLSensitivity -> load ReverseDiff -> load SciMLSensitivityExt -> errors on cycle detection.

@ChrisRackauckas
Member

SciML/SciMLSensitivity.jl#780 is what needs to get finished to complete this. We can worry about automatically bringing in SciMLSensitivity later, but with this at least the correct errors are always thrown and the piracy is eliminated.

@ChrisRackauckas
Member

Okay, fingers crossed it may work this time.

@ChrisRackauckas ChrisRackauckas merged commit 25d0fb3 into SciML:master Feb 8, 2023
@ChrisRackauckas
Member

Now fix your fork lol

@chriselrod chriselrod deleted the horriblehackextract branch February 8, 2023 21:56
@chriselrod
Contributor Author

Now fix your fork lol

Done:
https://github.com/chriselrod/OrdinaryDiffEq.jl/pulls
