
No Cache #1849

Merged
merged 31 commits into SciML:master from chriselrod:horriblehackextract
Feb 8, 2023

Conversation

chriselrod
Contributor

See commit message for more details.

Setting the precompilation preferences to false, I get this on the latest release:

julia> using OrdinaryDiffEq

julia> function lorenz(du, u, p, t)
         du[1] = 10.0(u[2] - u[1])
         du[2] = u[1] * (28.0 - u[3]) - u[2]
         du[3] = u[1] * u[2] - (8 / 3) * u[3]
       end
lorenz (generic function with 1 method)

julia> function lorenz_oop(u, p, t)
         [10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
       end
lorenz_oop (generic function with 1 method)

julia> solver = Vern9();

julia> prob = ODEProblem(lorenz, [1.0; 0.0; 0.0], (0.0, 1.0));

julia> @time @eval solve(prob, solver);
  5.120141 seconds (5.44 M allocations: 345.486 MiB, 6.31% gc time, 100.00% compilation time)

julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  12.554 μs …   2.393 ms  ┊ GC (min … max): 0.00% … 95.20%
 Time  (median):     13.646 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.343 μs ± 59.249 μs  ┊ GC (mean ± σ):  9.86% ±  2.54%

            ▂▃▄▆▆▇███▇▇▅▄▂▁                                    
  ▁▁▁▁▂▃▄▅▇████████████████▇▆▅▅▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  12.6 μs         Histogram: frequency by time        16.1 μs <

 Memory estimate: 28.34 KiB, allocs estimate: 259.

Vs this on this branch:

julia> using OrdinaryDiffEq

julia> function lorenz(du, u, p, t)
         du[1] = 10.0(u[2] - u[1])
         du[2] = u[1] * (28.0 - u[3]) - u[2]
         du[3] = u[1] * u[2] - (8 / 3) * u[3]
       end
lorenz (generic function with 1 method)

julia> function lorenz_oop(u, p, t)
         [10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
       end
lorenz_oop (generic function with 1 method)

julia> solver = Vern9();

julia> prob = ODEProblem(lorenz, [1.0; 0.0; 0.0], (0.0, 1.0));

julia> @time @eval solve(prob, solver);
  2.909396 seconds (6.35 M allocations: 407.016 MiB, 12.35% gc time, 99.99% compilation time)

julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  10.631 μs …   2.528 ms  ┊ GC (min … max): 0.00% … 98.38%
 Time  (median):     11.649 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.996 μs ± 54.907 μs  ┊ GC (mean ± σ):  9.32% ±  2.19%

         ▂▁▂▃▅▅█▇▇█▇▆▄▂                                        
  ▁▁▂▃▅▆█████████████████▆▅▄▄▃▃▃▃▃▃▃▂▃▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  10.6 μs         Histogram: frequency by time        14.4 μs <

 Memory estimate: 22.09 KiB, allocs estimate: 259.

Note the compile-time improvement (which will more generally translate to shorter precompile times), plus the runtime improvement owing to fewer unnecessary copies.
We have the same number of allocations, but those allocations are smaller due to the decreased cache size.

This PR implements the optimization for both Vern7 and Vern9.

Julia structs are semantically pass-by-value. While large structs are often
passed by reference as an optimization, in many cases they were passed by
value instead, causing a great deal of code to be emitted in order to copy
the memory around on the stack.
By inlining the caches where they are used instead, we eliminate
the copies.
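The effect can be sketched with a toy example (all names here are hypothetical, standing in for the real Vern7/Vern9 tableau caches):

```julia
# A large immutable struct of constants, analogous to a tableau cache.
struct ToyTableau
    a::NTuple{32,Float64}
end

# When such a struct is passed by value, the compiler may emit code
# copying all 32 Floats around on the stack at call boundaries.
step_with_cache(u, tab::ToyTableau) = muladd(tab.a[1], u, tab.a[2])

# Inlining the constants where they are used avoids the struct (and
# its copies) entirely:
step_inlined(u) = muladd(0.5, u, 0.25)

tab = ToyTableau(ntuple(i -> i == 1 ? 0.5 : 0.25, 32))
step_with_cache(2.0, tab) == step_inlined(2.0)  # same result, fewer copies
```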

To minimize the amount of code we need to actually change, this commit
implements a macro that makes use of the old code with minimal
optimizations in order to define the associated variables. This
perhaps isn't the solution we want long-term, but it has the benefit
of allowing us to still take advantage of multiple dispatch in
defining the constants, so we can (for example) have `Float64` vs
`BigFloat` specializations.
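A minimal sketch of what such dispatch-based constant definitions look like (the function name and coefficient value are illustrative, not the actual code in this PR):

```julia
# Element-type-specific tableau coefficients via multiple dispatch:
# Float64 gets a plain literal, while BigFloat (and any other number
# type) converts from a full-precision big-literal parse.
vern_c2(::Type{Float64}) = 0.03462
vern_c2(::Type{T}) where {T<:Number} = convert(T, big"0.03462")
```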
Project.toml Outdated
@@ -48,7 +48,7 @@ ArrayInterfaceCore = "0.1.22"
 ArrayInterfaceGPUArrays = "0.1, 0.2"
 ArrayInterfaceStaticArrays = "0.1"
 ArrayInterfaceStaticArraysCore = "0.1.2"
-DataStructures = "0.18"
+DataStructures = "0.18, 0.19"
Member

was this tested? Odd this hasn't come up in a CompatHelper?

Contributor Author

I was trying to dev DataStructures. This should be reverted. I eventually just changed the deved DataStructures version back to 0.18 instead.

Comment on lines +64 to +68
+    saveat = integrator.opts.saveat
     while !isempty(saveat) && first(saveat) <= tdir_t # Perform saveat
         integrator.saveiter += 1
         saved = true
-        curt = integrator.tdir * pop!(integrator.opts.saveat)
+        curt = integrator.tdir * pop!(saveat)
Member

I assume this doesn't make a major difference?

Contributor Author

Yeah, I just did that while looking at where all the copies came from.
I thought, maybe it's thinking that stuff possibly gets mutated on each iter and is thus reloading on each call.
But I didn't notice any difference.

@chriselrod
Contributor Author

Vern9 sees a big benefit for both ip and oop; latest release:

julia> using StaticArrays, OrdinaryDiffEq
WARNING: using StaticArrays.pop in module Main conflicts with an existing identifier.

julia> function lorenz_oop(u, p, t)
         SA[10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
       end
lorenz_oop (generic function with 1 method)

julia> prob = ODEProblem(lorenz_oop, SA[1.0; 0.0; 0.0], (0.0, 1.0));

julia> solver = Vern9();

julia> @time @eval solve(prob, solver);
  2.181060 seconds (3.48 M allocations: 217.965 MiB, 11.79% gc time, 99.77% compilation time)

julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.484 μs … 208.429 μs  ┊ GC (min … max): 0.00% … 93.22%
 Time  (median):     5.893 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.317 μs ±   7.779 μs  ┊ GC (mean ± σ):  5.10% ±  4.03%

       ▁▅▆▇▇██▆▃                                               
  ▁▁▁▃▆█████████▇▅▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  5.48 μs         Histogram: frequency by time         7.8 μs <

 Memory estimate: 15.22 KiB, allocs estimate: 38.

PR:

julia> using StaticArrays, OrdinaryDiffEq
WARNING: using StaticArrays.pop in module Main conflicts with an existing identifier.

julia> function lorenz_oop(u, p, t)
         SA[10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
       end
lorenz_oop (generic function with 1 method)

julia> prob = ODEProblem(lorenz_oop, SA[1.0; 0.0; 0.0], (0.0, 1.0));

julia> solver = Vern9();

julia> @time @eval solve(prob, solver);
  1.870612 seconds (4.03 M allocations: 255.161 MiB, 15.23% gc time, 99.74% compilation time)

julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.833 μs … 135.757 μs  ┊ GC (min … max): 0.00% … 93.96%
 Time  (median):     3.990 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.199 μs ±   4.522 μs  ┊ GC (mean ± σ):  3.87% ±  3.49%

      ▁▅▆██▇▇▅▅▄▄▃▁                                            
  ▁▂▄▅██████████████▆▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁ ▃
  3.83 μs         Histogram: frequency by time        4.63 μs <

 Memory estimate: 9.00 KiB, allocs estimate: 38.

@chriselrod
Contributor Author

chriselrod commented Jan 28, 2023

I should actually time precompilation, but it is notably faster on this PR.
Latest release

julia> info_cachefile("OrdinaryDiffEq")
Contents of /home/chriselrod/.julia/compiled/v1.10/OrdinaryDiffEq/DlSvy_lNMJ5.so:
  modules: Any[OrdinaryDiffEq]
  2871 external methods
  17484 new specializations of external methods (Base 52.9%, UnPack 15.2%, Base.Broadcast 14.2%, ...)
  844 external methods with new roots
  28776 external targets
  29269 edges
  file size:   143549408 (136.899 MiB)
  Segment sizes (bytes):
  system:      27307504 ( 23.69%)
  isbits:      84767781 ( 73.52%)
  symbols:       136012 (  0.12%)
  tags:          383679 (  0.33%)
  relocations:  2619334 (  2.27%)
  gvars:          22576 (  0.02%)
  fptrs:          56576 (  0.05%)

PR:

julia> info_cachefile("OrdinaryDiffEq")
Contents of /home/chriselrod/.julia/compiled/v1.10/OrdinaryDiffEq/DlSvy_hyQim.so:
  modules: Any[OrdinaryDiffEq]
  2871 external methods
  16375 new specializations of external methods (Base 53.6%, Base.Broadcast 15.2%, UnPack 12.3%, ...)
  844 external methods with new roots
  27667 external targets
  28637 edges
  file size:   140992328 (134.461 MiB)
  Segment sizes (bytes):
  system:      26685688 ( 23.48%)
  isbits:      83864397 ( 73.78%)
  symbols:       135365 (  0.12%)
  tags:          373715 (  0.33%)
  relocations:  2536109 (  2.23%)
  gvars:          22472 (  0.02%)
  fptrs:          56576 (  0.05%)

Note UnPack's decrease, and the smaller image size (2.4 MiB smaller).
Applying this to more methods should help further, particularly the ones being precompiled.

@chriselrod
Contributor Author

chriselrod commented Jan 28, 2023

Naively, this failure looks real? https://github.com/SciML/OrdinaryDiffEq.jl/actions/runs/4031769906/jobs/6931258704
EDIT: It passed for me locally.

@unpack r11, r12, r13, r14, r22, r23, r24, r32, r33, r34, r42, r43, r44, r52, r53, r54, r62, r63, r64, r72, r73, r74 = cache
end end
@def tsit5unpack begin
var"#T#" = recursive_unitless_bottom_eltype(y₁)
Member

I think the types are all missing the constvalue part which is crucial for dropping units, ADness, and complex-ness

@ChrisRackauckas
Member

@chriselrod
Contributor Author

chriselrod commented Jan 29, 2023

https://github.com/SciML/OrdinaryDiffEq.jl/actions/runs/4031769902/jobs/6931259103#step:6:734 that looks like a real error

Yes, that's from abusing @generated.

If we only care about Julia 1.8 or newer, we could use Base.@assume_effects :foldable instead of @generated.

Aside from dropping support for older Julia versions or letting older versions be slow, we could direct more methods to the non-@generated version, e.g. direct all isbits objects there, as we may assume the compiler can constant fold those on its own.
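A version-gated sketch of that idea (`toy_tableau` is a hypothetical stand-in; the real tableau functions return many more coefficients):

```julia
@static if VERSION >= v"1.8"
    # Tell the compiler the call is pure enough to constant-fold,
    # instead of abusing @generated for memoization.
    Base.@assume_effects :foldable function toy_tableau(::Type{T}) where {T}
        (c2 = convert(T, 0.03462), c3 = convert(T, 0.09702435))
    end
else
    # Pre-1.8 fallback: compute the tuple once per type T at
    # generated-function expansion time and splice it in as a literal.
    @generated function toy_tableau(::Type{T}) where {T}
        tab = (c2 = convert(T, 0.03462), c3 = convert(T, 0.09702435))
        :($tab)
    end
end
```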

@oscardssmith
Contributor

@ChrisRackauckas can you explain the theory behind this PR? I feel very out of the loop here. What is Base.Experimental.silence!, and why so many @generated functions?

@chriselrod
Contributor Author

chriselrod commented Jan 29, 2023

Hmm, I guess these record history?

julia> Base.isbitstype(Tracker.TrackedReal{Float32})
false

If they're properly mutable, then we do actually need to allocate them fresh.
But if we were caching them before, then caching them must be correct? I guess they only store their own history, and aren't mutated.

@chriselrod
Contributor Author

chriselrod commented Jan 29, 2023

can you explain the theory behind this PR?

@oscardssmith, LLVM spent all its time compiling the copying of constants like tableaus for Vern9.
So instead of caching the creation of structs holding these constants and copying them around, we just inline the constants to where they are used.
@generated was to cache the tableau results.

Base.silence! is because I accidentally built this off the wrong branch. I meant to build this PR off of master.
It's taking advantage of a feature added in this Julia branch: https://github.com/JuliaLang/julia/tree/jb/silentparams
which doesn't have an associated PR yet.

@ChrisRackauckas
Member

What is Base.Experimental.silence!

It's Jeff's new experimental stack trace silencing feature. It should probably be split to a separate PR, but it's fine because it's not even in a Base PR yet.

@ChrisRackauckas
Member

If they're properly mutable, then we do actually need to allocate them fresh.

None of the coefficients should be tracked values though, that should be handled by the constvalue function which should call value. The issue here is likely that https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/tracker.jl#L15-L18 needs to be made into an extension in DiffEqBase instead.

@chriselrod
Contributor Author

chriselrod commented Jan 29, 2023

If they're properly mutable, then we do actually need to allocate them fresh.

None of the coefficients should be tracked values though, that should be handled by the constvalue function which should call value. The issue here is likely that https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/tracker.jl#L15-L18 needs to be made into an extension in DiffEqBase instead.

Okay, so these tableau functions should never be called with TrackedReal.

We should still probably switch to Base.@assume_effects :foldable in Julia >= 1.8, only using some more hacky thing for older Julia versions.

@ChrisRackauckas
Member

SciMLSensitivity.jl in general should be loaded any time an AD package is loaded, but the circular dependency issue makes that hard. That's what I posted here:

https://discourse.julialang.org/t/package-extensions-for-julia-1-9/93397/12?u=chrisrackauckas

Making the following:

https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/tracker.jl
https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/zygote.jl
https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/reversediff.jl

into DiffEqBase extension packages would probably be a quick fix for now.

@chriselrod
Contributor Author

All the AD backends are fairly intermingled, e.g. automatic_sensealg_choice or inplace_vjp.

@ChrisRackauckas
Member

That's not what I meant 😅 . I meant loading the whole SciMLSensitivity if any reverse mode AD is loaded.

@chriselrod
Contributor Author

Oh, sorry, that makes way more sense.

I guess that requires a major version bump on DiffEqBase to avoid overwriting the methods?

If it weren't for this PR, what I'd have done is leave the code in SciMLSensitivity (while copying it into DiffEqBase), but only load it if !isdefined(Base, :get_extension).

This way in Julia versions without get_extension, nothing changes.
In Julia >= 1.9, it'd be an extension package.
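That feature-detection could look roughly like this (a sketch; the included path is hypothetical and commented out):

```julia
# Feature-detect Pkg extension support; true on Julia >= 1.9.
const HAS_EXTENSIONS = isdefined(Base, :get_extension)

if !HAS_EXTENSIONS
    # On older Julia, eagerly include the glue code (or guard it
    # behind Requires.jl), so nothing changes for those versions.
    # On >= 1.9 the same file would instead be declared under
    # [extensions] in Project.toml and loaded lazily by Pkg.
    # include("../ext/DiffEqBaseTrackerExt.jl")  # hypothetical path
end
```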

Of course, that doesn't fix this particular issue, as this issue is only in Julia <= 1.7, where Base.@constprop isn't defined.

@ChrisRackauckas
Member

I guess that requires a major version bump on DiffEqBase to avoid overwriting the methods?

If they are the same methods it's not breaking, you just get a warning, so I think it could be fine?

@chriselrod
Contributor Author

Unfortunately, that isn't currently legal.
I.e., we can't have OrdinaryDiffEq/ext/SciMLSensitivityExt.jl do using SciMLSensitivity in response to one of SciMLSensitivityExt's dependencies getting loaded.
Pkg doesn't know how to handle the cycle: ReverseDiff -> load SciMLSensitivityExt -> load SciMLSensitivity -> load ReverseDiff -> load SciMLSensitivityExt -> errors on cycle detection.

@ChrisRackauckas
Member

SciML/SciMLSensitivity.jl#780 is what needs to get finished to complete this. We can worry about automatically bringing in SciMLSensitivity later, but with this at least the correct errors are always thrown and the piracy is eliminated.

@ChrisRackauckas
Member

Okay, fingers crossed it may work this time.

@ChrisRackauckas ChrisRackauckas merged commit 25d0fb3 into SciML:master Feb 8, 2023
@ChrisRackauckas
Member

Now fix your fork lol

@chriselrod chriselrod deleted the horriblehackextract branch February 8, 2023 21:56
@chriselrod
Contributor Author

Now fix your fork lol

Done:
https://github.com/chriselrod/OrdinaryDiffEq.jl/pulls
