No Cache #1849
Conversation
Julia structs are pass-by-value. While large structs are often passed by reference as an optimization, in many cases they were passed by value instead, causing a great deal of code to be emitted just to copy memory around on the stack. By inlining the caches at their use sites instead, we eliminate the copies. To minimize the amount of code we actually need to change, this commit implements a macro that reuses the old code with minimal modifications to define the associated variables. This perhaps isn't the solution we want long-term, but it has the benefit of still letting us take advantage of multiple dispatch in defining the constants, so we can (for example) have `Float64` vs `BigFloat` specializations.
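For illustration, here is a minimal sketch of the `@def` pattern this refers to (the `@def` macro itself appears later in the diff; `example_tableau`, `step_example`, and the coefficient values are hypothetical, not from the PR). `@def` defines a macro that splices its body verbatim wherever it is invoked, so the coefficients become local constants instead of fields loaded, and copied, from a cache struct:

macro def(name, definition)
    return quote
        macro $(esc(name))()
            esc($(Expr(:quote, definition)))
        end
    end
end

@def example_tableau begin
    a21 = convert(T, 161 // 1000)
    a32 = convert(T, 5 // 2)
end

function step_example(u, ::Type{T}) where {T}
    @example_tableau  # pastes the definitions above as local assignments
    return a21 * u + a32 * u
end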
Project.toml
@@ -48,7 +48,7 @@
 ArrayInterfaceCore = "0.1.22"
 ArrayInterfaceGPUArrays = "0.1, 0.2"
 ArrayInterfaceStaticArrays = "0.1"
 ArrayInterfaceStaticArraysCore = "0.1.2"
-DataStructures = "0.18"
+DataStructures = "0.18, 0.19"
Was this tested? Odd that this hasn't come up via CompatHelper?
I was trying to `dev` DataStructures. This should be reverted; I eventually just changed the `dev`ed DataStructures version back to 0.18 instead.
 saveat = integrator.opts.saveat
 while !isempty(saveat) && first(saveat) <= tdir_t # Perform saveat
     integrator.saveiter += 1
     saved = true
-    curt = integrator.tdir * pop!(integrator.opts.saveat)
+    curt = integrator.tdir * pop!(saveat)
I assume this doesn't make a major difference?
Yeah, I just did that while looking at where all the copies came from.
I thought maybe the compiler was assuming the field could be mutated on each iteration and was therefore reloading it on every call.
But I didn't notice any difference.
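A hedged illustration of that reload concern, with a plain Vector standing in for the integrator's actual saveat container (so `popfirst!` stands in for the heap's `pop!`; everything here is illustrative, not the real integrator code):

# Hoisting the field lookup into a local makes it explicit that the same
# object is used on every iteration, so the compiler need not reload
# `integrator.opts.saveat` in case a `pop!`-style call mutated the parent.
function drain_saveat!(opts_saveat::Vector{Float64}, tdir::Float64, tdir_t::Float64)
    saveat = opts_saveat  # hoisted once, before the loop
    curts = Float64[]
    while !isempty(saveat) && first(saveat) <= tdir_t
        push!(curts, tdir * popfirst!(saveat))
    end
    return curts
end

# e.g. drain_saveat!([0.1, 0.2, 0.5], 1.0, 0.3) returns [0.1, 0.2]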
Vern9 sees a big benefit for both in-place (ip) and out-of-place (oop). Latest release:

julia> using StaticArrays, OrdinaryDiffEq
WARNING: using StaticArrays.pop in module Main conflicts with an existing identifier.
julia> function lorenz_oop(u, p, t)
SA[10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
end
lorenz_oop (generic function with 1 method)
julia> prob = ODEProblem(lorenz_oop, SA[1.0; 0.0; 0.0], (0.0, 1.0));
julia> solver = Vern9();
julia> @time @eval solve(prob, solver);
2.181060 seconds (3.48 M allocations: 217.965 MiB, 11.79% gc time, 99.77% compilation time)
julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
Range (min … max): 5.484 μs … 208.429 μs ┊ GC (min … max): 0.00% … 93.22%
Time (median): 5.893 μs ┊ GC (median): 0.00%
Time (mean ± σ): 6.317 μs ± 7.779 μs ┊ GC (mean ± σ): 5.10% ± 4.03%
▁▅▆▇▇██▆▃
  ▁▁▁▃▆█████████▇▅▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
5.48 μs Histogram: frequency by time 7.8 μs <
Memory estimate: 15.22 KiB, allocs estimate: 38.

PR:

julia> using StaticArrays, OrdinaryDiffEq
WARNING: using StaticArrays.pop in module Main conflicts with an existing identifier.
julia> function lorenz_oop(u, p, t)
SA[10.0(u[2] - u[1]), u[1] * (28.0 - u[3]) - u[2], u[1] * u[2] - (8 / 3) * u[3]]
end
lorenz_oop (generic function with 1 method)
julia> prob = ODEProblem(lorenz_oop, SA[1.0; 0.0; 0.0], (0.0, 1.0));
julia> solver = Vern9();
julia> @time @eval solve(prob, solver);
1.870612 seconds (4.03 M allocations: 255.161 MiB, 15.23% gc time, 99.74% compilation time)
julia> @benchmark solve($prob, $solver)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
Range (min … max): 3.833 μs … 135.757 μs ┊ GC (min … max): 0.00% … 93.96%
Time (median): 3.990 μs ┊ GC (median): 0.00%
Time (mean ± σ): 4.199 μs ± 4.522 μs ┊ GC (mean ± σ): 3.87% ± 3.49%
▁▅▆██▇▇▅▅▄▄▃▁
▁▂▄▅██████████████▆▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁ ▃
3.83 μs Histogram: frequency by time 4.63 μs <
Memory estimate: 9.00 KiB, allocs estimate: 38.
I should actually time precompilation, but it is notably faster on this PR.

julia> info_cachefile("OrdinaryDiffEq")
Contents of /home/chriselrod/.julia/compiled/v1.10/OrdinaryDiffEq/DlSvy_lNMJ5.so:
modules: Any[OrdinaryDiffEq]
2871 external methods
17484 new specializations of external methods (Base 52.9%, UnPack 15.2%, Base.Broadcast 14.2%, ...)
844 external methods with new roots
28776 external targets
29269 edges
file size: 143549408 (136.899 MiB)
Segment sizes (bytes):
system: 27307504 ( 23.69%)
isbits: 84767781 ( 73.52%)
symbols: 136012 ( 0.12%)
tags: 383679 ( 0.33%)
relocations: 2619334 ( 2.27%)
gvars: 22576 ( 0.02%)
fptrs: 56576 ( 0.05%)

PR:

julia> info_cachefile("OrdinaryDiffEq")
Contents of /home/chriselrod/.julia/compiled/v1.10/OrdinaryDiffEq/DlSvy_hyQim.so:
modules: Any[OrdinaryDiffEq]
2871 external methods
16375 new specializations of external methods (Base 53.6%, Base.Broadcast 15.2%, UnPack 12.3%, ...)
844 external methods with new roots
27667 external targets
28637 edges
file size: 140992328 (134.461 MiB)
Segment sizes (bytes):
system: 26685688 ( 23.48%)
isbits: 83864397 ( 73.78%)
symbols: 135365 ( 0.12%)
tags: 373715 ( 0.33%)
relocations: 2536109 ( 2.23%)
gvars: 22472 ( 0.02%)
fptrs: 56576 ( 0.05%)
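(For reference, `info_cachefile` here is presumably PkgCacheInspector.jl's cache-file inspector; a typical invocation, assuming that package, is:)

julia> using PkgCacheInspector

julia> info_cachefile("OrdinaryDiffEq")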
Naively, this failure looks real? https://github.com/SciML/OrdinaryDiffEq.jl/actions/runs/4031769906/jobs/6931258704
src/dense/interpolants.jl
@unpack r11, r12, r13, r14, r22, r23, r24, r32, r33, r34, r42, r43, r44, r52, r53, r54, r62, r63, r64, r72, r73, r74 = cache
end
end
@def tsit5unpack begin
    var"#T#" = recursive_unitless_bottom_eltype(y₁)
I think the types are all missing the `constvalue` part, which is crucial for dropping units, AD-ness, and complex-ness.
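As a hedged illustration of what dropping those wrappers means (an invented helper, not DiffEqBase's actual `constvalue` implementation):

# Recover a plain real scalar type from a state eltype that may be complex;
# units (Unitful) and dual numbers (ForwardDiff) would be peeled off
# analogously via additional methods.
plain_constant_type(::Type{Complex{T}}) where {T<:Real} = plain_constant_type(T)
plain_constant_type(::Type{T}) where {T<:Real} = T

# e.g. a tableau coefficient for a ComplexF64 problem is stored as a Float64:
a21(::Type{T}) where {T} = convert(plain_constant_type(T), 1 // 5)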
https://github.com/SciML/OrdinaryDiffEq.jl/actions/runs/4031769902/jobs/6931259103#step:6:734
That looks like a real error.
Yes, that's from abusing […]. If we only care about Julia 1.8 or newer, we could use […].

Aside from dropping support for older Julia versions or letting older versions be slow, we could direct more methods to the non-[…] versions.
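As a hedged aside, since the identifiers above were lost in extraction: one mechanism that fits the "Julia 1.8 or newer" framing is `Base.@assume_effects`, introduced in 1.8 as the sanctioned alternative to `Base.@pure`-style constant folding. A minimal sketch, assuming that is what was meant (the function name and coefficient are made up):

# Assumes Julia >= 1.8. `:foldable` asserts the method may be constant-folded;
# it is still an unchecked promise, but with much better-defined semantics
# than `Base.@pure`.
Base.@assume_effects :foldable function example_coefficient(::Type{T}) where {T}
    return convert(T, 5179 // 57600)
end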
@ChrisRackauckas can you explain the theory behind this PR? I feel very out of the loop here. What is […]?
Hmm, I guess these record history?

julia> Base.isbitstype(Tracker.TrackedReal{Float32})
false

If they're properly mutable, then we do actually need to allocate them fresh.
@oscardssmith, LLVM spent all its time compiling the copying of constants like tableaus for Vern9.
|
It's Jeff's new experimental stack-trace silencing feature. It should probably be split into a separate PR, but it's fine because it's not even in a Base PR yet.
None of the coefficients should be tracked values though; that should be handled by the […].
Okay, so these tableau functions should never be called with […]. We should still probably switch to […].
SciMLSensitivity.jl in general should be loaded any time an AD package is loaded, but the circular-dependency issue makes that hard. That's what I posted about here: https://discourse.julialang.org/t/package-extensions-for-julia-1-9/93397/12?u=chrisrackauckas

Making the following into DiffEqBase extension packages would probably be a quick fix for now: https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/tracker.jl
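A hedged sketch of what such an extension could look like (Julia 1.9+ package extensions; the module name and the single method shown are illustrative stand-ins for the code in SciMLSensitivity's src/tracker.jl):

# ext/DiffEqBaseTrackerExt.jl: hypothetical extension module, loaded
# automatically on Julia 1.9+ when both DiffEqBase and Tracker are present
# (requires matching [weakdeps]/[extensions] entries in DiffEqBase's
# Project.toml).
module DiffEqBaseTrackerExt

using DiffEqBase, Tracker

# Illustrative placeholder for the Tracker-specific methods that would move
# here from SciMLSensitivity:
DiffEqBase.value(x::Tracker.TrackedReal) = Tracker.data(x)

end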
All the AD backends are fairly intermingled, e.g. […].
That's not what I meant 😅. I meant loading the whole SciMLSensitivity if any reverse-mode AD is loaded.
Oh, sorry, that makes way more sense. I guess that requires a major version bump on DiffEqBase to avoid overwriting the methods?

If it weren't for this PR, what I'd have done is leave the code in SciMLSensitivity (while copying it into DiffEqBase), but only load it if […]. This way, in Julia versions without […].

Of course, that doesn't fix this particular issue, as this issue is only in Julia <= 1.7, where […].
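A hedged sketch of the conditional-loading fallback being described; the elided condition is not recoverable from this page, so the version check and file name below are assumptions:

# Hypothetical placement inside DiffEqBase's top-level module: on Julia
# versions without package extensions, include the duplicated fallback
# methods; on 1.9+ the extension provides them, so nothing is overwritten.
@static if VERSION < v"1.9"
    include("tracker_fallbacks.jl")  # hypothetical file name
end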
If they are the same methods, it's not breaking; you just get a warning, so I think it could be fine?
Unfortunately, that isn't currently legal.
SciML/SciMLSensitivity.jl#780 is what needs to get finished to complete this. We can worry about automatically bringing in SciMLSensitivity later, but with this, at least the correct errors are always thrown and the piracy is eliminated.
Okay, fingers crossed; it may work this time.
Now fix your fork lol
|
See the commit message for more details.
Setting the precompilation preferences to false, I get this on the latest release:
vs. this on this branch:
Note the compile-time improvement (which will more generally translate to shorter precompile times), plus the runtime improvement owing to fewer unnecessary copies.
We have the same number of allocations, but those allocations are smaller due to the decreased cache size.
This PR implements the optimization for both Vern7 and Vern9.