Faster incremental sysimg rebuilds #40414
Amazing! Thank you for tackling this, @Keno! It sounds very exciting! In case you haven't seen this already: regarding speeding up PackageCompiler, I've asked in the past why PackageCompiler currently ends up compiling everything to native code twice, round-tripping through a text file to record the compilations: JuliaLang/PackageCompiler.jl#486. Just wanted to float this past your vision in case you hadn't seen it. Kristoffer provided a good explanation, but it seems like something that could be improved with a bit of design work.
This can basically address that as well. You can build the whole sysimg without precompiles, dump out your precompiles with that, and then use this mechanism to build a chained sysimg as before, much faster.
@timholy one application that comes to mind is speeding up development of Base itself. Could we have a mode of Revise where it serializes what needs to be Revised in some easy-to-load format, and then use this to quickly update an existing system image (without Revise itself showing up in the system image)?
This is the second part of the plan described in #40414 (though complementary to the PR itself). In particular, this PR makes it possible to quickly replace a system image during initial startup. This is done by adding a hook early in the startup sequence (after the system image, but before any dependent libraries are initialized) for Julia to look at the specified project file and decide to load a different sysimage instead. In the current version of the PR, this works as follows:

- If the `--autoload` argument is specified, julia will hash the contents of the currently active project's manifest path.
- If a corresponding .so is found in `~/.julia/sysimages`, it will load that sysimage instead.
- If not, loading will proceed as usual and a warning is generated, but before any user code is run, Julia will `require` any dependencies specified in the Project.toml.

The third point is there so that, independent of whether or not the system image is found, the environment upon transfer of control to the user is always the same (e.g. a package may have type-pirated a method, which is then available regardless of whether the user ever explicitly did `using`).

This is highly incomplete. In particular, the scheme to find the system image needs to take preferences into account and should probably exclude any packages that are `dev`'ed (or their dependents). I'm not sure I'll have the time to get around to finishing this, but I'm hoping somebody else would be willing to jump in for that part. The underlying mechanism seems to work fine at this point, so this work should be mostly confined to loading.jl.
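The lookup step above can be emulated outside Julia to experiment with the workflow. This is a minimal sketch, not the PR's implementation: it assumes SHA-256 as the hash (via GNU coreutils `sha256sum`) and a `<hash>.so` naming scheme under `~/.julia/sysimages`, neither of which the PR text specifies, and the function name `autoload_sysimage` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of a system image matching the given
# manifest, or return 1 if there is none. The <sha256-of-manifest>.so naming
# scheme is an assumption made for this sketch, not the PR's actual scheme.
autoload_sysimage() {
  local manifest=$1
  local dir="${2:-$HOME/.julia/sysimages}"
  local digest candidate

  # No manifest, no match.
  [ -f "$manifest" ] || return 1

  # Hash the manifest *contents*, so that identical environments
  # map to the same system image file.
  digest=$(sha256sum "$manifest" | cut -d ' ' -f 1)
  candidate="$dir/$digest.so"

  if [ -f "$candidate" ]; then
    printf '%s\n' "$candidate"
  else
    return 1
  fi
}
```

A launcher could then run something like `if img=$(autoload_sysimage Manifest.toml); then julia --project --sysimage="$img"; else julia --project; fi`. This only mirrors the lookup; the PR's hook additionally warns and `require`s the Project.toml dependencies in the fallback case, which this sketch does not attempt.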
The multiversioning pass currently does two things:

- Clone all functions and create a set of tables to tell the sysimage loader where to find the various cloned functions.
- Compress the table of pointers by going from 64-bit pointers to 32-bit offsets from the first function of the .text section.

The second optimization is useful, because it cuts down on space and speeds up dynamic loading. Unfortunately, relocations of this kind are not expressible in all object formats, and as a result this scheme does not work if the table needs to describe function pointers in multiple compilation units. I'm working on improving the performance of incremental system image rebuilds, which would rely on being able to re-link such system images and is thus incompatible with this compression. There are possible ways to make it compatible, namely:

- Add a relocation to all the relevant file formats that expresses offsets from the start of the section, or
- Change the multiversion table to be pc-relative rather than relative to the first function in the table.

The first would require some significant coordination with standards bodies, and neither is currently supported in LLVM. To make progress on this issue, this PR simply makes the multiversion pass optional and keeps the table uncompressed in that case. This wastes some space and adds a few fractions of a second to the system image load time, but it should let us proceed on the incremental sysimage project. If it works well, we can go back and consider the future of the multiversioning tables.
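The trade-off between the table layouts can be written out as follows. This is a simplified model, not taken from the PR: it assumes a table of $N$ function entries with absolute addresses $p_0, \dots, p_{N-1}$, where $p_0$ is the first function in `.text`, that compressed entries occupy 32-bit slots, and that $s_i$ denotes the address of table slot $i$; all symbols are ours.

```math
\begin{aligned}
\text{uncompressed:}\quad & e_i = p_i, && \text{table size } 8N \text{ bytes} \\
\text{compressed:}\quad & o_i = p_i - p_0,\ \ 0 \le o_i < 2^{32}, && \text{table size } 4N \text{ bytes} \\
\text{pc-relative:}\quad & r_i = p_i - s_i,\ \ -2^{31} \le r_i < 2^{31}, && \text{table size } 4N \text{ bytes}
\end{aligned}
```

The loader recovers $p_i = p_0 + o_i$ in the compressed case and $p_i = s_i + r_i$ in the pc-relative case. The compressed form halves the table but needs each $o_i$ resolved as a difference from $p_0$, which is the section-relative relocation that cannot be expressed in all object formats once $p_i$ and $p_0$ come from different compilation units; the uncompressed form stores plain pointers and so avoids that requirement.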
This commit provides the ability to rebuild system images much faster. The key observation is that most of the time in a sysimage build is spent in LLVM generating native code (serializing Julia's data structures is quite fast). Thus, if we can reuse the code already generated for the system image we're currently running, we'll save a fair amount of time.

Unfortunately, this is not 100% straightforward, since we were assuming in a number of places that no linking happens. This PR hacks around that, but it is not a particularly satisfying long-term solution. That said, it should work fine, and I think it's worth doing so that we can explore the workflow adjustments that would rely on this.

With that said, here's how to use this (at the low level; of course PackageCompiler would just handle this):

```shell
$ mkdir chained
$ time ./usr/bin/julia --sysimage-native-code=chained --sysimage=usr/lib/julia/sys.so --output-o chained/chained.o.a -e 'Base.__init_build();'

real    0m9.633s
user    0m8.613s
sys     0m1.020s
$ cd chained
$ cp ../usr/lib/julia/sys-o.a .    # Get the -o.a from the old sysimage
$ ar x sys-o.a                     # Extract it into text.o and data.o
$ rm data.o                        # rm the serialized sysimg data
$ mv text.o text-old.o
$ llvm-objcopy --remove-section .data.jl.unique text-old.o    # rm the link between the native code and the old sysimg data
$ ar x chained.o.a                 # Extract new sysimage files
$ gcc -shared -o chained.so text.o data.o text-old.o    # Link everything
$ ../julia --sysimage=chained.so
```

As can be seen, regenerating the system image took about 9 s (the subsequent commands aren't timed here, but take less than a second in total). This compares very favorably with a non-chained sysimg rebuild:

```shell
$ time ./usr/bin/julia --sysimage=usr/lib/julia/sys.so --output-o nonchained.o.a -e 'Base.__init_build();'

real    2m42.667s
user    2m39.211s
sys     0m3.452s
```

Of course, if you do load additional packages, the extra code still needs to be compiled, so e.g. building a system image for `Plots` goes from 3 min to 1 min (building all of Plots, plus everything in Base that got invalidated). That is still all in LLVM, though; it should be relatively straightforward to multithread that after this PR (since linking the sysimg in multiple pieces is allowed). That part is not implemented yet, though.
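For a rough sense of scale, the `real` (wall-clock) times reported above can be turned into a speedup factor. This is a small sketch: the helper names `to_seconds` and `speedup` are ours, and the parsing assumes the `<minutes>m<seconds>s` format that bash's `time` keyword prints. With the numbers above it reports a speedup of about 16.9x for the chained rebuild.

```shell
#!/usr/bin/env bash
# Convert a bash `time` duration such as "2m42.667s" into seconds.
# Assumes the "<minutes>m<seconds>s" format printed by bash's `time` keyword.
to_seconds() {
  local t=$1
  local min=${t%%m*}   # text before the "m"
  local sec=${t#*m}    # text after the "m"
  sec=${sec%s}         # drop the trailing "s"
  awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Ratio of two durations given in seconds, rounded to one decimal place.
speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

chained=$(to_seconds 0m9.633s)       # chained rebuild, "real" time
nonchained=$(to_seconds 2m42.667s)   # full rebuild, "real" time
echo "chained: ${chained}s, non-chained: ${nonchained}s, speedup: $(speedup "$nonchained" "$chained")x"
```

This only compares the timed Julia invocations; as noted above, the untimed extraction and link steps add less than a second to the chained path.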
Just noticed #40414 (comment). Sure, that would be pretty easy to do in principle. What exactly would it look like? File an issue at Revise when you want this; my impression is that we're not yet at a place where this will make a difference.
I have to say, this triggers my love/hate relationship with https://github.com/JuliaLang/PackageCompiler.jl. I totally get why it's necessary to have it, but at the same time its existence is probably what's let us get away for so long without just implementing native-code caching in package .ji files. I'd rather just fix that. Are we really so far from that goal? It just doesn't seem like it should be all that insurmountable. I'm on a bit of a close-the-precompile-issues rampage right now. There really aren't that many issues per se, but we'll still need some things (the
Alternatively, could we put the native code into shared libraries, and load them when we load .ji files? It could improve the situation of calling Julia from C, since we'd have an obvious place to emit
I'm not really sure of the right implementation, mostly because I've never actually looked at the format of a shared library file. But that seems pretty sensible. Once we can cache external MethodInstances & CodeInstances (in our current no-native-code format), AFAICT the main remaining job is doing the work of the linker. If we can rely on external tools, that seems likely to be a win.
@Keno is this the PR you said can be rebased and brought back?
Faster incremental sysimg rebuilds
Recent improvements in precompilation have reduced compile-time latency such as time-to-first-plot (TTFP) quite significantly. However, it is still much faster to just build a system image, in which case TTFP is basically instant. The difference is primarily due to two things: we cannot store native code in .ji files, and invalidations of previously loaded code require recompilation. In the long term these issues can be overcome, but in the short term, I think we should try to leverage system images more heavily, since they already basically solve the problem. I believe the reason people aren't really using system images is three-fold.
Thus, my evil plan to improve the situation is:

- Add an `autoload` annotation to Project.toml files. If present, julia will hash the manifest and look for any matching system image in `~/.julia/sysimages`. The idea is that for the standard workflow, where people just use plain julia with the default environment or `julia --project`, system images would just be loaded automatically, thus reducing the barrier to entry.
In the initial version, there is no automatic rebuild of these system images; they would still be built manually with PackageCompiler, but at least the loading side would be automatic, and hopefully the build will be fast enough that people will actually be willing to wait. Eventually the rebuild could also be automatic (maybe even in the background).
The major drawback of this plan is that system images will start with all packages already loaded (even if their bindings aren't present in `Main`). This will require some workflow adjustments. I think it'll probably turn out fine, but it's worth highlighting.
This PR is step 1 in this direction: it provides the ability to rebuild system images much faster by reusing the native code already generated for the currently running system image.