WIP: static compile part 4 (user-interface) #8745
note, it will perhaps be necessary to protect against semi-malicious users when trying to derive file paths:
|
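One way to guard against hostile names when deriving a cache path, as a minimal sketch only. The helper `cache_path`, the ASCII-only name pattern, and the choice of the `.ji` extension here are illustrative assumptions, not the PR's implementation.

```julia
# Hypothetical helper: map a module name to a cache file inside `cachedir`.
# Only plain ASCII identifier-like names are accepted (a deliberate
# simplification), so a name cannot carry path separators or ".." components.
function cache_path(cachedir::AbstractString, modname::AbstractString)
    if !occursin(r"\A[A-Za-z_][A-Za-z0-9_]*\z", modname)
        throw(ArgumentError("not a plain module name: $(repr(modname))"))
    end
    return joinpath(cachedir, modname * ".ji")
end

println(cache_path("usr/lib/julia2", "Compat"))

# A name that tries to escape the cache directory is rejected up front.
try
    cache_path("usr/lib/julia2", "../../etc/passwd")
catch err
    println("rejected: ", err.msg)
end
```

Validating the name before building the path keeps the check independent of how the filesystem would later resolve it.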
In this scheme, what are the chances of the caching being fully automatic? Package authors selectively enabling the functionality might be ok, but anything more than that I'm not sure. |
some things that are very likely to break this are quite easy to detect:
but consider the following code:
module A
    mycount = 0
    register() = global mycount += 1
end
module B
    A.register()
    const counterwas = A.mycount
end
unless the author explicitly marks B as "cache-able", I don't know how you would know whether to re-run |
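A runnable variant of the example above, as a sketch for current Julia (it adds `import ..A`, which recent versions need, and a second hypothetical consumer `C`). It shows that the value each module captures depends on what had already run when it was loaded, which is exactly the state a cached copy would freeze.

```julia
module A
mycount = 0
function register()
    global mycount += 1   # load-time side effect on A's global state
    return mycount
end
end

module B
import ..A
A.register()                   # mutates A while B is being loaded
const counterwas = A.mycount   # snapshot of A's state at B's load time
end

# Hypothetical second consumer, loaded after B.
module C
import ..A
A.register()
const counterwas = A.mycount
end

println("B saw ", B.counterwas, ", C saw ", C.counterwas, ", A.mycount = ", A.mycount)
```

If `B` were restored from a cache without re-running `A.register()`, `B.counterwas` would still hold its old snapshot while `A.mycount` would be missing B's increment.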
8abce49 to 41e7a9b
A couple of months back I posted my experience with using Julia in a web services context: At this point I'm checking back to see what progress has been made towards fixing this.
Question 1: Is there a single "master issue" in github for the startup time problem? As an outsider I've found it frustrating to try to find where the devs are discussing a particular issue. It seems that the core devs don't use the julia-dev list to discuss day-to-day progress. I can see why it's better in a lot of ways for this discussion to happen in pull-request comments and issue comments. However, for an outsider it makes it really hard to find "what's been going on with startup time recently".
Question 2: Does the #8745 scheme address dynamic startup code in libraries? Words below snipped from julia-dev post: https://groups.google.com/d/msg/julia-dev/E3LjK65jH6Y/XwiD4RoFLiQJ ... Julia has nice dynamic language features, so it is tempting to do things dynamically at startup. e.g.
It seems to me there will need to be some mechanism whereby the compiler can be sure that a particular piece of start-up code is statically deterministic and can therefore safely be pre-executed and serialised. Maybe this can be inferred by the compiler, or maybe there needs to be a keyword. If Nettle.jl startup depends on the version or config of the installed libnettle.so, then it presumably can't be statically pre-executed and serialised for fast startup. Perhaps it would make sense to have declared dependencies for startup code, e.g. the cached Nettle.jl binary can be used unless the hash of libnettle.so has changed... |
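The declared-dependency idea could be sketched roughly as follows. The manifest layout and the names `file_digest` and `cache_is_valid` are assumptions made for illustration; a temporary file stands in for libnettle.so.

```julia
using SHA  # standard library in Julia 1.x

# Hex SHA-256 digest of a file's contents.
file_digest(path::AbstractString) = bytes2hex(sha256(read(path)))

# Hypothetical manifest: dependency path => digest recorded when the cache was built.
# The cache counts as valid only if every dependency still exists and hashes the same.
function cache_is_valid(manifest::Dict{String,String})
    for (path, recorded) in manifest
        isfile(path) || return false
        file_digest(path) == recorded || return false
    end
    return true
end

# Demo with a temporary file standing in for libnettle.so.
lib, io = mktemp()
write(io, "libnettle build A")
close(io)

manifest = Dict(lib => file_digest(lib))
println(cache_is_valid(manifest))   # dependency unchanged since the digest was recorded

write(lib, "libnettle build B")     # simulate an upgraded library
println(cache_is_valid(manifest))   # digest no longer matches the manifest
```

Hashing contents rather than comparing timestamps means a reinstall of an identical library would not force a rebuild, at the cost of reading each dependency on every check.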
@samoconnor, click on the "Watch" button above, and be prepared for a lot of email. |
@samoconnor, there is already a mechanism for runtime initialization: you define an
@vtjnash, unless we can statically determine whether a module is precompilable (which seems hard), it might be a good idea to precompile only modules that contain an |
@stevengj, to me that sounds like a good solution---it makes it opt-in. Of course there's the chance that a change might make the system suddenly not precompilable, but presumably that would show up in tests. |
that sounds like a great heuristic. the trouble is that we want to take some extra precautions when generating a cache version (sandboxing the run), which means that the ability to static compile needs to be something that can be determined ahead-of-time. the user interface for this thing is really the hardest part of this change to design. otherwise, the flexibility of the julia parser (with macros especially) makes it impossible to know what is able to be cached. sometimes I wonder if it might be worth forcing all modules to accept pre-compilation – they will just fail really fast if they weren't coded to expect it (explicit pointers are converted to NULL when serialized) |
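For readers unfamiliar with the pattern being discussed, here is a minimal sketch, assuming the runtime-initialization mechanism in question is Julia's `__init__` hook (the same function called by hand in the Gtk sample in this thread). The module name `NativeHandle` and the use of `Libc.malloc` as a stand-in for a real native resource are illustrative assumptions.

```julia
module NativeHandle

# Placeholder that is safe to serialize: it holds no live pointer at definition time.
const handle = Ref{Ptr{Cvoid}}(C_NULL)

# Julia runs __init__ when the module is loaded into a running session,
# so pointer-valued state is rebuilt here instead of being stored in a cache file.
function __init__()
    handle[] = Libc.malloc(16)   # stand-in for opening a real native resource
end

end

# After loading, the handle has been filled in by __init__.
println(NativeHandle.handle[] != C_NULL)
```

A module written this way tolerates having its pointers nulled during serialization, because nothing outside `__init__` assumes the pointer is valid at load time.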
Couldn't a very simplified version of the parser detect whether an
I'm concerned that it will make Julia much more challenging for newbies if we force this degree of discipline on them as soon as they make a module. A lot of my students have no formal training in computer science, have never dealt with a compiled language, and don't know what a pointer is. |
Yes, that should work. Another way to hide the |
We could call it |
I think it would be fine to require that |
Detecting the function call is somewhat easy, but determining what file is going to be found is potentially brittle |
41e7a9b to da2f13c
I didn't mean to be silent on this PR for so long. Here's a (functional) sample demonstrating my latest progress:
$ ./julia -J usr/lib/julia/Base.ji --build usr/lib/julia2 <<EOF
using Compat
EOF
$ ./julia -J usr/lib/julia/Base.ji --build usr/lib/julia2 <<EOF
using FixedPointNumbers
@time C = ccall(:jl_restore_new_module, Any, (Ptr{Uint8},), "usr/lib/julia2/Compat.ji")
using Color
using Cairo
using Gtk
EOF
$ ./julia -q
julia> @time begin
FP = ccall(:jl_restore_new_module, Any,
(Ptr{Uint8},), "usr/lib/julia2/FixedPointNumbers.ji")
C0 = ccall(:jl_restore_new_module, Any,
(Ptr{Uint8},), "usr/lib/julia2/Compat.ji")
C = ccall(:jl_restore_new_module, Any,
(Ptr{Uint8},), "usr/lib/julia2/Color.ji")
C2 = ccall(:jl_restore_new_module, Any,
(Ptr{Uint8},), "usr/lib/julia2/Cairo.ji")
nothing
end
elapsed time: 0.614841921 seconds (11179028 bytes allocated)
julia> @time FP = ccall(:jl_restore_new_module, Any,
(Ptr{Uint8},), "usr/lib/julia2/Gtk.ji")
elapsed time: 1.424939038 seconds (25944208 bytes allocated)
julia> Gtk.GLib.__init__()
julia> Gtk.__init__()
julia> evalfile(Pkg.dir("Gtk/test/runtests.jl")); echo("SUCCESS")
SUCCESS
julia>
(edit: for comparison, the timings for loading these modules were 8.5s and 13.5s, respectively) |
If more of us reading this had sharp teeth and claws, you could lose an arm by dangling such treats in front of us. How soon can we get this? Tomorrow? |
From the timing this could be a nice christmas present for the Julia community :-) |
@vtjnash, loading modules in 10% of the original time is great. I don't know the details (and I assume that the devil is in the details), but I'm imagining that all the complexity and magic is in the compilation and serialisation code; and that the module restore basically just does: mmap(module.so); call module_init_fn(). Is all the startup time in the module init function? I'm really interested to know what the 1400ms GTK load time is comprised of. 1400ms is about 3 billion instruction cycles. It seems to me that there's got to be something badly inefficient somewhere in the stack for it to take 3 billion instructions to load a GUI toolkit language binding. |
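The figures in this exchange can be checked with a few lines of arithmetic. The timings are the ones quoted in the thread; the 2.1 GHz clock is an assumption chosen for illustration, since the thread does not state the machine's clock rate.

```julia
# Timings quoted in the thread, in seconds.
cached   = (first_four = 0.614841921, gtk = 1.424939038)
uncached = (first_four = 8.5,         gtk = 13.5)

for name in keys(cached)
    speedup  = uncached[name] / cached[name]
    fraction = 100 * cached[name] / uncached[name]
    println(name, ": ", round(speedup, digits=1), "x faster, ",
            round(fraction, digits=1), "% of the original time")
end

# Assumption: a clock of about 2.1 GHz, which the thread does not state.
assumed_hz = 2.1e9
cycles = cached.gtk * assumed_hz
println("Gtk restore is about ", round(cycles / 1e9, digits=2), " billion cycles")
```

Under that assumed clock rate the Gtk restore lands close to the 3 billion cycles mentioned above, and the two cached loads come out at roughly 7% and 11% of their original times.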
Jeff's call_overload branch hurt the timing of this branch. For Gtk it caused a massive regression in the operation of the serializer. It was a known issue and will be fixed before this is merged, but it wasn't originally a problem. When this is closer to being complete, I'll start to work on profiling and optimization. For now, I'm focused on just getting it working. |
I admire the thoughtful analysis about what should be possible, @samoconnor. I don't think that is done often enough. That said, I'll take a "mere" 10x improvement without any complaints! |
This is not the kind of thing that will run at a CPU's peak instruction rate. @vtjnash Any clues about why it caused a regression? The dump.c serializer, or serialize.jl? I seem to recall Gtk uses both. |
Potentially relevant to the latter: Pyston has a new approach where (IIUC) they still do lowering to LLVM but then hash the IR and check that against a cache before emitting machine code. |
The Pyston approach is pretty clever. It seems like it works at a rather different level than package compilation, but could help the case where you do end up having to recompile things. |
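The hash-then-look-up idea described above can be sketched generically as a content-addressed cache. This is not Pyston's or Julia's actual code: `emit_machine_code` is a hypothetical stand-in for the expensive IR-to-native step, and the in-memory `Dict` stands in for whatever store a real system would use.

```julia
using SHA  # standard library in Julia 1.x

# Hypothetical stand-in for the expensive step (IR -> machine code).
# It only counts how often it runs and returns a tagged string.
const compile_calls = Ref(0)
function emit_machine_code(ir::String)
    compile_calls[] += 1
    return "native<" * ir * ">"
end

# Cache keyed by the SHA-256 digest of the IR text.
const code_cache = Dict{String,String}()

function compile_cached(ir::String)
    key = bytes2hex(sha256(ir))
    # get! runs the block only when the key is missing, then stores the result.
    return get!(code_cache, key) do
        emit_machine_code(ir)
    end
end

ir = "define i64 @f(i64 %x) { ret i64 %x }"
first_result  = compile_cached(ir)
second_result = compile_cached(ir)   # served from the cache
println("expensive step ran ", compile_calls[], " time(s)")
```

Because the key is derived from the IR itself, any change upstream that alters the IR misses the cache automatically, with no separate invalidation logic.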
A very minor nit to be sure, but I wonder if |
That has bitten me at least 20 times in the last two days. |
Especially since the documentation says |
Ultimately we are expecting this to be more automated, yes? i.e. if you make a |
That's understandable, but I think it would be better to be consistent with the other related functions. |
Agree, regarding consistency. Arguably all of these functions could take strings or symbols, but currently they take strings, so let's stick with that. |
these functions used to take filepaths. they don't anymore. (documentation is fixed now) |
Should we open a new issue for brainstorming on how to implement the automatic recompilation? (edit: moved to #12259) |
Yes, please, @tkelman. |
put back `cd` that was removed in #8745
Hi all, I just wanted to extend my thanks for this work. Here's another datapoint in case anyone cares:
|
17x improvement ain't bad. |
~~So, silly question: I know that modifying the module source will require cache invalidation, but what about building a new julia master? Better safe than sorry and just wipe out ~/.julia/lib ?~~ The answer, of course, is yes, since Base gets rebuilt. |
this is more fun than ice cream in summer .. excellent elegant effort |
Changes Unknown when pulling e2d842a on jn/static_compile_4 into master. |
(continuation of #8656 – as I make progress towards incremental module compilation, I'll push the code to this branch for comments and API discussion)
since Julia already has a strong tradition that file != module, I think the primary implementation of this needs to embrace that and use a `*.jlc` file independent of the filesystem. this unifies the behavior of `using` and everything in `Base` across restarts.
Therefore, a session could look something like the following:
And invoking the cache file engine would look something like the following:
or via stdin:
any logic to handle dependencies differently would be handled externally, e.g. https://github.com/malmaud/Autoreload.jl