-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] use graal native-image to create native images of jvm tools for performance increases in certain workflows #6893
Conversation
2e553c4
to
6a6bb18
Compare
3eaaf4c
to
64922d3
Compare
71ec854
to
3323c8a
Compare
3323c8a
to
6b0d54b
Compare
6b0d54b
to
8ffb598
Compare
785f90c
to
7e5c1c6
Compare
671fa8e
to
767e2d7
Compare
1f6e6d4
to
334ac4b
Compare
c5831bc
to
e96e5f3
Compare
This should be ready for review after removing many different parts from it. There might be a smaller surface to cut at, but I really wanted to have the end-to-end testing via scalafmt there along with this large new feature. I've made a project at https://github.com/pantsbuild/pants/projects/13 covering the larger arc of native-image work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super exciting! I have a couple of questions about the core abstractions here...
When we were adding the hermetic
execution strategy, I started feeling kind of sketchy about it because in reality, each subclass needs to write specific code to handle the hermetic case. It looks like it's the same for native-image as a strategy, and that in a bunch of places we have fake implementations; if you specify --execution-strategy=native-image
for a bunch of Tasks where you can, it will either do subprocess
, or hermetic
, or raise. Accordingly, I don't actually think --execution-strategy
is a good flag to have on a common super-class. It feels like it should only be exposed by things that support it (which can use a common library class to implement it), rather than exposed by any JVM tool (where some things just don't implement some of the strategies)...
Though if we do keep these things in NailgunTaskBase
, we should probably rename that to something like JvmToolTaskBase
, because it's very not nailgun-specific any more.
Similarly, I'm not convinced that the JVM Executor
and Runner
abstractions are right to load this into; right now, for native image we simply ignore the arguments passed to the API methods; that feels like we're trying to force an implementation into an existing (but incorrect) abstraction; either we should be able to properly use the arguments to the methods, or we should not try to use the same API.
def generate_urls(self, version, host_platform): | ||
system_id = self._SYSTEM_ID[host_platform.os_name] | ||
archive_basename = self._ARCHIVE_BASE_FMT.format(version=version, system_id=system_id) | ||
return [self._DIST_URL_FMT.format(version=version, base=archive_basename)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this interact with networks which can't access the public internet? Will this automatically fall back to some mirror, or is that code that needs writing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses the same resolution mechanism as other binary tools, in that if --no-allow-external-binary-tool-downloads
is set, it will attempt to fetch it from --binaries-baseurls
. The url generators are a convenience for open source users which allows someone to set e.g. --graal-version=1.0.0rc16
when that version is released in the future, without requiring any work on our part to allow that. We haven't yet upstreamed any script which syncs our pantsbuild s3 to another source, so people in other monorepos without public internet access will have to write that code for themselves.
] + list(native_image_config.extra_args) | ||
|
||
pprinted_argv = safe_shlex_join(argv) | ||
with temporary_dir() as tmp_dir: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to specify the actual output dir as the root_dir
here because it guarantees that the move you do later will be on the same device (so quick); people who have tempdirs on different partitions from their cachedir would otherwise need to incur a second disk write to do the move
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now modified native-image execution to occur hermetically, so we avoid this entirely, but noting this for the future!
|
||
class NativeImageCreationError(Exception): pass | ||
|
||
def produce_native_image(self, tool_classpath, main_class, native_image_config, context, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it could do with a unit test
text_type(root)) | ||
|
||
|
||
def digest_file_paths(paths, scheduler, root=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably have a unit test, as it's now a publicly exposed function
@illicitonion I agree with your high-level feedback and will remove |
Marking this as WIP again until the above comments are addressed. |
e96e5f3
to
24727b2
Compare
./pants --no-process-execution-cleanup-local-dirs fmt.scalafmt --execution-strategy=native-image --files-per-process=2 --no-skip src/scala:: Also, remove the parallel execution stuff and make scalafmt work for real: - had to delay initialization of a few classes until runtime for the scalafmt cli to stop making paths relative to the native-image build dir - made a `.native_image_config` property to specify these delayed initializations - make native-image creation output to a workunit - add docstrings
24727b2
to
5d20d23
Compare
Closing due to being stale (pretty big merge conflict) - although GraalVM is awesome and it would be neat to see this be revived. |
WIP, as it depends on:
Problem
Graal is a virtual machine with a very large variety of frontends, including JVM- and LLVM-based languages, as well as interpreted languages such as Python, JavaScript, and R. One of the many things it can do is create "native images" which are able to begin executing immediately, without any VM startup time (although the performance is limited to the capabilities of AOT compilation). In some experimentation, I found I was able to match the speed of a warm nailgunned scalafmt task on a large target set in our monorepo with an extremely naive parallelism strategy (invoking one native image process per target). Not having to maintain a warm nailgun (and the associated memory) has potential benefits on developer laptops, as well as CI machines, depending on how leaky the tool is, and the runtime environment. In an ephemeral environment of any sort (apparently kubernetes is an example of this), zero startup time may make it more scalable to lazily download JVM tools instead of maintaining state in any specific ephemeral node.
Solution
GraalCE
binary tool, which can transform a classpath into an AOT-compiled image with zero startup time.NativeImageExecutor
for executing JVM code which creates a native image on the fly, then executes it hermetically.NailgunTaskBase
implementation so that it can be selected using--execution-strategy=native-image
, e.g. inpants.ini
.Followup
product_request()
API to perform parallel hermetic execution of all native-image subprocesses (ideally with an API to do this fromNailgunTask
).GraalNativeImageCreate
to create native images out of anyjvm_bundles
, probably in a separate PR.jaotc
.Result
Scalafmt can be invoked as a native-image, and the path is clear for other tools to do the same!