Cache LLVM between try runs #112011
I have been planning to do this for some time. I actually started implementing it this week, but I realized that I'd need some Python dependencies for this in the PGO script, which was a bit annoying, because it is a rather "locked down" script using Python 3.6 with everything in a single file (it's being executed from different Linux/Windows environments, and while definitely possible, it wouldn't be completely trivial to add dependencies to it). After discussing this with @Mark-Simulacrum, we have decided that it would be better to move the PGO script to Rust (not to bootstrap, but to …).

I think that we don't actually need to cache LLVM itself. We can just cache the PGO and BOLT profiles (these are two files, ~50 MiB each), as these should be much easier to cache, and mainly much easier to "inject" into the builds. With these files available, we don't need to run the LLVM PGO and LLVM BOLT instrumentation steps; we can just run the optimized builds directly with the cached profiles.

My plan is to basically hash (at least) `src/llvm-project`, `src/ci` and `src/bootstrap`, and use the result as the cache key.
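To make the hashing plan concrete, here is a minimal sketch of computing such a cache key. The S3 bucket name and object layout are assumptions for illustration, and a real implementation would use a stable content hash such as SHA-256 instead of `DefaultHasher`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;

// Hash a directory tree deterministically (entries sorted by path).
// DefaultHasher is for illustration only: its output is not stable
// across Rust releases, so a real script would use e.g. SHA-256.
fn hash_dir(dir: &Path, hasher: &mut DefaultHasher) -> io::Result<()> {
    let mut entries: Vec<_> = fs::read_dir(dir)?.collect::<Result<_, _>>()?;
    entries.sort_by_key(|e| e.path());
    for entry in entries {
        let path = entry.path();
        if path.is_dir() {
            hash_dir(&path, hasher)?;
        } else {
            path.hash(hasher);
            fs::read(&path)?.hash(hasher);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut hasher = DefaultHasher::new();
    // The directories proposed as cache-key inputs in this thread.
    for dir in ["src/llvm-project", "src/ci", "src/bootstrap"] {
        hash_dir(Path::new(dir), &mut hasher)?;
    }
    let key = hasher.finish();
    // Hypothetical S3 object names for the two cached profile files.
    println!("s3://rust-lang-ci-cache/llvm-pgo/{key:016x}.profdata");
    println!("s3://rust-lang-ci-cache/llvm-bolt/{key:016x}.fdata");
    Ok(())
}
```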
This seems broadly good and is how our other caching works, but I do want to mention that this has caused issues in the past where if the cache gets busted CI will timeout (e.g. #49278). I don't have solutions to that in mind, but we should be wary that we don't get close to GitHub's time limit.
I don't think that this would be a problem for try builds. Currently, we do 3 builds of LLVM in each try build, and it takes ~1h 30m (after my PR which disables some unneeded targets). With caching, it should be strictly faster: nothing in the plan I have described performs extra work compared to the current setup (apart from hashing a few directories and uploading two files to S3, but that should be very fast).
This makes me somewhat uneasy in terms of how it will affect benchmark stability at the points where LLVM does get updated. The profiles depend on the IR processed by LLVM, which in turn depends on the IR produced by rustc, which depends on lots of things like libstd code, MIR opts, etc. I would rather not have every commit that touches LLVM show spurious perf diffs because that's the only point where we gather a new profile.

FWIW, the current profile bootstrap sequence still has a lot of room for improvement. We currently rebuild rustc three times, but should only build it twice. Two of those builds are also done with a PGO/BOLT-instrumented LLVM, which makes the rustc build much slower. This should shave off at least 30 minutes from try builds. The ideal sequence would look something like:
1. Build LLVM without instrumentation, and build rustc with PGO instrumentation against it.
2. Gather rustc PGO profiles.
3. Build rustc with PGO use, still linked against the plain LLVM.
4. Rebuild only LLVM with PGO instrumentation, without rebuilding rustc.
5. Gather LLVM PGO profiles using the already-optimized rustc, then rebuild LLVM with PGO use (and repeat the instrument/gather/use cycle for BOLT).
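As a rough sketch of how a Rust driver might express this sequence using bootstrap's existing PGO flags (the `/tmp` paths and the profile-gathering steps are placeholders, and step 4 is exactly the part bootstrap does not support cleanly today):

```rust
use std::process::Command;

// Run one `x.py build` invocation with extra flags, aborting on failure.
// The flag names mirror bootstrap's PGO options.
fn build(extra: &[&str]) {
    let status = Command::new("python3")
        .arg("x.py")
        .arg("build")
        .args(extra)
        .status()
        .expect("failed to spawn x.py");
    assert!(status.success(), "x.py build {extra:?} failed");
}

fn main() {
    // 1. Plain LLVM + PGO-instrumented rustc.
    build(&["--rust-profile-generate", "/tmp/rustc-pgo"]);
    // 2. ...gather rustc PGO profiles by compiling benchmark crates...
    // 3. PGO-optimized rustc, still against the plain LLVM.
    build(&["--rust-profile-use", "/tmp/rustc-pgo/merged.profdata"]);
    // 4. Rebuild only LLVM with PGO instrumentation. As written, this
    //    would also rebuild rustc -- the thread's open problem.
    build(&["--llvm-profile-generate"]);
    // 5. ...gather LLVM profiles..., then rebuild LLVM with PGO use.
    build(&["--llvm-profile-use", "/tmp/llvm-pgo/merged.profdata"]);
}
```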
Btw @jyn514, where did you get that 111/131 figure? From the log that you have posted, it seems to me that LLVM takes ~35 minutes.
To be clear, the problem I'm worried about is:

1. Someone lands a change that touches `src/llvm-project` (or whatever else feeds the cache key), busting the cache.
2. The next try build has to rebuild everything from scratch and runs much longer than usual, potentially getting close to GitHub's job time limit.

I think try builds are limited enough in scope that this isn't a giant risk in practice, just something to keep in mind.
Ah, that's my bad - I missed the PGO rustc build in the middle and was counting it as part of LLVM. I think it takes more than 35 minutes though - the time taken to build rustc for instrumenting PGO/BOLT also counts.
Yeah, this also worries me. My plan was to basically "try it and see what happens". I'm not sure how much the generated IR patterns actually change between commits, but it's possible that we would see large perf swings, which would be bad.

Regarding the sequence that you have described, I agree that this would be the ideal scenario (we could even pre-cache the initial LLVM build without any PGO instrumentation/use in S3, that shouldn't cause any problems with profile noise). However, I'm not sure how to achieve this sequence with bootstrap currently - e.g. is it (easily) possible to take an existing rustc and "relink" it to an updated version of LLVM? This direction is definitely worth exploring IMO, as it could possibly be more stable and even easier than the caching approach.
I'm not sure what you mean by relink here - LLVM is a shared object file, so rustc will use whichever .so is in rpath/LD_LIBRARY_PATH at runtime; rustc itself doesn't need to be rebuilt. In general you can set a custom `llvm-config`…
Yes, caching the initial build should be fine -- though maybe it would already be mostly free due to sccache? (Which doesn't help us for PGO builds, but should help for plain ones.)
In theory, "relinking" rustc should just be a matter of copying the new libLLVM.so in the right place. I'd expect the main tricky part here would be allowing bootstrap to build just libLLVM.so without doing anything else.
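A minimal sketch of that "copy the new libLLVM.so in the right place" idea; both paths are hypothetical stand-ins for the real `build/<triple>/...` layout:

```rust
use std::fs;
use std::path::Path;

// "Relink" rustc to a freshly built LLVM by copying the new shared
// object over the one the stage-2 rustc loads at runtime. The paths
// below are placeholders, not the actual bootstrap directory layout.
fn main() -> std::io::Result<()> {
    let new_llvm = Path::new("build/host/llvm/lib/libLLVM.so");
    let stage2_lib = Path::new("build/host/stage2/lib");
    fs::copy(new_llvm, stage2_lib.join("libLLVM.so"))?;
    // rustc resolves libLLVM through its rpath, so no rebuild is
    // needed: the next rustc invocation loads the new library.
    Ok(())
}
```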
With …
Yeah, that was what I was alluding to. Steps 1. - 3. are easy. At step 4, we need to (first delete and then) rebuild LLVM, but normally in bootstrap this also causes further rebuilds of rustc. So we basically need bootstrap to allow us to rebuild/modify libLLVM.so without rebuilding rustc.
Can we include `src/ci` and `src/bootstrap` in the hash as well?
Yeah, as I wrote in my first comment here, I'd hash (at least) `src/llvm-project`, `src/ci` and `src/bootstrap`.
So, it looks like it is possible to just run … (For BOLT it's more complicated, but we can deal with that later; I want to get it working for PGO first.)
Avoid one `rustc` rebuild in the optimized build pipeline

This PR changes the optimized build pipeline to avoid one `rustc` rebuild, inspired by [this comment](rust-lang#112011 (comment)). This speeds up the pipeline by 5-10 minutes.

After this change, we **no longer gather LLVM PGO profiles from compiling stage 2 of `rustc`**. Now we build `rustc` two times (1x PGO instrumented, 1x PGO optimized) and LLVM three times (1x normal, 1x PGO instrumented, 1x PGO optimized). It should be possible to cache the normal LLVM build, but I'll leave that for another PR.
We currently spend an inordinate amount of time rebuilding LLVM on every try run. See for example https://github.com/rust-lang-ci/rust/actions/runs/5073748000/jobs/9113150676#step:25:38905, where building LLVM with PGO and BOLT takes up 111 out of 131 total minutes of the run.
We should cache LLVM between runs so perf runs don't take so long. It should be fine to cache LLVM until the next time `src/llvm-project` is updated, the same way `download-ci-llvm` works today. To avoid having three different LLVM artifacts stored in S3 per commit, we'd replace the current artifacts with the BOLT artifacts, leaving the `-alt` builds as the unoptimized + debug-assertions build.

@rust-lang/wg-llvm @Kobzol @rust-lang/infra
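For reference, `download-ci-llvm` invalidates its artifacts based (roughly) on the last commit that touched `src/llvm-project`; a sketch of the same keying scheme, with an assumed S3 bucket and object layout:

```rust
use std::process::Command;

// Derive a cache key the way download-ci-llvm does, roughly: the last
// commit that modified src/llvm-project. The S3 bucket and object
// layout below are assumptions for illustration.
fn main() {
    let out = Command::new("git")
        .args(["rev-list", "-1", "HEAD", "--", "src/llvm-project"])
        .output()
        .expect("failed to run git");
    let commit = String::from_utf8_lossy(&out.stdout).trim().to_string();
    println!("s3://rust-lang-ci-artifacts/llvm/{commit}/llvm.tar.xz");
}
```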