-
-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make CI runs faster #214
Comments
I'm not familiar with your setup for caching #208 should merge very soon, and might make it easier to not have to cache the Since the baseline version isn't changing all that often (especially not with every PR), it should be possible to cache the baseline rustdoc JSON file across runs. This is how the GitHub Action builds that file, and in principle it could be improved to cache the file as well. After running In general, I believe your |
My workflow doesn't build it for the baseline version, no. I thought that one was downloaded from docs.rs or something? Does it build the baseline version also locally? I am happy to collaborate on an/the action. Fitting that one out with some pre-configured caching would be really nice actually :) #208 is nice to see! |
docs.rs doesn't yet host the JSON, so for now the baseline is built locally (based on the registry, a gitrev, or a filesystem path) unless the baseline JSON is explicitly provided via the This is what the action does at the moment: https://github.com/obi1kenobi/cargo-semver-checks-action/blob/main/action.yml#L29 In a v2 of the action, we'd want to remove the version-selection logic from the action (since it incorrectly selects pre-releases, unlike the logic already present in Perhaps something like a What do you think? Also, looping in @epage in case he has ideas. |
I think this is the MVP. As long as it is somewhere in the |
I just saw that there is already |
Just separating them from the current compilation output would already be good. That would allow us to clean that target directory of artifacts and cache it. These should never need to change so unless someone messes with the |
I'm a bit disk-space-limited on the machine I have with me right now, so if you don't mind, I'd love to ask you to check something in that
The cache would be compressed with zstd in GitHub Actions, so it would probably be reduced to around one-third or one-fourth the raw JSON files' size.
We can put the cache somewhere like |
That is a 110MB! |
Sounds good to me. I guess to make this really effective, |
Whew, I'm glad it's that little :) It's also JSON so it should be highly compressible with zstd which GitHub Actions will use by default.
I think cargo-semver-checks should move the JSON files into the cache directory, but I think we should leave the cleaning to its GitHub Action I don't think that cargo-semver-checks itself should clean the target directory, since that might be undesirable for local use. Let me know if you have any concerns! |
I'll test it locally!
Hmm, I think that is inconsistent. We are creating those JSON files locally because we can't download from docs.rs. The fact that we need to build them is an implementation detail IMO. Using as little disk space as possible is also desirable locally, especially when we are talking as much as 22GB for a single repository! Alternatively, we can also say that the program is completely ignorant of where the data is stored and the action moves around the JSON files and passes them in as baseline etc |
Default |
I am happy to have a go at implementing this if you agree that it should happen within |
That'd be great! I think it's reasonable to handle caching here, and it's also fine to delete the baseline build data from the target dir if caching the output. I'd also be open to adding zstd compression for the cached baseline JSON file if you think that's worth it independently, though perhaps that should be a separate addition just to keep things simple. How can I help? I'm happy to review draft PRs or whatever else you need. |
I am leaning towards that being separate. Compared to other build output, the JSON files are small meaning there isn't much to be gained. The only environment that would benefit from that is local builds and I am not sure how frequent local use is compared to CI runs? |
I'll open a draft PR once I have some questions and/or an initial design, thanks :) |
Having seen the internals now, I have a question: Why are we building the docs for the current revision in a separate directory? Why not just reuse the default location? That would allow users to automatically benefit from caching. We already cache the debug build for our crates, having |
It'd be something to try. Unsure how well it handles any overhead from the user doing different feature sets than us. |
Here's a specific example scenario to illustrate the concern that @epage is pointing toward. In principle, In that case, I believe that will both force a full rebuild and wipe the default location's cached data, which would probably make the user quite unhappy. Not saying we couldn't work around this if needed. Just saying that applying some caution is justified :) |
Right, that makes sense. I'll see if I can optimise that on my end by just adding another cache! |
I spent today working on a new optimizations API in Trustfall, and while it still needs a bit of polish, I wanted to give you a sneak peek because it's relevant to the objective of making CI faster 😄 The numbers below are the result of implementing just one optimization using the new API. I've intentionally picked some gigantic crates where the perf impact is most obvious — smaller crates will benefit proportionally less, but still noticeably.
Needless to say, I'm excited 🤩 |
Damn, that is amazing! :) Looking at our current CI, I just had another idea. At the moment, we are creating and parsing two rustdocs, only to then say 0 checks necessary because we are already doing a minor bump which allows breaking changes. We could reverse that and avoid parsing the rustdoc altogether? |
The version numbers are read from the rustdoc files, which are the ultimate source of truth since it's what Rust and cargo saw and generated. So I don't think that would work, and while I'm sure it could be "wedged in," it doesn't sound like it would be worth the trouble. |
I added one more optimization just to play with the new API, and I'm pretty happy with both the ergonomics and the outcomes: on the biggest crate I could find, the most expensive lint took ~1.3s instead of 4887s before (~3759x speedup). Here's a full perf report. I used the Before (cargo-semver-checks v0.17): 18178s lint wall-clock time (i.e. over 5h)
Added "item name -> item id" index: 12.7s lint wall-clock time
Previous + "(item id, method name) -> (impl id, method id)" index: 9.1s lint wall-clock time
I need to clean up the code and add quite a few tests before this is ready for prime-time. But this already seems very encouraging, and I am sure that there's even more performance that we could squeeze out here if we chose to. |
I did a slightly better job with the index construction, using 8.0s lint wall-clock time, 2,272x faster overall
The slowest lint in the suite is now taking just a hair over one second to run, and is a nice round 4500x faster than before. When we start running the lints in parallel with rayon, on most hardware it will take approximately the same time to parse (not generate! just parse!) the rustdoc JSON with serde as to run all the lints. I'm pretty happy with that. |
More details on how all of the above works here: https://predr.ag/blog/speeding-up-rust-semver-checking-by-over-2000x/ Now just a small matter of finishing the implementation (and testing!) of the new API so we can hit these numbers outside of a test setup. |
Describe your use case
We are using this in CI and it contributes about ~1-2min to the CI run of each crate despite:
cargo semver-checks
being installedcargo doc
being cachedSee https://github.com/thomaseizinger/rust-libp2p/actions/runs/3653859145/jobs/6173748557 for an example.
Is there a way for how this can be improved? When I run it locally, it is a lot faster (after the first invocation) so I am guessing there is a cache somewhere? Which files do I need to cache?
Describe the solution you'd like
cargo semver-checks
to finish within seconds, not minutes.Alternatives, if applicable
No response
Additional Context
No response
The text was updated successfully, but these errors were encountered: