Cache declarative macro expansion on disk (for incremental comp.). Based on #128605 #128747

futile · 2024-08-06T17:45:58Z

NOTE: Don't merge yet, mostly here for CI, also rebased on top of #128605!

This PR enables on-disk caching of declarative macro expansions for incremental compilation. The base mechanism is added in #128605, but is not yet enabled for incremental compilation there.

r? @petrochenkov since you are in the loop here, but feel free to un-/reassign.


rustbot · 2024-08-06T17:46:06Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @petrochenkov (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

futile · 2024-08-06T17:49:11Z

@rustbot author

Don't need a review for now, I think, since it's already linked in the previous PR.

futile · 2024-08-06T17:50:22Z

Actually, since we are mostly waiting for the base PR, maybe this is better:

@rustbot blocked

petrochenkov · 2024-08-06T21:23:11Z

@bors try @rust-timer queue


bors · 2024-08-06T21:24:23Z

⌛ Trying commit f2cf758 with merge 33076b4...

bors · 2024-08-06T23:19:07Z

☀️ Try build successful - checks-actions
Build commit: 33076b4 (33076b42c5fbe698a1d1887910d8bc5f2fc93c2a)

rust-timer · 2024-08-07T00:34:27Z

Finished benchmarking commit (33076b4): comparison URL.

Overall result: ❌✅ regressions and improvements - BENCHMARK(S) FAILED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

❗ ❗ ❗ ❗ ❗
Warning ⚠️: The following benchmark(s) failed to build:

hyper-0.14.18
libc-0.2.124

❗ ❗ ❗ ❗ ❗

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.8%	[0.2%, 58.4%]	135
Regressions ❌ (secondary)	16.5%	[0.2%, 87.5%]	49
Improvements ✅ (primary)	-10.2%	[-11.5%, -9.6%]	6
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	3.2%	[-11.5%, 58.4%]	141

Max RSS (memory usage)

Results (primary 6.6%, secondary 33.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	6.6%	[1.0%, 34.4%]	105
Regressions ❌ (secondary)	34.5%	[2.0%, 156.1%]	34
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.3%	[-1.3%, -1.3%]	1
All ❌✅ (primary)	6.6%	[1.0%, 34.4%]	105

Cycles

Results (primary 7.4%, secondary 27.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	7.8%	[0.9%, 55.2%]	55
Regressions ❌ (secondary)	27.7%	[2.1%, 93.1%]	27
Improvements ✅ (primary)	-1.6%	[-1.6%, -1.5%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	7.4%	[-1.6%, 55.2%]	57

Binary size

Results (primary -0.2%, secondary -0.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.2%	[-0.8%, -0.0%]	30
Improvements ✅ (secondary)	-0.2%	[-0.3%, -0.0%]	9
All ❌✅ (primary)	-0.2%	[-0.8%, -0.0%]	30

Bootstrap: 760.99s -> 765.527s (0.60%)
Artifact size: 336.87 MiB -> 337.33 MiB (0.14%)

futile · 2024-08-07T21:03:35Z

These are my takeaways from the perf run:

Many benchmarks show medium-to-large regressions, and many others show small-to-medium ones.
Only html5ever improves, but there it gets a noticeable ~10% reduction in instruction count in most configurations.
hyper and libc both fail to build due to AttrIds in the argument TokenStream, which apparently can either not be flattened or not be cached; in practice, TokenStream::flattened() is definitely not enough.

Unconditional Decl Macro Caching Seems to Not Be Worth It

I think the big takeaway is that unconditionally caching all declarative macro expansions for incremental compilation is not worth it. I'd assume the overhead of caching and retrieving is simply much larger than the cost of re-evaluation for many or most invocations. However, the ~10% gain for html5ever shows that some crates can benefit from incremental caching. As far as I remember, html5ever uses declarative macros very heavily (I think it predates proc macros by quite a bit; it was around at/before Rust 1.0, iirc?), and thus might have some macro expansions that take long enough for disk caching to be worth it.

Needs Cost-Heuristic to Be Useful, but Potential Gains Uncertain

At this point, a next step could be to find some cut-off/condition for when to cache decl macro expansions, i.e., a "complex enough that disk caching is (probably) useful" heuristic. However, as we saw from the perf run in #128605, integration with the query system has a non-zero cost, which such a heuristic would also have to overcome (for almost all crates that invoke decl macros); that would need to be tried out experimentally. The possible gains are also hard to estimate: for many crates, all decl macro invocations might be cheap enough that caching is never worth it, leaving only the query system integration overhead. This overhead might also be improvable, but I don't really have any idea how.
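The comment above proposes a cost heuristic without defining one. A minimal sketch of what such a cut-off could look like, assuming a purely hypothetical cost model: ExpansionStats, CacheCostModel, should_cache and all constants are invented for illustration, and nothing like this exists in rustc. It caches an expansion only when re-expanding is estimated to cost more than the fixed query-system overhead plus a per-token (de)serialization cost.

```rust
use std::time::Duration;

/// Hypothetical per-invocation measurements (not a rustc type).
#[derive(Debug, Clone, Copy)]
struct ExpansionStats {
    /// Number of tokens produced by the expansion; the cached value is the
    /// expansion result, so (de)serialization cost is assumed to scale with it.
    output_tokens: u32,
    /// Time the expansion took the last time it was actually computed.
    expand_time: Duration,
}

/// Hypothetical cost model with made-up constants.
#[derive(Debug, Clone, Copy)]
struct CacheCostModel {
    /// Assumed fixed cost of routing one invocation through the query system.
    fixed_overhead: Duration,
    /// Assumed cost per output token of encoding and later decoding the result.
    per_token_overhead: Duration,
}

impl CacheCostModel {
    fn estimated_cache_cost(&self, stats: &ExpansionStats) -> Duration {
        self.fixed_overhead + self.per_token_overhead * stats.output_tokens
    }

    /// Cache only if re-expanding is estimated to cost more than caching.
    fn should_cache(&self, stats: &ExpansionStats) -> bool {
        stats.expand_time > self.estimated_cache_cost(stats)
    }
}

fn main() {
    let model = CacheCostModel {
        fixed_overhead: Duration::from_micros(20),
        per_token_overhead: Duration::from_nanos(50),
    };
    // A trivial macro call: cheaper to re-expand than to cache.
    let cheap = ExpansionStats { output_tokens: 10, expand_time: Duration::from_micros(5) };
    // A large, slow expansion: caching is estimated to pay off.
    let heavy = ExpansionStats { output_tokens: 2_000, expand_time: Duration::from_millis(3) };
    println!("cheap: {}", model.should_cache(&cheap)); // cheap: false
    println!("heavy: {}", model.should_cache(&heavy)); // heavy: true
}
```

The hard part this sketch sidesteps is getting realistic inputs: expand_time is only known after an expansion has run at least once, and the overhead constants would have to come from measurement, e.g. further perf runs.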

One thing that could still make this useful is if integration into the query system allowed more parts of the dependency graph, which currently always need to be recomputed, to stay "green" (I think?). But given the performance regressions, that at least doesn't seem to be the case at the moment (and the same probably applies to many other operations as well).

Implementation Probably Needs to Wait For #124141 Anyway

The build failures of hyper and libc seem to be caused by TokenStream::flattened()/trying to stable-hash the TokenStream, which doesn't work yet. So it probably doesn't make sense to try this before #124141 anyway.
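The failures above come down to the cache key: a result persisted between compiler sessions can only be reused if its key is computed identically in every session. A toy sketch of that requirement, assuming token streams can be rendered as plain strings (a big simplification of rustc's TokenStream and its stable hashing). stable_key, ExpansionCache and the `#idN` token are hypothetical; the last one stands in for any session-local identifier, and this toy does not model how rustc's AttrId actually behaves.

```rust
use std::collections::HashMap;

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a, continuing from `hash`. Its output is fully specified, so it is the
/// same in every session, unlike std's `DefaultHasher`, whose algorithm may change.
fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical cache key: macro path plus input tokens rendered as text. Each part
/// is length-prefixed so that ("ab", "c") and ("a", "bc") produce different keys.
fn stable_key(macro_path: &str, tokens: &[&str]) -> u64 {
    let mut h = FNV_OFFSET;
    for part in std::iter::once(macro_path).chain(tokens.iter().copied()) {
        h = fnv1a(&(part.len() as u64).to_le_bytes(), h);
        h = fnv1a(part.as_bytes(), h);
    }
    h
}

/// Toy stand-in for an on-disk cache that persists between compiler sessions.
struct ExpansionCache {
    entries: HashMap<u64, String>,
}

impl ExpansionCache {
    /// Returns the expansion and whether it came from the cache.
    fn get_or_expand(&mut self, key: u64, expand: impl FnOnce() -> String) -> (String, bool) {
        if let Some(hit) = self.entries.get(&key) {
            return (hit.clone(), true);
        }
        let out = expand();
        self.entries.insert(key, out.clone());
        (out, false)
    }
}

fn main() {
    let mut cache = ExpansionCache { entries: HashMap::new() };
    let tokens = ["struct", "Foo", ";"];
    let expand = || "impl Foo {}".to_string();

    // "Session 1" misses and fills the cache; "session 2" recomputes the same key and hits.
    let (_, hit1) = cache.get_or_expand(stable_key("my_crate::my_macro", &tokens), expand);
    let (_, hit2) = cache.get_or_expand(stable_key("my_crate::my_macro", &tokens), expand);
    println!("{hit1} {hit2}"); // false true

    // A session-local ID in the key (modeled as a value that differs per session)
    // changes the key every time, so the cache never hits.
    let key_with_id = |session_id: u32| {
        let id = format!("#id{session_id}");
        stable_key("my_crate::my_macro", &["struct", "Foo", ";", id.as_str()])
    };
    let (_, hit3) = cache.get_or_expand(key_with_id(1), expand);
    let (_, hit4) = cache.get_or_expand(key_with_id(2), expand);
    println!("{hit3} {hit4}"); // false false
}
```

A non-deterministic component in the key does not cause wrong results here, just permanent misses; the alternatives are to exclude such values from the key or to make them stable across sessions.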

Ok, that said: my initial motivation was to cache proc macro expansions (see #99515 (comment)), and I tackled this mostly because it seemed like a good first step implementation-wise (and because I wanted to). However, performance-wise it doesn't seem to be a good first step 😅 But it has given me a basis for how incremental caching and macro expansion work in the compiler, which should still help when tackling proc macros.

I'm not sure I want to start on that immediately; I also have some other stuff coming up, and I'm a bit bummed out by this work not being useful in the end. I didn't know this might not be a good first step (due to potential perf losses); I just assumed it was, because the other PR had been commented on and had discussion going on, so I thought it was the right direction. Turns out it wasn't, but I guess that can only be known afterwards. Oh well, good experience I guess, and reviewers already have enough to do anyway :)

So thanks a lot for engaging and taking the time to review, @petrochenkov! :) I'd be ok with closing this PR, since I don't plan to work on it again soon (I'd rather tackle proc macro caching), but I'm not sure what the standard procedure is here. I hope that is okay. It would be nice to have the perf results "findable" somehow, but I'll leave that up to you.

petrochenkov · 2024-08-07T21:41:05Z

I expect #124141 to be merged in a couple of months, so I'd rather keep this PR open and then retry testing and benchmarking after the #124141's merge to see what the effect is.

…=petrochenkov refactor(rustc_expand::mbe): Don't require full ExtCtxt when not necessary

Refactor `mbe::diagnostics::failed_to_match_macro()` to require only a `&ParseSess` instead of a full `ExtCtxt`. It hard-required the `ExtCtxt` only for a call to `cx.trace_macros_diag()`, which we instead move to the function's only call site.

Note: This could potentially change observed behavior, because `cx.trace_macros_diag()` is now always called after `failed_to_match_macro()`, whereas before it was only called at the end of the function's main return path. But since `trace_macros_diag()` "flushes" out any not-yet-reported errors, calling it on all paths should be ok, as I think there shouldn't be any such errors on the non-main paths. I don't know the rest of the codebase well enough to say that with 100% confidence, but `tests/ui` still passes, which gives at least some confidence in the change.

Also concretize the return type from `Box<dyn MacResult>` to `(Span, ErrorGuaranteed)`, because this function will _always_ return an error, and never any other kind of result.

Was part of rust-lang#128605 and rust-lang#128747, but is a standalone refactoring.

r? ``@petrochenkov``

Rollup merge of rust-lang#128798 - futile:refactor/mbe-diagnostics, r=petrochenkov refactor(rustc_expand::mbe): Don't require full ExtCtxt when not necessary
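The refactor in the commit message above follows a common pattern: narrow a function's context parameter to what it actually uses, and let a function that can only fail say so in its return type. A minimal sketch with simplified stand-in types; Span, ErrorGuaranteed, ParseSess and ExtCtxt here are hypothetical toy versions, not rustc's definitions, and report_failed_match is an invented name for the call site.

```rust
use std::cell::RefCell;

// Simplified stand-ins, NOT rustc's real definitions of these types.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    lo: u32,
    hi: u32,
}

/// Token proving an error was emitted; in this sketch only `emit_err` creates one.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ErrorGuaranteed(());

/// Stand-in for the parse session; errors are recorded through a shared reference.
struct ParseSess {
    errors: RefCell<Vec<String>>,
}

impl ParseSess {
    fn emit_err(&self, sp: Span, msg: &str) -> ErrorGuaranteed {
        self.errors.borrow_mut().push(format!("{}..{}: {}", sp.lo, sp.hi, msg));
        ErrorGuaranteed(())
    }
}

/// Stand-in for the full expansion context, which owns much more state.
struct ExtCtxt {
    psess: ParseSess,
    trace_flushed: bool,
}

impl ExtCtxt {
    /// Stand-in for `trace_macros_diag()`; here it only records that it ran.
    fn trace_macros_diag(&mut self) {
        self.trace_flushed = true;
    }
}

/// After the refactor: only a `&ParseSess` is needed, and the concrete return
/// type states that this function always fails (no `Box<dyn MacResult>`).
fn failed_to_match_macro(psess: &ParseSess, sp: Span, name: &str) -> (Span, ErrorGuaranteed) {
    // Hypothetical message text, not rustc's actual diagnostic.
    let guar = psess.emit_err(sp, &format!("no rule of `{name}` matched this input"));
    (sp, guar)
}

/// The single call site, which now performs the `trace_macros_diag()` call.
fn report_failed_match(cx: &mut ExtCtxt, sp: Span, name: &str) -> (Span, ErrorGuaranteed) {
    let res = failed_to_match_macro(&cx.psess, sp, name);
    cx.trace_macros_diag();
    res
}

fn main() {
    let mut cx = ExtCtxt {
        psess: ParseSess { errors: RefCell::new(Vec::new()) },
        trace_flushed: false,
    };
    let (sp, _guar) = report_failed_match(&mut cx, Span { lo: 4, hi: 9 }, "my_macro");
    println!("{:?} {:?} {}", sp, cx.psess.errors.borrow(), cx.trace_flushed);
}
```

Returning the proof token instead of a boxed trait object means callers can no longer accidentally treat the result as a successful expansion.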

…ng, r=<try> Experimental: Add Derive Proc-Macro Caching

# On-Disk Caching For Derive Proc-Macro Invocations

This PR adds on-disk caching for derive proc-macro invocations using rustc's query system to speed up incremental compilation.

The implementation is (intentionally) a bit rough/incomplete, as I wanted to see whether this helps with performance before fully implementing it/RFCing etc.

I did some ad-hoc performance testing.

## Rough, Preliminary Eval Results:

Using a version built through `DEPLOY=1 src/ci/docker/run.sh dist-x86_64-linux` (which I got from [here](https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization)).

### [Some Small Personal Project](https://github.com/futile/ultra-game):

```console
# with -Zthreads=0 as well
$ touch src/main.rs && cargo +dist check
```

Caused a re-check of 1 crate (the only one).

Result:

| Configuration | Time (avg. ~5 runs) |
|--------|--------|
| Uncached | ~0.54s |
| Cached | ~0.54s |

No visible difference.

### [Bevy](https://github.com/bevyengine/bevy):

```console
$ touch crates/bevy_ecs/src/lib.rs && cargo +dist check
```

Caused a re-check of 29 crates.

Result:

| Configuration | Time (avg. ~5 runs) |
|--------|--------|
| Uncached | ~6.4s |
| Cached | ~5.3s |

Roughly 1s, or ~17% speedup.

### [Polkadot-Sdk](https://github.com/paritytech/polkadot-sdk):

Basically this script (not mine): https://github.com/coderemotedotdev/rustc-profiles/blob/d61ad38c496459d82e35d8bdb0a154fbb83de903/scripts/benchmark_incremental_builds_polkadot_sdk.sh

TL;DR: Two full `cargo check` runs to fill the incremental caches (for cached & uncached). Then 10 repetitions of `touch $some_file && cargo +uncached check && cargo +cached check`.

```console
$ cargo update # `time` didn't build because compiler too new/dep too old
$ ./benchmark_incremental_builds_polkadot_sdk.sh # see above
```

_Huge_ workspace with ~190 crates. Not sure how many were re-built/re-checked on each invocation.

Result:

| Configuration | Time (avg. 10 runs) |
|--------|--------|
| Uncached | 99.4s |
| Cached | 67.5s |

Very visible speedup of 31.9s or ~32%.

---

**-> Based on these results I think it makes sense to do a rustc-perf run and see what that reports.**

---

## Current Limitations/TODOs

I left some `FIXME(pr-time)`s in the code for things I wanted to bring up/draw attention to in this PR, usually when I wasn't sure if I found a (good) solution or when I knew that there might be a better way to do something; see the diff for these.

### High-Level Overview of What's Missing For "Real" Usage:

* [ ] Add caching for `Bang`- and `Attr`-proc macros (currently only `Derive`).
  * Not a big change, I just focused on `derive`-proc macros for now, since I felt like these should be most cacheable and are used very often in practice.
* [ ] Allow marking specific macros as "do not cache" (currently only all-or-nothing).
  * Extend the unstable option to support, e.g., `-Z cache-derive-macros=some_pm_crate::some_derive_macro_fn` for easy testing using the nightly compiler.
  * After Testing: Add a `#[proc_macro_cacheable]` annotation to allow proc-macro authors to "opt-in" to caching (or something similar). Would probably need an RFC?
  * Might make sense to try to combine this with rust-lang#99515, so that external dependencies can be picked up and be taken into account as well.

---

So, just since you were in the loop on the attempt to cache declarative macro expansions:

r? `@petrochenkov` Please feel free to re-/unassign!

Finally: I hope this isn't too big a PR. I'll also show up in Zulip, since I read that that is usually appreciated.

Thanks a lot for taking a look! :)

(Kind of related/very similar approach, old declarative macro caching PR: rust-lang#128747)
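As a quick cross-check of the percentages quoted in the commit message above, the relative speedup is (uncached - cached) / uncached. A small Rust sketch over the reported averages; the labels are just names for the three benchmarks, and the inputs are the averaged timings from the tables.

```rust
/// Absolute (seconds) and relative (percent) speedup from averaged timings.
fn speedup(uncached_s: f64, cached_s: f64) -> (f64, f64) {
    let saved = uncached_s - cached_s;
    (saved, saved / uncached_s * 100.0)
}

fn main() {
    let runs = [
        ("ultra-game", 0.54, 0.54),
        ("Bevy", 6.4, 5.3),
        ("polkadot-sdk", 99.4, 67.5),
    ];
    for (name, uncached, cached) in runs {
        let (saved, pct) = speedup(uncached, cached);
        println!("{name}: {saved:.1}s saved ({pct:.1}%)");
    }
}
```

This gives 1.1s (about 17.2%) for Bevy and 31.9s (about 32.1%) for polkadot-sdk, consistent with the rounded figures in the message.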

petrochenkov · 2025-02-19T16:06:48Z

I expect #124141 to be merged in a couple of months, so I'd rather keep this PR open and then retry testing and benchmarking after the #124141's merge to see what the effect is.

#124141 wasn't merged in a couple of months due to my long absence, but now I'm back so there will be progress.
This PR is still blocked on that PR though.

petrochenkov · 2025-04-14T11:49:41Z

#124141 has landed, but this is still blocked on #129102 as I understand.

SparrowLii and others added 7 commits August 5, 2024 22:35

make declare macro a part of query system

b6b27f0

does not compile after rebase (by @futile) Co-authored-by: Felix Rath <felixm.rath@gmail.com>

no hash for the query result

0098721

does not compile after rebase (by @futile) Co-authored-by: Felix Rath <felixm.rath@gmail.com>

cleanup hashes and span in the query key

25e82a1

Co-authored-by: Felix Rath <felixm.rath@gmail.com>

fix: Error reporting for expand_legacy_bang (passes tests/ui)

6134b01

WIP,DNC: Enable caching of legacy bang expansions

2c15b50

WIP,UNTIDY: debugging x test -v tests/incremental/static_stable_hash/issue-49301.rs

64e6072

Fix tests/incremental. Required making the arg TokenStream part of the key.

53c4644

rustbot assigned petrochenkov Aug 6, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 6, 2024

futile mentioned this pull request Aug 6, 2024

Make declarative macro expansion a part of query system (cont. of #125356) #128605

Closed

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 6, 2024

rustbot added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 6, 2024


fix: Remove eprintln! cause it messes up tests/ui

f2cf758


rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 6, 2024


rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Aug 7, 2024

petrochenkov mentioned this pull request Aug 7, 2024

make declarative macro expansion a part of query system #125356

Closed

petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels Aug 7, 2024

futile mentioned this pull request Aug 7, 2024

refactor(rustc_expand::mbe): Don't require full ExtCtxt when not necessary #128798

Merged

petrochenkov added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 8, 2024

futile mentioned this pull request Aug 14, 2024

Experimental: Add Derive Proc-Macro Caching #129102

Open

