-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proc_macro/bridge: cache static spans in proc_macro's client thread-local state #98187
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as outdated.
This comment was marked as outdated.
r? @eddyb |
5438033
to
f9a87c2
Compare
f9a87c2
to
6c67f25
Compare
This comment was marked as duplicate.
This comment was marked as duplicate.
That sounds nice, but it ends up with Despite looking "global", they're effectively inputs to the proc macro invocation (except But frankly a lot of these concerns would probably go away once we have thread isolation as a bare minimum, and we can start relying on |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 6c67f256af135fa768c9a22e9b462de4d79aaf1c with merge ffc980468071ac4d6aec0110094de58df965751e... |
☀️ Try build successful - checks-actions |
Queued ffc980468071ac4d6aec0110094de58df965751e with parent 2cec687, future comparison URL. |
Finished benchmarking commit (ffc980468071ac4d6aec0110094de58df965751e): comparison url. Instruction count
Max RSS (memory usage)Results
CyclesResults
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Footnotes |
library/proc_macro/src/bridge/mod.rs
Outdated
/// Context provided alongside the initial inputs for a macro expansion. | ||
/// Provides values such as spans which are used frequently to avoid RPC. | ||
#[derive(Clone)] | ||
struct ExpnContext<S> { | ||
def_site: S, | ||
call_site: S, | ||
mixed_site: S, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about this a bunch and something like this might be good:
struct ExpandRequest<Span, I> {
def_site: Span,
call_site: Span,
mixed_site: Span,
input: I,
}
and then ExpandRequest<S::Span, S::TokenStream>
and ExpandRequest<S::Span, (S::TokenStream, S::TokenStream)>
would be used for the two kinds of proc macros.
And if that type is actually used in the API, the Context
trait wouldn't be needed.
I'm not super happy with the names, but this kind of "request" type would fit with some other ideas around interacting with a proc macro crate as a whole with S->C "requests" (at that point the client/server terminology sadly starts getting in the way, but I hope it's clear what I'm referring to).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been planning to use the Context
trait more in part 4 (#98189). It's being used for handling the serialization/deserialization of Symbol
types, as well as to provide a free function for validating Ident
s without requiring the proc-macro
crate to depend on unicode_xid
and unicode-normalization
(relevant diff). I could introduce a similar trait in part 4 for that purpose though, and move the spans off of it in this part.
One downside of moving this directly into the API is that it will move more things out of the shared proc macro server and into the 3 seperate proc macro wrapper implementations (1, 2, 3), as we'd need to do the dance to get the spans there, or continue to compute them within the server type, then pull them out to pass alongside, though we can probably get away with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this a bit async, and agreed to move the methods to the Server
trait instead of the Context
trait. We'll revisit changing the shape of the Client
API to make context passing as a parameter more explicit in the future, but it'll be a bit tricky to do in this patch.
This greatly improves the performance of the very frequently called `call_site()` macro when running in a cross-thread configuration.
6c67f25
to
55f052d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(if anyone is confused by the changes to the PR, @mystor and I have discussed some bits of it out-of-band)
@bors r+ |
📌 Commit e32ee19 has been approved by |
☀️ Test successful - checks-actions |
Finished benchmarking commit (3b0d481): comparison url. Instruction count
Max RSS (memory usage)Results
CyclesResults
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression Footnotes |
…idge This is done by having the crossbeam dependency inserted into the proc_macro server code from the server side, to avoid adding a dependency to proc_macro. In addition, this introduces a -Z command-line option which will switch rustc to run proc-macros using this cross-thread executor. With the changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the performance of the executor should be much closer to same-thread execution. In local testing, the crossbeam executor was substantially more performant than either of the two existing CrossThread strategies, so they have been removed to keep things simple.
proc_macro: use crossbeam channels for the proc_macro cross-thread bridge This is done by having the crossbeam dependency inserted into the `proc_macro` server code from the server side, to avoid adding a dependency to `proc_macro`. In addition, this introduces a -Z command-line option which will switch rustc to run proc-macros using this cross-thread executor. With the changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the performance of the executor should be much closer to same-thread execution. In local testing, the crossbeam executor was substantially more performant than either of the two existing `CrossThread` strategies, so they have been removed to keep things simple. r? `@eddyb`
This is the second part of #86822, split off as requested in #86822 (review). This patch removes the RPC calls required for the very common operations of
Span::call_site()
,Span::def_site()
andSpan::mixed_site()
.Some notes:
This part is one of the ones I don't love as a final solution from a design standpoint, because I don't like how the spans are serialized immediately at macro invocation. I think a more elegant solution might've been to reserve special IDs for
call_site
,def_site
, andmixed_site
at compile time (either starting at 1 or fromu32::MAX
) and making reading a Span handle automatically map these IDs to the relevant values, rather than doing extra serialization.This would also have an advantage for potential future work to allow
proc_macro
to operate more independently from the compiler (e.g. to reduce the necessity ofproc-macro2
), as methods likeSpan::call_site()
could be made to function without access to the compiler backend.That was unfortunately tricky to do at the time, as this was the first part I wrote of the patches. After the later part (#98188, #98189), the other uses of
InternedStore
are removed meaning that a custom serialization strategy forSpan
is easier to implement.If we want to go that path, we'll still need the majority of the work to split the bridge object and introduce the
Context
trait for free methods, and it will be easier to do afterSpan
is the only user ofInternedStore
(after #98189).