-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lexer improvements #99884
Lexer improvements #99884
Conversation
Some changes occurred in src/tools/clippy cc @rust-lang/clippy |
Here are some instruction count results I got locally, for some rustc-perf benchmarks and some other benchmarks. Primary benchmarks
Secondary and external benchmarks
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 37ec8903bd0fffe545458aaa819bd1749ca8f4bb with merge 12221390288e64d968f906cf54b356a5e3d010fe... |
💥 Test timed out |
37ec890
to
b6fb134
Compare
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically looks good to me, love the usize -> u32
change.
I am slightly less sure about boxing errors -- I understand that this shrinks the size_of::<Token>
, but this makes the type !Copy
, so call-sites would suffer from some code-bloat due to drop-glue. The ultimate decision here should probably be dictated by the benchmark, but I wish we could have find some more elegant solution here.
Perhaps, rather than returning an error outright, we can just bool
signal that there's an error, and provide a separate helper function to re-lex the offending token for error-reporting purposes?
Definitely possible, there is already some partial re-lexing when the tokens are "cooked" in |
b6fb134
to
c9907b5
Compare
Ok, I have done the re-lexing thing, getting the size of @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit c9907b53dd331ee60d14aab76b22ee85fe4f39f9 with merge 6ea3a126c47fe2d95f163775720572ac11dc8846... |
☀️ Try build successful - checks-actions |
Queued 6ea3a126c47fe2d95f163775720572ac11dc8846 with parent 3924dac, future comparison URL. |
Finished benchmarking commit (6ea3a126c47fe2d95f163775720572ac11dc8846): comparison url. Instruction count
Max RSS (memory usage)Results
CyclesResults
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Footnotes |
Avoid doing stuff until it's necessary.
It not needed, always being set to the end of the text.
Because it's tiny and hot.
Because it's small and hot.
From 72 bytes to 12 bytes (on x86-64). There are two parts to this: - Changing various source code offsets from 64-bit to 32-bit. This is not a problem because the rest of rustc also uses 32-bit source code offsets. This means `Token` is no longer `Copy` but this causes no problems. - Removing the `RawStrError` from `LiteralKind`. Raw string literal invalidity is now indicated by a `None` value within `RawStr`/`RawByteStr`, and the new `validate_raw_str` function can be used to re-lex an invalid raw string literal to get the `RawStrError`. There is one very small change in behaviour. Previously, if a raw string literal matched both the `InvalidStarter` and `TooManyHashes` cases, the latter would override the former. This has now changed, because `raw_double_quoted_string` now uses `?` and so returns immediately upon detecting the `InvalidStarter` case. I think this is a slight improvement to report the earlier-detected error, and it explains the change in the `test_too_many_hashes` test. The commit also removes a couple of comments that refer to rust-lang#77629 and say that the size of these types don't affect performance. These comments are wrong, though the performance effect is small.
c9907b5
to
99f5c79
Compare
I have rebased to fix the conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, with validate_raw_str
this looks very much nicer, thanks!
@bors r+
// This assertion is in this crate, rather than in `rustc_lexer`, because that | ||
// crate cannot depend on `rustc_data_structures`. | ||
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))] | ||
rustc_data_structures::static_assert_size!(rustc_lexer::Token, 12); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be to open-code this in the lexer crate, it's not too annoying these days:
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
const _: () = if std::mem::size_of::<Token>() != 12 { panic!() } else { () };
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const _: () = assert!(std::mem::size_of::<Token> == 12);
Can't use assert_eq!()
unfortunately.
(I am also not sure if I have bors these days) |
I don't think it works in GH review comments though @bors r=matklad |
☀️ Test successful - checks-actions |
Finished benchmarking commit (dcb444a): comparison url. Instruction count
Max RSS (memory usage)Results
CyclesResults
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression Footnotes |
…klad Lexer improvements Some cleanups and small speed improvements. r? `@matklad`
… r=matklad More lexer improvements A follow-up to rust-lang#99884. r? `@matklad`
… r=matklad More lexer improvements A follow-up to rust-lang#99884. r? `@matklad`
Some cleanups and small speed improvements.
r? @matklad