Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lexer improvements #99884

Merged
merged 7 commits into from
Aug 1, 2022
Merged

Lexer improvements #99884

merged 7 commits into from
Aug 1, 2022

Conversation

nnethercote
Copy link
Contributor

Some cleanups and small speed improvements.

r? @matklad

@rustbot rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Jul 29, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 29, 2022
@rustbot
Copy link
Collaborator

rustbot commented Jul 29, 2022

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@nnethercote
Copy link
Contributor Author

Here are some instruction count results I got locally, for some rustc-perf benchmarks and some other benchmarks.

Primary benchmarks

Benchmark Profile Scenario % Change Significance Factor?
libc-0.2.124 check full -0.38% 1.91x
stm32f4-0.14.0 check full -0.34% 1.70x
unicode-normalization-0.1.19 check full -0.31% 1.57x

Secondary and external benchmarks

Benchmark Profile Scenario % Change Significance Factor?
web-sys-0.3.56 check full -2.91% 14.57x
externs check full -1.16% 5.80x
unicode_categories-0.1.1 check full -0.75% 3.75x
pest-2.1.3 check full -0.70% 3.50x
unused-warnings check full -0.56% 2.82x
deep-vector check full -0.54% 2.71x
coercions check full -0.54% 2.69x
tuple-stress check full -0.47% 2.35x
unicode-xid-0.2.3 check full -0.41% 2.05x
structopt-0.3.26-2 check full -0.38% 1.90x
ucd check full -0.38% 1.89x
log-0.4.16 check full -0.35% 1.76x

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 29, 2022
@bors
Copy link
Contributor

bors commented Jul 29, 2022

⌛ Trying commit 37ec8903bd0fffe545458aaa819bd1749ca8f4bb with merge 12221390288e64d968f906cf54b356a5e3d010fe...

@bors
Copy link
Contributor

bors commented Jul 29, 2022

💥 Test timed out

@rust-log-analyzer

This comment has been minimized.

Copy link
Member

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically looks good to me, love the usize -> u32 change.

I am slightly less sure about boxing errors -- I understand that this shrinks the size_of::<Token>, but this makes the type !Copy, so call-sites would suffer from some code-bloat due to drop-glue. The ultimate decision here should probably be dictated by the benchmark, but I wish we could have find some more elegant solution here.

Perhaps, rather than returning an error outright, we can just bool signal that there's an error, and provide a separate helper function to re-lex the offending token for error-reporting purposes?

@nnethercote
Copy link
Contributor Author

Perhaps, rather than returning an error outright, we can just bool signal that there's an error, and provide a separate helper function to re-lex the offending token for error-reporting purposes?

Definitely possible, there is already some partial re-lexing when the tokens are "cooked" in rustc_parser, which is also when the errors are reported. It's possible to get size of Token down to one byte if you push hard enough, though that's probably excessive. I did a test run earlier with a size of 8 bytes by just disabling the string errors and it shaved off another 0.5% in the best case. I'll do something in this direction on Monday.

@nnethercote
Copy link
Contributor Author

Ok, I have done the re-lexing thing, getting the size of Token down to 12 bytes.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Copy link
Contributor

bors commented Jul 29, 2022

⌛ Trying commit c9907b53dd331ee60d14aab76b22ee85fe4f39f9 with merge 6ea3a126c47fe2d95f163775720572ac11dc8846...

@bors
Copy link
Contributor

bors commented Jul 30, 2022

☀️ Try build successful - checks-actions
Build commit: 6ea3a126c47fe2d95f163775720572ac11dc8846 (6ea3a126c47fe2d95f163775720572ac11dc8846)

@rust-timer
Copy link
Collaborator

Queued 6ea3a126c47fe2d95f163775720572ac11dc8846 with parent 3924dac, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (6ea3a126c47fe2d95f163775720572ac11dc8846): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-0.3% -0.7% 48
Improvements 🎉
(secondary)
-0.5% -1.2% 49
All 😿🎉 (primary) -0.3% -0.7% 48

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regression found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
2.1% 2.1% 1
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) N/A N/A 0

Cycles

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
2.2% 2.2% 1
Regressions 😿
(secondary)
3.3% 4.5% 6
Improvements 🎉
(primary)
-3.8% -3.8% 1
Improvements 🎉
(secondary)
-2.6% -2.9% 2
All 😿🎉 (primary) -0.8% -3.8% 2

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 30, 2022
Avoid doing stuff until it's necessary.
It not needed, always being set to the end of the text.
Because it's tiny and hot.
Because it's small and hot.
From 72 bytes to 12 bytes (on x86-64).

There are two parts to this:
- Changing various source code offsets from 64-bit to 32-bit. This is
  not a problem because the rest of rustc also uses 32-bit source code
  offsets. This means `Token` is no longer `Copy` but this causes no
  problems.
- Removing the `RawStrError` from `LiteralKind`. Raw string literal
  invalidity is now indicated by a `None` value within
  `RawStr`/`RawByteStr`, and the new `validate_raw_str` function can be
  used to re-lex an invalid raw string literal to get the `RawStrError`.

There is one very small change in behaviour. Previously, if a raw string
literal matched both the `InvalidStarter` and `TooManyHashes` cases,
the latter would override the former. This has now changed, because
`raw_double_quoted_string` now uses `?` and so returns immediately upon
detecting the `InvalidStarter` case. I think this is a slight
improvement to report the earlier-detected error, and it explains the
change in the `test_too_many_hashes` test.

The commit also removes a couple of comments that refer to rust-lang#77629 and
say that the size of these types don't affect performance. These
comments are wrong, though the performance effect is small.
@nnethercote
Copy link
Contributor Author

I have rebased to fix the conflicts.

Copy link
Member

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, with validate_raw_str this looks very much nicer, thanks!

@bors r+

// This assertion is in this crate, rather than in `rustc_lexer`, because that
// crate cannot depend on `rustc_data_structures`.
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
rustc_data_structures::static_assert_size!(rustc_lexer::Token, 12);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to open-code this in the lexer crate, it's not too annoying these days:

#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
const _: () = if std::mem::size_of::<Token>() != 12 { panic!() } else { () };

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const _: () = assert!(std::mem::size_of::<Token> == 12);

Can't use assert_eq!() unfortunately.

@matklad
Copy link
Member

matklad commented Aug 1, 2022

(I am also not sure if I have bors these days)

@lqd
Copy link
Member

lqd commented Aug 1, 2022

I don't think it works in GH review comments though

@bors r=matklad

@bors
Copy link
Contributor

bors commented Aug 1, 2022

📌 Commit 99f5c79 has been approved by matklad

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 1, 2022
@bors
Copy link
Contributor

bors commented Aug 1, 2022

⌛ Testing commit 99f5c79 with merge dcb444a...

@bors
Copy link
Contributor

bors commented Aug 1, 2022

☀️ Test successful - checks-actions
Approved by: matklad
Pushing dcb444a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 1, 2022
@bors bors merged commit dcb444a into rust-lang:master Aug 1, 2022
@rustbot rustbot added this to the 1.64.0 milestone Aug 1, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (dcb444a): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-0.3% -0.6% 42
Improvements 🎉
(secondary)
-0.5% -1.2% 48
All 😿🎉 (primary) -0.3% -0.6% 42

Max RSS (memory usage)

Results
  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
5.4% 5.4% 1
Regressions 😿
(secondary)
4.8% 8.1% 2
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) 5.4% 5.4% 1

Cycles

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-2.8% -3.3% 3
All 😿🎉 (primary) N/A N/A 0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@nnethercote nnethercote deleted the lexer-improvements branch August 9, 2022 03:44
flip1995 pushed a commit to flip1995/rust that referenced this pull request Aug 11, 2022
…klad

Lexer improvements

Some cleanups and small speed improvements.

r? `@matklad`
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 28, 2022
… r=matklad

More lexer improvements

A follow-up to rust-lang#99884.

r? `@matklad`
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
… r=matklad

More lexer improvements

A follow-up to rust-lang#99884.

r? `@matklad`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.