Lexer improvements #99884

nnethercote · 2022-07-29T02:56:00Z

Some cleanups and small speed improvements.

r? @matklad

rustbot · 2022-07-29T02:56:04Z

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

nnethercote · 2022-07-29T02:58:05Z

Here are some instruction count results I got locally, for some rustc-perf benchmarks and some other benchmarks.

Primary benchmarks

Benchmark	Profile	Scenario	% Change	Significance Factor?
libc-0.2.124	check	full	-0.38%	1.91x
stm32f4-0.14.0	check	full	-0.34%	1.70x
unicode-normalization-0.1.19	check	full	-0.31%	1.57x

Secondary and external benchmarks

Benchmark	Profile	Scenario	% Change	Significance Factor?
web-sys-0.3.56	check	full	-2.91%	14.57x
externs	check	full	-1.16%	5.80x
unicode_categories-0.1.1	check	full	-0.75%	3.75x
pest-2.1.3	check	full	-0.70%	3.50x
unused-warnings	check	full	-0.56%	2.82x
deep-vector	check	full	-0.54%	2.71x
coercions	check	full	-0.54%	2.69x
tuple-stress	check	full	-0.47%	2.35x
unicode-xid-0.2.3	check	full	-0.41%	2.05x
structopt-0.3.26-2	check	full	-0.38%	1.90x
ucd	check	full	-0.38%	1.89x
log-0.4.16	check	full	-0.35%	1.76x

@bors try @rust-timer queue

rust-timer · 2022-07-29T02:58:06Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-29T02:58:13Z

⌛ Trying commit 37ec8903bd0fffe545458aaa819bd1749ca8f4bb with merge 12221390288e64d968f906cf54b356a5e3d010fe...

bors · 2022-07-29T07:13:13Z

💥 Test timed out

src/librustdoc/html/highlight.rs

matklad

This basically looks good to me, love the usize -> u32 change.

I am slightly less sure about boxing errors -- I understand that this shrinks the size_of::<Token>, but this makes the type !Copy, so call-sites would suffer from some code-bloat due to drop-glue. The ultimate decision here should probably be dictated by the benchmark, but I wish we could have find some more elegant solution here.

Perhaps, rather than returning an error outright, we can just bool signal that there's an error, and provide a separate helper function to re-lex the offending token for error-reporting purposes?

nnethercote · 2022-07-29T11:13:13Z

Perhaps, rather than returning an error outright, we can just bool signal that there's an error, and provide a separate helper function to re-lex the offending token for error-reporting purposes?

Definitely possible, there is already some partial re-lexing when the tokens are "cooked" in rustc_parser, which is also when the errors are reported. It's possible to get size of Token down to one byte if you push hard enough, though that's probably excessive. I did a test run earlier with a size of 8 bytes by just disabling the string errors and it shaved off another 0.5% in the best case. I'll do something in this direction on Monday.

nnethercote · 2022-07-29T23:24:07Z

Ok, I have done the re-lexing thing, getting the size of Token down to 12 bytes.

@bors try @rust-timer queue

rust-timer · 2022-07-29T23:24:09Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-29T23:24:19Z

⌛ Trying commit c9907b53dd331ee60d14aab76b22ee85fe4f39f9 with merge 6ea3a126c47fe2d95f163775720572ac11dc8846...

bors · 2022-07-30T01:19:52Z

☀️ Try build successful - checks-actions
Build commit: 6ea3a126c47fe2d95f163775720572ac11dc8846 (6ea3a126c47fe2d95f163775720572ac11dc8846)

rust-timer · 2022-07-30T01:19:53Z

Queued 6ea3a126c47fe2d95f163775720572ac11dc8846 with parent 3924dac, future comparison URL.

rust-timer · 2022-07-30T03:33:21Z

Finished benchmarking commit (6ea3a126c47fe2d95f163775720572ac11dc8846): comparison url.

Instruction count

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-0.3%	-0.7%	48
Improvements 🎉 (secondary)	-0.5%	-1.2%	49
All 😿🎉 (primary)	-0.3%	-0.7%	48

Max RSS (memory usage)

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: 😿 relevant regression found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	2.1%	2.1%	1
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	N/A	N/A	0

Cycles

Results

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	2.2%	2.2%	1
Regressions 😿 (secondary)	3.3%	4.5%	6
Improvements 🎉 (primary)	-3.8%	-3.8%	1
Improvements 🎉 (secondary)	-2.6%	-2.9%	2
All 😿🎉 (primary)	-0.8%	-3.8%	2

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

compiler/rustc_lexer/src/lib.rs

Avoid doing stuff until it's necessary.

It not needed, always being set to the end of the text.

Because it's tiny and hot.

Because it's small and hot.

From 72 bytes to 12 bytes (on x86-64). There are two parts to this: - Changing various source code offsets from 64-bit to 32-bit. This is not a problem because the rest of rustc also uses 32-bit source code offsets. This means `Token` is no longer `Copy` but this causes no problems. - Removing the `RawStrError` from `LiteralKind`. Raw string literal invalidity is now indicated by a `None` value within `RawStr`/`RawByteStr`, and the new `validate_raw_str` function can be used to re-lex an invalid raw string literal to get the `RawStrError`. There is one very small change in behaviour. Previously, if a raw string literal matched both the `InvalidStarter` and `TooManyHashes` cases, the latter would override the former. This has now changed, because `raw_double_quoted_string` now uses `?` and so returns immediately upon detecting the `InvalidStarter` case. I think this is a slight improvement to report the earlier-detected error, and it explains the change in the `test_too_many_hashes` test. The commit also removes a couple of comments that refer to rust-lang#77629 and say that the size of these types don't affect performance. These comments are wrong, though the performance effect is small.

nnethercote · 2022-07-31T22:55:01Z

I have rebased to fix the conflicts.

matklad

Yes, with validate_raw_str this looks very much nicer, thanks!

@bors r+

matklad · 2022-08-01T09:27:13Z

compiler/rustc_parse/src/lexer/mod.rs

+// This assertion is in this crate, rather than in `rustc_lexer`, because that
+// crate cannot depend on `rustc_data_structures`.
+#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
+rustc_data_structures::static_assert_size!(rustc_lexer::Token, 12);


An alternative would be to open-code this in the lexer crate, it's not too annoying these days:

#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))] const _: () = if std::mem::size_of::<Token>() != 12 { panic!() } else { () };

const _: () = assert!(std::mem::size_of::<Token> == 12);

Can't use assert_eq!() unfortunately.

matklad · 2022-08-01T09:31:30Z

(I am also not sure if I have bors these days)

lqd · 2022-08-01T09:51:07Z

I don't think it works in GH review comments though

@bors r=matklad

bors · 2022-08-01T09:51:13Z

📌 Commit 99f5c79 has been approved by matklad

It is now in the queue for this repository.

bors · 2022-08-01T12:52:52Z

⌛ Testing commit 99f5c79 with merge dcb444a...

bors · 2022-08-01T15:40:37Z

☀️ Test successful - checks-actions
Approved by: matklad
Pushing dcb444a to master...

rust-timer · 2022-08-01T16:57:28Z

Finished benchmarking commit (dcb444a): comparison url.

Instruction count

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-0.3%	-0.6%	42
Improvements 🎉 (secondary)	-0.5%	-1.2%	48
All 😿🎉 (primary)	-0.3%	-0.6%	42

Max RSS (memory usage)

Results

Primary benchmarks: 😿 relevant regression found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	5.4%	5.4%	1
Regressions 😿 (secondary)	4.8%	8.1%	2
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	5.4%	5.4%	1

Cycles

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-2.8%	-3.3%	3
All 😿🎉 (primary)	N/A	N/A	0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

…klad Lexer improvements Some cleanups and small speed improvements. r? `@matklad`

… r=matklad More lexer improvements A follow-up to rust-lang#99884. r? `@matklad`

rust-highfive assigned matklad Jul 29, 2022

rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Jul 29, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 29, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 29, 2022

flip1995 reviewed Jul 29, 2022

View reviewed changes

src/librustdoc/html/highlight.rs Outdated Show resolved Hide resolved

nnethercote force-pushed the lexer-improvements branch from 37ec890 to b6fb134 Compare July 29, 2022 08:52

This comment has been minimized.

Sign in to view

matklad approved these changes Jul 29, 2022

View reviewed changes

nnethercote force-pushed the lexer-improvements branch from b6fb134 to c9907b5 Compare July 29, 2022 23:23

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 30, 2022

fbstj reviewed Jul 30, 2022

View reviewed changes

compiler/rustc_lexer/src/lib.rs Outdated Show resolved Hide resolved

nnethercote added 4 commits August 1, 2022 08:11

Improve shebang handling.

5518599

Avoid doing stuff until it's necessary.

Remove StringReader::end_src_index.

bd23d68

It not needed, always being set to the end of the text.

Inline first_token.

b4fdf64

Because it's tiny and hot.

Avoid an unnecessary return.

c01a36d

nnethercote added 3 commits August 1, 2022 08:27

Inline TokenStreamBuilder::push.

ddf62b5

Because it's small and hot.

Add a size assertion for Token.

e6b9fcc

nnethercote force-pushed the lexer-improvements branch from c9907b5 to 99f5c79 Compare July 31, 2022 22:54

matklad approved these changes Aug 1, 2022

View reviewed changes

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 1, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 1, 2022

bors merged commit dcb444a into rust-lang:master Aug 1, 2022

rustbot added this to the 1.64.0 milestone Aug 1, 2022

nnethercote deleted the lexer-improvements branch August 9, 2022 03:44

flip1995 pushed a commit to flip1995/rust that referenced this pull request Aug 11, 2022

Auto merge of rust-lang#99884 - nnethercote:lexer-improvements, r=mat…

80a5687

…klad Lexer improvements Some cleanups and small speed improvements. r? `@matklad`

nnethercote mentioned this pull request Sep 27, 2022

More lexer improvements #102302

Merged

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 28, 2022

Auto merge of rust-lang#102302 - nnethercote:more-lexer-improvements,…

6201eab

… r=matklad More lexer improvements A follow-up to rust-lang#99884. r? `@matklad`

Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023

Auto merge of rust-lang#102302 - nnethercote:more-lexer-improvements,…

ebc5fd4

… r=matklad More lexer improvements A follow-up to rust-lang#99884. r? `@matklad`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lexer improvements #99884

Lexer improvements #99884

nnethercote commented Jul 29, 2022

rustbot commented Jul 29, 2022

nnethercote commented Jul 29, 2022

rust-timer commented Jul 29, 2022

bors commented Jul 29, 2022

bors commented Jul 29, 2022

This comment has been minimized.

matklad left a comment

nnethercote commented Jul 29, 2022

nnethercote commented Jul 29, 2022

rust-timer commented Jul 29, 2022

bors commented Jul 29, 2022

bors commented Jul 30, 2022

rust-timer commented Jul 30, 2022

rust-timer commented Jul 30, 2022

nnethercote commented Jul 31, 2022

matklad left a comment

matklad Aug 1, 2022

ChayimFriedman2 Aug 8, 2022

matklad commented Aug 1, 2022

lqd commented Aug 1, 2022

bors commented Aug 1, 2022

bors commented Aug 1, 2022

bors commented Aug 1, 2022

rust-timer commented Aug 1, 2022

Lexer improvements #99884

Lexer improvements #99884

Conversation

nnethercote commented Jul 29, 2022

rustbot commented Jul 29, 2022

nnethercote commented Jul 29, 2022

Primary benchmarks

Secondary and external benchmarks

rust-timer commented Jul 29, 2022

bors commented Jul 29, 2022

bors commented Jul 29, 2022

This comment has been minimized.

matklad left a comment

Choose a reason for hiding this comment

nnethercote commented Jul 29, 2022

nnethercote commented Jul 29, 2022

rust-timer commented Jul 29, 2022

bors commented Jul 29, 2022

bors commented Jul 30, 2022

rust-timer commented Jul 30, 2022

rust-timer commented Jul 30, 2022

Footnotes

nnethercote commented Jul 31, 2022

matklad left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matklad commented Aug 1, 2022

lqd commented Aug 1, 2022

bors commented Aug 1, 2022

bors commented Aug 1, 2022

bors commented Aug 1, 2022

rust-timer commented Aug 1, 2022

Footnotes