Inline SyntaxContext in both encoded span representation. #98840

cjgillot · 2022-07-03T10:58:47Z

The current interned representation for spans does not use the ctxt_or_zero: u16 field. This PR proposes to use this field to store the SyntaxContext of the interned span instead. When ctxt_or_zero and the interned span's ctxt don't match, the inlined one takes precedence.

This allows to implement Span::ctxt and Span::with_ctxt with much less probability to access the interner. Those functions are used a lot for hygiene, so this may be worth it.

cjgillot · 2022-07-03T10:59:02Z

@bors try @rust-timer queue

rust-timer · 2022-07-03T10:59:03Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-03T10:59:10Z

⌛ Trying commit 849d460578d5fd1c1161e9da8f07b6cb5f45b571 with merge f14fc23e45f90f90d8af18adab0747a52ec58aaa...

bors · 2022-07-03T11:17:55Z

💔 Test failed - checks-actions

cjgillot · 2022-07-03T15:13:56Z

@bors try @rust-timer queue

rust-timer · 2022-07-03T15:13:58Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-03T15:14:04Z

⌛ Trying commit ded911fc98d555f8c701e650dce721aa3c4edb48 with merge 6bd01fd732fde597d8ea22f2010c2e51a4613133...

bors · 2022-07-03T17:30:41Z

☀️ Try build successful - checks-actions
Build commit: 6bd01fd732fde597d8ea22f2010c2e51a4613133 (6bd01fd732fde597d8ea22f2010c2e51a4613133)

rust-timer · 2022-07-03T17:30:42Z

Queued 6bd01fd732fde597d8ea22f2010c2e51a4613133 with parent b04bfb4, future comparison URL.

rust-timer · 2022-07-03T19:37:05Z

Finished benchmarking commit (6bd01fd732fde597d8ea22f2010c2e51a4613133): comparison url.

Instruction count

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	0.5%	0.5%	1
Improvements 🎉 (primary)	-0.4%	-1.3%	69
Improvements 🎉 (secondary)	-0.6%	-2.4%	41
All 😿🎉 (primary)	-0.4%	-1.3%	69

Max RSS (memory usage)

Results

Primary benchmarks: 🎉 relevant improvement found
Secondary benchmarks: 😿 relevant regression found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	3.8%	3.8%	1
Improvements 🎉 (primary)	-1.1%	-1.1%	1
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	-1.1%	-1.1%	1

Cycles

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-3.6%	-4.0%	2
All 😿🎉 (primary)	N/A	N/A	0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

cjgillot · 2022-07-03T21:56:23Z

r? @nnethercote

nnethercote · 2022-07-04T04:23:35Z

compiler/rustc_span/src/span_encoding.rs

+        if ctxt2 < MAX_CTXT {
+            // Inline format or interned format with inline ctxt.
+            self.ctxt_or_max = ctxt2 as u16;
+            return self;


Just self should suffice, no need for a return.

nnethercote · 2022-07-04T04:38:56Z

compiler/rustc_span/src/span_encoding.rs

        } else {
            // Interned format.
            let index =
                with_span_interner(|interner| interner.intern(&SpanData { lo, hi, ctxt, parent }));
-            Span { base_or_index: index, len_or_tag: LEN_TAG, ctxt_or_zero: 0 }
+            let ctxt_or_max = if ctxt2 < MAX_CTXT { ctxt2 } else { MAX_CTXT } as u16;


MAX_CTXT is now a misleading name -- it's actually one past the max context, because it's a special out-of-band value. Perhaps rename it INTERNED_CTXT? And likewise rename ctxt_or_max as ctxt_or_interned?

nnethercote · 2022-07-04T04:42:07Z

compiler/rustc_span/src/span_encoding.rs

 ///
 /// Interned format:
 /// - `span.base_or_index == index` (indexes into the interner table)
 /// - `span.len_or_tag == LEN_TAG` (high bit set, all other bits are zero)
-/// - `span.ctxt == 0`
+/// - `span.ctxt == span_data.ctxt` (must be < `MAX_CTXT`) or `MAX_CTXT` otherwise


This is now a bit unclear. Please expand this comment more by explaining that there are now effectively three cases: inline, interned-but-with-inline-ctxt, and interned.

nnethercote

You said "When ctxt_or_zero and the interned span's ctxt don't match, the inlined one takes precedence." Does this happen? If so, can it be avoided?

The perf results here are very good. Surprisingly so, I thought very little spans were being interned nowadays, mostly for the rare ones where the length exceeds 32 KiB. Do you know which of the fields is mostly often causing the interning?

cjgillot · 2022-07-04T07:35:24Z

The second commit breaks the PartialEq and Hash impls for Span. I'll remove it from this PR.

cjgillot · 2022-07-04T07:41:10Z

@bors try @rust-timer queue

rust-timer · 2022-07-04T07:41:12Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-04T07:41:18Z

⌛ Trying commit 92c9b5a02be525f6bc741f2d5e20305bd8319ea5 with merge a623c2ebc5aef6e9c9fce6aa7ef64118aefd23d1...

cjgillot · 2022-07-04T07:48:55Z

You said "When ctxt_or_zero and the interned span's ctxt don't match, the inlined one takes precedence." Does this happen? If so, can it be avoided?

This was added on purpose to implement with_ctxt. I removed it because it's buggy.
Now, this should not happen.

The perf results here are very good. Surprisingly so, I thought very little spans were being interned nowadays, mostly for the rare ones where the length exceeds 32 KiB. Do you know which of the fields is mostly often causing the interning?

I tried a while back to lengthen the len_or_tag field: #84290. This did not lead to measurable results. Since base_or_index is u32, the only compressed field is ctxt.

Before any conjecture, I'll wait on this new perf run on a not-obviously-buggy implementation.

bors · 2022-07-04T09:08:37Z

☀️ Try build successful - checks-actions
Build commit: a623c2ebc5aef6e9c9fce6aa7ef64118aefd23d1 (a623c2ebc5aef6e9c9fce6aa7ef64118aefd23d1)

rust-timer · 2022-07-04T09:08:41Z

Queued a623c2ebc5aef6e9c9fce6aa7ef64118aefd23d1 with parent a5c6a48, future comparison URL.

rust-timer · 2022-07-04T10:30:50Z

Finished benchmarking commit (a623c2ebc5aef6e9c9fce6aa7ef64118aefd23d1): comparison url.

Instruction count

Primary benchmarks: no relevant changes found
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-0.4%	-1.1%	16
All 😿🎉 (primary)	N/A	N/A	0

Max RSS (memory usage)

Results

Primary benchmarks: 😿 relevant regression found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	2.5%	2.5%	1
Regressions 😿 (secondary)	2.6%	2.8%	4
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	2.5%	2.5%	1

Cycles

Results

Primary benchmarks: 😿 relevant regression found
Secondary benchmarks: no relevant changes found

	mean¹	max	count²
Regressions 😿 (primary)	2.3%	2.3%	1
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	2.3%	2.3%	1

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

nnethercote · 2022-07-05T00:03:24Z

The new perf results are a smaller improvement than the old ones.

I'm still puzzled by the goodness of the original perf results. Two of the benchmarks that improved the most are keccak and cranelift-codegen-0.82.1. I did some ad hoc profiling, here is the distribution of calls to Span::new for cranelift-codegen, i.e. measuring creation of spans:

2448625 counts
(  1)  2447415 (100.0%,100.0%): i
(  2)       52 ( 0.0%,100.0%): I 1030124 1142673 0 None
(  3)       13 ( 0.0%,100.0%): I 18039207 18226193 0 None
(  4)        6 ( 0.0%,100.0%): I 2728213 2827861 0 None
(  5)        5 ( 0.0%,100.0%): I 2614338 2728211 0 None
(  6)        4 ( 0.0%,100.0%): I 4819422 4859743 0 None
(  7)        3 ( 0.0%,100.0%): I 1034177 1142673 0 None
(  8)        3 ( 0.0%,100.0%): I 1034480 1142641 0 None
(  9)        3 ( 0.0%,100.0%): I 1189728 1323468 0 None
( 10)        3 ( 0.0%,100.0%): I 1189864 1323468 0 None
( 11)        3 ( 0.0%,100.0%): I 1190422 1323454 0 None
( 12)        3 ( 0.0%,100.0%): I 1514964 1777150 0 None
( 13)        3 ( 0.0%,100.0%): I 1639111 1730662 0 None
( 14)        2 ( 0.0%,100.0%): I 1032858 1142673 0 None
( 15)        2 ( 0.0%,100.0%): I 1032984 1142673 0 None
...

i means "inline" and I means "interned". I.e. there are very few (1210) interned spans, and they're all interned due to len exceeding 15 bits.

Here is the distribution for Span::data_untracked, i.e. measuring use of spans:

7498928 counts
(  1)  7496902 (100.0%,100.0%): j
(  2)      879 ( 0.0%,100.0%): J 18337367 18412187 0 None
(  3)       51 ( 0.0%,100.0%): J 1030124 1142673 0 None
(  4)       39 ( 0.0%,100.0%): J 1189683 1323468 0 None
(  5)       36 ( 0.0%,100.0%): J 1190216 1323468 0 None
(  6)       25 ( 0.0%,100.0%): J 1515023 1777150 0 None
(  7)       23 ( 0.0%,100.0%): J 1503281 1777169 0 None
(  8)       23 ( 0.0%,100.0%): J 1639182 1730662 0 None
(  9)       21 ( 0.0%,100.0%): J 1032978 1142673 0 None
( 10)       19 ( 0.0%,100.0%): J 1514964 1777150 0 None
( 11)       19 ( 0.0%,100.0%): J 1515033 1777150 0 None
( 12)       19 ( 0.0%,100.0%): J 1639111 1730662 0 None
( 13)       16 ( 0.0%,100.0%): J 1034480 1142641 0 None
( 14)       16 ( 0.0%,100.0%): J 1190422 1323454 0 None
( 15)       12 ( 0.0%,100.0%): J 18039207 18226193 0 None
...

j means "inline" and J means "interned". Again, the number of accesses to interned spans (2026) is tiny.

For keccak, it's even more extreme. Span::new:

558110 counts
(  1)   558109 (100.0%,100.0%): i
(  2)        1 ( 0.0%,100.0%): I 547987 647635 0 None

And Span::data_untracked:

2589252 counts
(  1)  2589252 (100.0%,100.0%): j

I.e. there is a single non-interned span created and it is never accessed after creation.

I know that keccak is sensitive to minor changes in code generation for hot code relating to obligation predicates, so the benefits we saw there must be due to that somehow, rather than the span accesses themselves.

apiraino · 2022-08-04T11:19:49Z

Hmm this PR is both waiting for review and on the author :) I'll switch back to the author since the perf. run received comments.

@rustbot ready

cjgillot · 2022-08-10T17:49:01Z

@nnethercote the commit on which the first perf was run was buggy: we could get the wrong ctxt in some cases.

One complement to the explanation could be the effect on hygiene: Ident comparison only checks the ctxt, which would get a stronger effect than just span comparison.

nnethercote

Sorry for the slow review response here. This is looking good, just a couple of comment fixes as mentioned above.

compiler/rustc_span/src/span_encoding.rs

wesleywiser · 2022-09-22T14:41:36Z

@nnethercote's review feedback has been addressed so I'm going to

@bors r+

bors · 2022-09-22T14:41:38Z

📌 Commit 3e67bde has been approved by wesleywiser

It is now in the queue for this repository.

bors · 2022-09-22T17:59:31Z

⌛ Testing commit 3e67bde with merge e7119a0...

bors · 2022-09-22T20:40:45Z

☀️ Test successful - checks-actions
Approved by: wesleywiser
Pushing e7119a0 to master...

rust-timer · 2022-09-22T21:58:48Z

Finished benchmarking commit (e7119a0): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.1%	[0.6%, 1.6%]	7
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.7%	[-0.7%, -0.7%]	1
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	2.0%	[0.6%, 3.5%]	5
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.6%	[-3.3%, -2.0%]	2
All ❌✅ (primary)	2.0%	[0.6%, 3.5%]	5

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	2.9%	[2.9%, 2.9%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.2%	[-2.2%, -2.2%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.4%	[-2.2%, 2.9%]	2

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

nnethercote · 2022-09-26T11:16:00Z

Hmm, the post-merge results indicate this was a slight pessimisation :(

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 3, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 3, 2022

This comment has been minimized.

Sign in to view

bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jul 3, 2022

cjgillot force-pushed the span-inline-ctxt branch from 849d460 to ded911f Compare July 3, 2022 12:30

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 3, 2022

rust-highfive assigned nnethercote Jul 3, 2022

nnethercote reviewed Jul 4, 2022

View reviewed changes

nnethercote requested changes Jul 4, 2022

View reviewed changes

cjgillot force-pushed the span-inline-ctxt branch from ded911f to 92c9b5a Compare July 4, 2022 07:40

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 4, 2022

apiraino removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 4, 2022

cjgillot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 10, 2022

nnethercote approved these changes Aug 16, 2022

View reviewed changes

compiler/rustc_span/src/span_encoding.rs Outdated Show resolved Hide resolved

compiler/rustc_span/src/span_encoding.rs Outdated Show resolved Hide resolved

compiler/rustc_span/src/span_encoding.rs Outdated Show resolved Hide resolved

cjgillot added 2 commits August 17, 2022 19:18

Keep ctxt in encoded span representation.

b2a17e9

Make names more explicit.

3e67bde

cjgillot force-pushed the span-inline-ctxt branch from 92c9b5a to 3e67bde Compare August 17, 2022 17:21

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 22, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 22, 2022

bors merged commit e7119a0 into rust-lang:master Sep 22, 2022

rustbot added this to the 1.66.0 milestone Sep 22, 2022

bors mentioned this pull request Sep 22, 2022

emit diagnostic suggestion for error when if let used with enum varia… #102062

Closed

cjgillot deleted the span-inline-ctxt branch September 23, 2022 17:29

Inline SyntaxContext in both encoded span representation. #98840

Inline SyntaxContext in both encoded span representation. #98840

Uh oh!

Conversation

cjgillot commented Jul 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cjgillot commented Jul 3, 2022

Uh oh!

rust-timer commented Jul 3, 2022

Uh oh!

bors commented Jul 3, 2022

Uh oh!

This comment has been minimized.

This comment has been minimized.

bors commented Jul 3, 2022

Uh oh!

cjgillot commented Jul 3, 2022

Uh oh!

rust-timer commented Jul 3, 2022

Uh oh!

bors commented Jul 3, 2022

Uh oh!

bors commented Jul 3, 2022

Uh oh!

rust-timer commented Jul 3, 2022

Uh oh!

rust-timer commented Jul 3, 2022

Footnotes

Uh oh!

cjgillot commented Jul 3, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nnethercote left a comment

Choose a reason for hiding this comment

Uh oh!

cjgillot commented Jul 4, 2022

Uh oh!

cjgillot commented Jul 4, 2022

Uh oh!

rust-timer commented Jul 4, 2022

Uh oh!

bors commented Jul 4, 2022

Uh oh!

cjgillot commented Jul 4, 2022

Uh oh!

bors commented Jul 4, 2022

Uh oh!

rust-timer commented Jul 4, 2022

Uh oh!

rust-timer commented Jul 4, 2022

Footnotes

Uh oh!

nnethercote commented Jul 5, 2022

Uh oh!

apiraino commented Aug 4, 2022

Uh oh!

cjgillot commented Aug 10, 2022

Uh oh!

nnethercote left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wesleywiser commented Sep 22, 2022

Uh oh!

bors commented Sep 22, 2022

Uh oh!

bors commented Sep 22, 2022

Uh oh!

bors commented Sep 22, 2022

Uh oh!

rust-timer commented Sep 22, 2022

Overall result: ❌✅ regressions and improvements - no action needed

cjgillot commented Jul 3, 2022 •

edited

Loading