Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST: Give spans to all identifiers #49154

Merged
merged 10 commits into from
Apr 6, 2018
Merged

Conversation

petrochenkov
Copy link
Contributor

@petrochenkov petrochenkov commented Mar 19, 2018

Change representation of ast::Ident from { name: Symbol, ctxt: SyntaxContext } to { name: Symbol, span: Span }.
Syntax contexts still can be extracted from spans (span.ctxt()).

Why this should not require more memory:

  • Span is u32 just like SyntaxContext.
  • Despite keeping more spans in AST we don't actually create more spans, so the number of "outlined" spans kept in span interner shouldn't become larger.

Why this may be slightly slower:

  • When we need to extract ctxt from an identifier instead of just field read we need to do bit field extraction possibly followed by and access by index into span interner's vector. Both operations should be fast (unless the span interner is under some synchronization) and we already do ctxt extraction from spans all the time during macro expansion, so the difference should be lost in noise.

cc #48842 (comment)

@rust-highfive
Copy link
Collaborator

r? @pnkfelix

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 19, 2018
@petrochenkov
Copy link
Contributor Author

petrochenkov commented Mar 19, 2018

cc @jseyfried @nrc @rust-lang/compiler
r? @eddyb

@rust-highfive rust-highfive assigned eddyb and unassigned pnkfelix Mar 19, 2018
@nrc
Copy link
Member

nrc commented Mar 19, 2018

👍

@eddyb
Copy link
Member

eddyb commented Mar 19, 2018

This looks great! cc @michaelwoerister on potential interactions with incremental hashing.
r=me after Travis CI build is fixed and mw had a look at it.

@michaelwoerister
Copy link
Member

Epic PR :) I think this should be fine for incr. comp., we already hash syntax contexts and spans.

@Mark-Simulacrum
Copy link
Member

Somewhat odd failure on Travis:

[00:50:03] ---- [run-pass] run-pass/rfc-2126-extern-absolute-paths/test.rs stdout ----
[00:50:03] 	
[00:50:03] error: compilation failed!
[00:50:03] status: exit code: 101
[00:50:03] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/run-pass/rfc-2126-extern-absolute-paths/test.rs" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass" "--target=x86_64-unknown-linux-gnu" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/rfc-2126-extern-absolute-paths/test.stage2-x86_64-unknown-linux-gnu" "-Crpath" "-O" "-Zmiri" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--test" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/rfc-2126-extern-absolute-paths/test.stage2-x86_64-unknown-linux-gnu.aux"
[00:50:03] stdout:
[00:50:03] ------------------------------------------
[00:50:03] 
[00:50:03] ------------------------------------------
[00:50:03] stderr:
[00:50:03] ------------------------------------------
[00:50:03] error[E0658]: `crate` in paths is experimental (see issue #45477)
[00:50:03]   --> /checkout/src/test/run-pass/rfc-2126-extern-absolute-paths/test.rs:20:1
[00:50:03]    |
[00:50:03] 20 | / fn test() {
[00:50:03] 21 | | }
[00:50:03]    | |_^
[00:50:03]    |
[00:50:03]    = help: add #![feature(crate_in_paths)] to the crate attributes to enable

@estebank
Copy link
Contributor

/subscribe: follow up after landing by removing as many codemap().def_span() calls throughout the codebase as possible. (I don't think it will be possible to remove all, as some are spans for lifetimes, but the ones for struct/trait/impls errors should use the ident spans.)

@petrochenkov
Copy link
Contributor Author

@estebank

some are spans for lifetimes

Lifetimes are also represented with Ident so they get spans as well.
Also, this is only the first half of "giving spans to all identifiers" - I haven't converted HIR to use Ident yet, this is going to be a separate large PR.

@@ -653,7 +653,7 @@ impl<'a> HashStable<StableHashingContext<'a>> for ast::Ident {
hasher: &mut StableHasher<W>) {
let ast::Ident {
ref name,
ctxt: _ // Ignore this
span: _ // Ignore this
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelwoerister
Now I'm pretty sure this is a mistake (and ignoring ctxt was a mistake as well?).
Spans are hashed in other positions, including the ctxt part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should be updated! Good catch.

impl Hash for Ident {
fn hash<H: Hasher>(&self, state: &mut H) {
self.name.hash(state);
self.span.ctxt().hash(state);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these Hash and PartialEq I've kept the old behavior - compare and hash idents as (Name, SyntaxContext).

PartialEq for Ident should either

  • not exist (I'd prefer this, but larger structures containing idents want to derive PartialEq and Hash for whatever reasons), or
  • behave like PartialEq for (Name, SyntaxContext). PartialEq not behaving like this is a notable footgun and source of bugs, we had it couple years ago. Well, and Hash should has to be implemented consistently with PartialEq.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

want to derive PartialEq and Hash for whatever reasons

What happens if you just remove those? Is PartialEq it used only for tests? Could those tests serialize to JSON, use pattern matching, or do something else instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try.
IIRC, at least token::Token contains Ident and uses == all the time for "simple" variants not actually containing Idents or other data (the case that would be perfectly supported by is, btw).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's a bit of a bummer - and yeah, I agree about is.

Copy link
Member

@eddyb eddyb Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, should Token contain Ident, or just Symbol (and keep the Span separate)? Or is that too much hassle / not readily supported by existing infrastructure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lots of spans get moved from the regular HIR nodes to Ident and then we don't hash them, that would not be good. But the HashStable implementation for Ident doesn't ignore the span, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, StableHash doesn't ignore the span (well, after fixing #49154 (comment)).

Also,

regular HIR nodes

this PR doesn't touch HIR yet, only AST.

@nikomatsakis
Copy link
Contributor

@petrochenkov very cool

@bors
Copy link
Contributor

bors commented Mar 23, 2018

☔ The latest upstream changes (presumably #49308) made this pull request unmergeable. Please resolve the merge conflicts.

@petrochenkov petrochenkov changed the title [WIP] AST: Give spans to all identifiers AST: Give spans to all identifiers Mar 25, 2018
@petrochenkov
Copy link
Contributor Author

Updated (there are couple of new commits).

@bors
Copy link
Contributor

bors commented Mar 26, 2018

☔ The latest upstream changes (presumably #49101) made this pull request unmergeable. Please resolve the merge conflicts.

@shepmaster
Copy link
Member

Ping from triage, @eddyb ! Will you have time to review this soon?

FYI, @petrochenkov — you've got some merge conflicts.

@eddyb
Copy link
Member

eddyb commented Mar 30, 2018

@shepmaster I already reviewed in #49154 (comment). I see, #49154 (comment)

} = *self;

name.hash_stable(hcx, hasher);
span.hash_stable(hcx, hasher);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @michaelwoerister (making sure you see this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mw previously confirmed the span should be hashed here - #49154 (comment)

@@ -1774,11 +1774,14 @@ impl<'a> Visitor<'a> for PostExpansionVisitor<'a> {

fn visit_path(&mut self, path: &'a ast::Path, _id: NodeId) {
for segment in &path.segments {
// Context of ident spans cannot be overriden to ignore unstable features,
// so replace it with path span context.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what's happening here. Do you have an example where the behavior changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fix for the regression caught by Travis and mentioned in #49154 (comment).

The problem is that here we want two different SyntaxContexts for identifiers generated by #[test] - one context for name resolution (unhygienic) and another one for stability checking (hygienic, stability is not checked), but Ident has only one SyntaxContext.

So we can keep the stability checking context in something that is not used for name resolution, for example whole Path (or anything else that is not an identifier/lifetime).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. Can the source comment be expanded to say more of what is happening from its own perspective? e.g. starting with "Here we check the span from the whole Path instead of that of individual Idents specifically because only the former can be ...".

@petrochenkov
Copy link
Contributor Author

Updated.

bors added a commit that referenced this pull request Apr 6, 2018
AST: Give spans to all identifiers

Change representation of `ast::Ident` from `{ name: Symbol, ctxt: SyntaxContext }` to `{ name: Symbol, span: Span }`.
Syntax contexts still can be extracted from spans (`span.ctxt()`).

Why this should not require more memory:
- `Span` is `u32` just like `SyntaxContext`.
- Despite keeping more spans in AST we don't actually *create* more spans, so the number of "outlined" spans kept in span interner shouldn't become larger.

Why this may be slightly slower:
- When we need to extract ctxt from an identifier instead of just field read we need to do bit field extraction possibly followed by and access by index into span interner's vector. Both operations should be fast (unless the span interner is under some synchronization) and we already do ctxt extraction from spans all the time during macro expansion, so the difference should be lost in noise.

cc #48842 (comment)
@bors
Copy link
Contributor

bors commented Apr 6, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: eddyb
Pushing a143462 to master...

@oli-obk
Copy link
Contributor

oli-obk commented Apr 7, 2018

Is it possible that this PR regressed expansion info on derives?

The code generated by derive(Debug) in the following

#[derive(Debug)]
pub enum Error {
    Type(
        &'static str,
    ),
}

contains a ref __self_0 pattern whose span is the unchanged span of &'static str above.

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Apr 7, 2018

@oli-obk
Sure, this is a relatively large refactoring, I tried to be careful, but something could break accidentally.

In this specific example though, it looks like __self_0 always had span of &'static str:

const __self_0: u8 = 0;

#[derive(Debug)]
pub enum Error {
    Type(
        &'static str,
    ),
}

fn main() {}

---- On stable

error[E0530]: match bindings cannot shadow constants
 --> src/main.rs:6:9
  |
1 | const __self_0: u8 = 0;
  | ----------------------- a constant `__self_0` is defined here
...
6 |         &'static str,
  |         ^^^^^^^^^^^^^ cannot be named the same as a constant

Or you are talking about SyntaxContext? Could you give a reproduction then?

EDIT: I see, it's about of whole ref __self_0 pattern span. Reproduction would still be appreciated then.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 7, 2018

@petrochenkov the current reproduction is a clippy lint reporting things that happened inside the expansion, but reporting it at the span of the type.

The relevant lint is https://github.com/rust-lang-nursery/rust-clippy/blob/master/clippy_lints/src/needless_borrow.rs#L86

We do check for macro expansions here, which uses the expn_info to figure out whether any expansion has happened (implemented here)

@petrochenkov
Copy link
Contributor Author

@oli-obk
Before I look into that Clippy code, you may be interested in the last commit - 1458684.
Spans of identifiers themselves cannot be used for detecting macro expansions anymore, you have to use spans from some larger context. This may affect Clippy as well.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 7, 2018

We took the span from a Pat, not an Ident, but that Pat might've gotten its span from an identifier. I'll have a look at the expansion code of Debug derives.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Apr 19, 2018
Discovered in rust-lang#50061 we're falling off the "happy path" of using a stringified
token stream more often than we should. This was due to the fact that a
user-written token like `0xf` is equality-different from the stringified token
of `15` (despite being semantically equivalent).

This patch updates the call to `eq_unspanned` with an even more awful solution,
`probably_equal_for_proc_macro`, which ignores the value of each token and
basically only compares the structure of the token stream, assuming that the AST
doesn't change just one token at a time.

While this is a step towards fixing rust-lang#50061 there is still one regression
from rust-lang#49154 which needs to be fixed.
bors added a commit that referenced this pull request Apr 20, 2018
proc_macro: Stay on the "use the cache" path more

Discovered in #50061 we're falling off the "happy path" of using a stringified
token stream more often than we should. This was due to the fact that a
user-written token like `0xf` is equality-different from the stringified token
of `15` (despite being semantically equivalent).

This patch updates the call to `eq_unspanned` with an even more awful solution,
`probably_equal_for_proc_macro`, which ignores the value of each token and
basically only compares the structure of the token stream, assuming that the AST
doesn't change just one token at a time.

While this is a step towards fixing #50061 there is still one regression
from #49154 which needs to be fixed.
@petrochenkov petrochenkov deleted the spident branch June 5, 2019 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.