Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept tuple.0.0 as tuple indexing #70420

Closed
wants to merge 9 commits into from
Closed

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Mar 26, 2020

Previously rustc would reject tuple.0.0, requiring the programmer to write either tuple.0 .0 or (tuple.0).0 to work around the limitation because the 0.0 is emitted as one float token by the lexer.

This PR adjusts the lexer to produce IdentDotIntegerDotInteger for the above, instead of IdentDotFloat, thereby making tuple.0.0 work without a parser change.



r? @Centril

FYI @nikomatsakis since your ancient comments came up. ❤️

FYI @petrochenkov and @matklad — I would be interested whether you are on board with a lexer change like this.

@dtolnay dtolnay added T-lang Relevant to the language team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 26, 2020
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 26, 2020
@petrochenkov petrochenkov self-assigned this Mar 26, 2020
@@ -783,47 +780,6 @@ impl<'a> Parser<'a> {
self.struct_span_err(self.token.span, &format!("unexpected token: `{}`", actual)).emit();
}

fn recover_field_access_by_float_lit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see this method go; it was ungreat to refactor in the past :) 🎉

@Centril
Copy link
Contributor

Centril commented Mar 26, 2020

From the POV of someone who is quite familiar with rustc_parse but not with rustc_lexer (so I'm not an expert in the subject matter of this PR, and so the technical reviewer should be r? @petrochenkov) this change seems quite minimal and reasonable. While it somewhat complicates the spec it doesn't seem like it does so notably, and on the other hand it simplifies the compiler internals in practical terms (diff is negative).

Assuming @petrochenkov & @matklad are happy with this I would be happy to propose FCP merging this to the language team.

Though let's also nominate it for discussion today on our language team meeting.

@matklad
Copy link
Member

matklad commented Mar 26, 2020

Am I correct that tuple. 0.0 won't work? We should add this as a test. And probably tuple./*i love special cases*/0.0 as well.

Am I correct that this change will break this program?

macro_rules! m { (.$l:literal) => (); }
m!(.0.0);

fn main() {}

It should be added as a test as well.

Another case I can think of is (((),),).000.000. I think for this one behavior before and after would be the same, but better test than sorry.

Generally, I see two different ways how we can achieve this behavior:

  • tweak the lexing (like this PR does), using a rule like "it is not an FP literal, if preceded by dot"
  • tweak the parsing (like the original recovery did), using a rule like "a field name can be an ident, and int, or a float with a specific text".

I personally would lean towards the second approach, as it will produce less "false positives". It means that the parser should be exposed to the text of the token, and not only to its lexical class, but it is already somewhat happens for contextual keywords (although this case is different, as the set of context keywords is finite, while the set of \d+.\d+ float literals is not).

Makes me wonder why we didn't choose ((),)._0 as a syntax for tuple fields in the first place :)

@dtolnay
Copy link
Member Author

dtolnay commented Mar 26, 2020

@matklad:

Am I correct that tuple. 0.0 won't work? We should add this as a test. And probably tuple./*i love special cases*/0.0 as well.

Added tests. These are now the way to write IdentDotFloat if that's what the programmer means, but that should be quite rare. It's better than requiring a workaround for IdentDotIntegerDotInteger.


Am I correct that this change will break this program?
macro_rules! m { (.$l:literal) => (); } m!(.0.0);
It should be added as a test as well.

Added test.


Another case I can think of is (((),),).000.000. I think for this one behavior before and after would be the same, but better test than sorry.

Added test.

@bjorn3
Copy link
Member

bjorn3 commented Mar 26, 2020

I think whitespace sensitivity is bad. Also this may break the pretty printing of expressions as tokenstreams (proc_macro::TokenStream::to_string()) as I believe the pretty printing inserts a space between every token.

@dtolnay
Copy link
Member Author

dtolnay commented Mar 26, 2020

@matklad:

Generally, I see two different ways how we can achieve this behavior:

  • tweak the lexing (like this PR does), using a rule like "it is not an FP literal, if preceded by dot"
  • tweak the parsing (like the original recovery did), using a rule like "a field name can be an ident, and int, or a float with a specific text".

I personally would lean towards the second approach, as it will produce less "false positives".

Some reasons I've leaned toward the first approach:

  • I do not want $expr.$float to work in a macro. In other words I really do see failure to recognize tuple.0.0 as a lexer bug, not as "we should make floats a valid two-level index." There should not be floats involved in the first place!

  • This implementation cleanly and correctly relexes the remainder of the content after the dot. For example in tuple.0.0b0e0 we naturally get the lexer's existing handling of the "binary float literal is not supported" error on the 0b0e0 float literal. Matching this behavior in the parser after having treated b0e0 as a "suffix" on a 0.0 float token would be more complicated and require duplicating the lexer's checks.

  • I'm not sure what you mean by false positives. Could you give an example of a problematic false positive? A false positive would be something that is considered a tuple index when it shouldn't be. That seems unlikely to arise in practice.

In any case, if the lang team concludes that tuple.0.0 is supposed to be accepted, I'm sure we can agree on one or the other for the implementation.

@dtolnay
Copy link
Member Author

dtolnay commented Mar 26, 2020

@bjorn3:

I think whitespace sensitivity is bad.

That's fine, but this PR is no more whitespace sensitive than 0.0 vs 0 . 0 is whitespace sensitive.

Furthermore, you'll love that this PR makes things less whitespace sensitive, such as in the case of tuple.0.0 vs tuple.0 .0 currently being whitespace sensitive but not after this PR. I think this PR is an improvement on the whitespace sensitivity axis.


Also this may break the pretty printing of expressions as tokenstreams (proc_macro::TokenStream::to_string()) as I believe the pretty printing inserts a space between every token.

They would come out as tuple . 0 . 0 which is fine and expected.

@petrochenkov
Copy link
Contributor

This reminds me of our treatment of punctuation in proc macros.
&& is not token::AndAnd, but two tokens token::And where the first one has the "joint" flag.
Parser can then treat joint &s as a logical AND.

I remember some people (@eddyb at least?) regretted that we didn't break up floats in the same way as multi-character operators while proc macros were unstable, and didn't move the treatment of floats entirely to the parser.
(I believe this came up recently in some @rust-lang/wg-grammar discussion.)

In rustc_parse we haven't migrated to the proc macro model with simple joint and non-joint tokens yet (but we plan to).
So right now, when we want token::And, but encounter a token::AndAnd, we use fn break_and_eat to break it into two tokens on the fly.
I think it would be reasonable to do the same thing with floats. Want a token::Integer, but encountered a token::Float? Break it into multiple tokens.
The @matklad's "second way" more or less.

(Perhaps later we'll be able to migrate floats to the "simple token" model internally and expose them as a single token only to proc macros for compatibility. Or maybe not, depending on convenience mostly.)

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 26, 2020
@petrochenkov
Copy link
Contributor

With the scheme from my comment above $expr.$float won't work because $float is not a float token, it's an "interpolated" token, it basically have invisible parentheses around it.

@Centril
Copy link
Contributor

Centril commented Mar 26, 2020

We briefly discussed this on the language team meeting, but @petrochenkov's comment wasn't there yet, so I think we'll need some more discussion on the next one. :)

@Centril Centril removed the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 1, 2020
@Centril
Copy link
Contributor

Centril commented Apr 2, 2020

We discussed this again in this week's language team meeting taking @petrochenkov's notes into account this time. Some conclusions from the meeting:

  • Of those present, no one objected to the idea of accepting this in some fashion (though it will need to go through FCP to confirm and to take community input into consideration, as usual).

  • Niko noted that if we were do to it by changing the lexer, as currently done in this PR (and as alluded to by @petrochenkov re. "compatibility"), there could be procedural macros observing the difference in the tokens they receive. It would be unlikely, but it would be breaking.

  • We would like to see the suggested strategy (Accept tuple.0.0 as tuple indexing #70420 (comment)) of breaking float tokens when an integer is wanted explored so that we can compare them and make more informed decisions.

@Centril Centril removed I-nominated S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Apr 2, 2020
@joelpalmer joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 14, 2020
@Dylan-DPC-zz
Copy link

@dtolnay any updates?

@dtolnay
Copy link
Member Author

dtolnay commented Apr 14, 2020

I haven't gotten a chance to try the alternative implementation suggested in #70420 (comment). Maybe @Centril or someone else already familiar with the parser would like to give it a shot?

@petrochenkov
Copy link
Contributor

I can try implementing it this weekend.

@petrochenkov
Copy link
Contributor

The alternative implemenation - #71322.

@petrochenkov petrochenkov added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 19, 2020
@dtolnay dtolnay added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels Apr 19, 2020
@dtolnay
Copy link
Member Author

dtolnay commented Apr 30, 2020

Based on #71322 (comment), closing in favor of #71322. Thanks @petrochenkov for the nicer approach and implementation!

@dtolnay dtolnay closed this Apr 30, 2020
@dtolnay dtolnay deleted the lex branch May 23, 2020 21:26
Manishearth added a commit to Manishearth/rust that referenced this pull request Jul 10, 2020
Accept tuple.0.0 as tuple indexing (take 2)

If we expect something identifier-like when parsing a field name after `.`, but encounter a float token, we break that float token into parts, similarly to how we break `&&` into `&` `&`, or `<<` into `<` `<`, etc.

An alternative to rust-lang#70420.
Manishearth added a commit to Manishearth/rust that referenced this pull request Jul 11, 2020
Accept tuple.0.0 as tuple indexing (take 2)

If we expect something identifier-like when parsing a field name after `.`, but encounter a float token, we break that float token into parts, similarly to how we break `&&` into `&` `&`, or `<<` into `<` `<`, etc.

An alternative to rust-lang#70420.
Manishearth added a commit to Manishearth/rust that referenced this pull request Jul 11, 2020
Accept tuple.0.0 as tuple indexing (take 2)

If we expect something identifier-like when parsing a field name after `.`, but encounter a float token, we break that float token into parts, similarly to how we break `&&` into `&` `&`, or `<<` into `<` `<`, etc.

An alternative to rust-lang#70420.
Manishearth added a commit to Manishearth/rust that referenced this pull request Jul 11, 2020
Accept tuple.0.0 as tuple indexing (take 2)

If we expect something identifier-like when parsing a field name after `.`, but encounter a float token, we break that float token into parts, similarly to how we break `&&` into `&` `&`, or `<<` into `<` `<`, etc.

An alternative to rust-lang#70420.
mohammadfawaz added a commit to essential-contributions/pint that referenced this pull request Jul 12, 2023
Closes #72 

Fairly simple change:
- Tuple types
- Tuple expressions
- Tuple indexing expressions. Error out on invalid indices such as `0xa`
or anything that is not a `usize`.

Note that `t.0.0` does not currently work, but `t.0 .0`. Once we
implement #66, we can then write `(t.0).0`. In the future, we can
support `t.0.0` which can be done in two ways:
- Special case the lexer to _not_ parse the `0.0` as a real in the case
of a tuple access (this means the lexer now has to concern itself with
the context).
- Break the real `0.0` apart in the pasrer, which kinda what Rust does
(see rust-lang/rust#71322 after an attempt for
the other method in rust-lang/rust#70420)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants