Accept tuple.0.0 as tuple indexing #70420

dtolnay · 2020-03-26T08:03:27Z

Previously rustc would reject tuple.0.0, requiring the programmer to write either tuple.0 .0 or (tuple.0).0 to work around the limitation because the 0.0 is emitted as one float token by the lexer.

This PR adjusts the lexer to produce IdentDotIntegerDotInteger for the above, instead of IdentDotFloat, thereby making tuple.0.0 work without a parser change.

Relevant issue: Tuple indexing should handle float tokens specially #28257
Tuple indexing RFC: RFC: Tuple indexing rfcs#184
RFC: Tuple indexing rfcs#184 (comment) (2014) says Niko preferred not to have this at the time, but that we can consider supporting it down the line. Tuple indexing should handle float tokens specially #28257 (comment) (2015) says Niko considers it may have been the wrong call to reject this originally.

r? @Centril

FYI @nikomatsakis since your ancient comments came up. ❤️

FYI @petrochenkov and @matklad — I would be interested whether you are on board with a lexer change like this.

Centril · 2020-03-26T08:15:52Z

src/librustc_parse/parser/expr.rs

@@ -783,47 +780,6 @@ impl<'a> Parser<'a> {
        self.struct_span_err(self.token.span, &format!("unexpected token: `{}`", actual)).emit();
    }

-    fn recover_field_access_by_float_lit(


Happy to see this method go; it was ungreat to refactor in the past :) 🎉

Centril · 2020-03-26T08:20:07Z

From the POV of someone who is quite familiar with rustc_parse but not with rustc_lexer (so I'm not an expert in the subject matter of this PR, and so the technical reviewer should be r? @petrochenkov) this change seems quite minimal and reasonable. While it somewhat complicates the spec it doesn't seem like it does so notably, and on the other hand it simplifies the compiler internals in practical terms (diff is negative).

Assuming @petrochenkov & @matklad are happy with this I would be happy to propose FCP merging this to the language team.

Though let's also nominate it for discussion today on our language team meeting.

matklad · 2020-03-26T08:58:25Z

Am I correct that tuple. 0.0 won't work? We should add this as a test. And probably tuple./*i love special cases*/0.0 as well.

Am I correct that this change will break this program?

macro_rules! m { (.$l:literal) => (); }
m!(.0.0);

fn main() {}

It should be added as a test as well.

Another case I can think of is (((),),).000.000. I think for this one behavior before and after would be the same, but better test than sorry.

Generally, I see two different ways how we can achieve this behavior:

tweak the lexing (like this PR does), using a rule like "it is not an FP literal, if preceded by dot"
tweak the parsing (like the original recovery did), using a rule like "a field name can be an ident, and int, or a float with a specific text".

I personally would lean towards the second approach, as it will produce less "false positives". It means that the parser should be exposed to the text of the token, and not only to its lexical class, but it is already somewhat happens for contextual keywords (although this case is different, as the set of context keywords is finite, while the set of \d+.\d+ float literals is not).

Makes me wonder why we didn't choose ((),)._0 as a syntax for tuple fields in the first place :)

dtolnay · 2020-03-26T16:54:44Z

@matklad:

Am I correct that tuple. 0.0 won't work? We should add this as a test. And probably tuple./*i love special cases*/0.0 as well.

Added tests. These are now the way to write IdentDotFloat if that's what the programmer means, but that should be quite rare. It's better than requiring a workaround for IdentDotIntegerDotInteger.

Am I correct that this change will break this program?
macro_rules! m { (.$l:literal) => (); } m!(.0.0);
It should be added as a test as well.

Added test.

Another case I can think of is (((),),).000.000. I think for this one behavior before and after would be the same, but better test than sorry.

Added test.

bjorn3 · 2020-03-26T18:26:57Z

I think whitespace sensitivity is bad. Also this may break the pretty printing of expressions as tokenstreams (proc_macro::TokenStream::to_string()) as I believe the pretty printing inserts a space between every token.

dtolnay · 2020-03-26T18:38:58Z

@matklad:

Generally, I see two different ways how we can achieve this behavior:

tweak the lexing (like this PR does), using a rule like "it is not an FP literal, if preceded by dot"

tweak the parsing (like the original recovery did), using a rule like "a field name can be an ident, and int, or a float with a specific text".

I personally would lean towards the second approach, as it will produce less "false positives".

Some reasons I've leaned toward the first approach:

I do not want $expr.$float to work in a macro. In other words I really do see failure to recognize tuple.0.0 as a lexer bug, not as "we should make floats a valid two-level index." There should not be floats involved in the first place!
This implementation cleanly and correctly relexes the remainder of the content after the dot. For example in tuple.0.0b0e0 we naturally get the lexer's existing handling of the "binary float literal is not supported" error on the 0b0e0 float literal. Matching this behavior in the parser after having treated b0e0 as a "suffix" on a 0.0 float token would be more complicated and require duplicating the lexer's checks.
I'm not sure what you mean by false positives. Could you give an example of a problematic false positive? A false positive would be something that is considered a tuple index when it shouldn't be. That seems unlikely to arise in practice.

In any case, if the lang team concludes that tuple.0.0 is supposed to be accepted, I'm sure we can agree on one or the other for the implementation.

dtolnay · 2020-03-26T18:47:49Z

@bjorn3:

I think whitespace sensitivity is bad.

That's fine, but this PR is no more whitespace sensitive than 0.0 vs 0 . 0 is whitespace sensitive.

Furthermore, you'll love that this PR makes things less whitespace sensitive, such as in the case of tuple.0.0 vs tuple.0 .0 currently being whitespace sensitive but not after this PR. I think this PR is an improvement on the whitespace sensitivity axis.

Also this may break the pretty printing of expressions as tokenstreams (proc_macro::TokenStream::to_string()) as I believe the pretty printing inserts a space between every token.

They would come out as tuple . 0 . 0 which is fine and expected.

petrochenkov · 2020-03-26T19:29:19Z

This reminds me of our treatment of punctuation in proc macros.
&& is not token::AndAnd, but two tokens token::And where the first one has the "joint" flag.
Parser can then treat joint &s as a logical AND.

I remember some people (@eddyb at least?) regretted that we didn't break up floats in the same way as multi-character operators while proc macros were unstable, and didn't move the treatment of floats entirely to the parser.
(I believe this came up recently in some @rust-lang/wg-grammar discussion.)

In rustc_parse we haven't migrated to the proc macro model with simple joint and non-joint tokens yet (but we plan to).
So right now, when we want token::And, but encounter a token::AndAnd, we use fn break_and_eat to break it into two tokens on the fly.
I think it would be reasonable to do the same thing with floats. Want a token::Integer, but encountered a token::Float? Break it into multiple tokens.
The @matklad's "second way" more or less.

(Perhaps later we'll be able to migrate floats to the "simple token" model internally and expose them as a single token only to proc macros for compatibility. Or maybe not, depending on convenience mostly.)

petrochenkov · 2020-03-26T19:39:15Z

With the scheme from my comment above $expr.$float won't work because $float is not a float token, it's an "interpolated" token, it basically have invisible parentheses around it.

Centril · 2020-03-26T19:58:06Z

We briefly discussed this on the language team meeting, but @petrochenkov's comment wasn't there yet, so I think we'll need some more discussion on the next one. :)

Centril · 2020-04-02T23:34:45Z

We discussed this again in this week's language team meeting taking @petrochenkov's notes into account this time. Some conclusions from the meeting:

Of those present, no one objected to the idea of accepting this in some fashion (though it will need to go through FCP to confirm and to take community input into consideration, as usual).
Niko noted that if we were do to it by changing the lexer, as currently done in this PR (and as alluded to by @petrochenkov re. "compatibility"), there could be procedural macros observing the difference in the tokens they receive. It would be unlikely, but it would be breaking.
We would like to see the suggested strategy (Accept tuple.0.0 as tuple indexing #70420 (comment)) of breaking float tokens when an integer is wanted explored so that we can compare them and make more informed decisions.

Dylan-DPC-zz · 2020-04-14T13:16:05Z

@dtolnay any updates?

dtolnay · 2020-04-14T17:09:57Z

I haven't gotten a chance to try the alternative implementation suggested in #70420 (comment). Maybe @Centril or someone else already familiar with the parser would like to give it a shot?

petrochenkov · 2020-04-14T18:38:53Z

I can try implementing it this weekend.

petrochenkov · 2020-04-19T12:47:10Z

The alternative implemenation - #71322.

dtolnay · 2020-04-30T22:57:50Z

Based on #71322 (comment), closing in favor of #71322. Thanks @petrochenkov for the nicer approach and implementation!

Accept tuple.0.0 as tuple indexing (take 2) If we expect something identifier-like when parsing a field name after `.`, but encounter a float token, we break that float token into parts, similarly to how we break `&&` into `&` `&`, or `<<` into `<` `<`, etc. An alternative to rust-lang#70420.

Closes #72 Fairly simple change: - Tuple types - Tuple expressions - Tuple indexing expressions. Error out on invalid indices such as `0xa` or anything that is not a `usize`. Note that `t.0.0` does not currently work, but `t.0 .0`. Once we implement #66, we can then write `(t.0).0`. In the future, we can support `t.0.0` which can be done in two ways: - Special case the lexer to _not_ parse the `0.0` as a real in the case of a tuple access (this means the lexer now has to concern itself with the context). - Break the real `0.0` apart in the pasrer, which kinda what Rust does (see rust-lang/rust#71322 after an attempt for the other method in rust-lang/rust#70420)

dtolnay added 6 commits March 26, 2020 00:02

Lex tuple.0.0 as tuple indexing

12a5e49

Remove tuple-float-index ui test that now compiles

f58e485

Add test of tuple nested indexing

550b50b

Remove obsolete recovery of field access by float

ead9325

Test even deeper nested indexing

9d51f49

Test a range, which is not nested indexing

34999af

dtolnay added T-lang Relevant to the language team T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 26, 2020

rust-highfive assigned Centril Mar 26, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 26, 2020

petrochenkov self-assigned this Mar 26, 2020

Centril reviewed Mar 26, 2020

View reviewed changes

rust-highfive unassigned Centril Mar 26, 2020

Centril added the I-nominated label Mar 26, 2020

dtolnay added 3 commits March 26, 2020 09:39

Add tests in which the token really is a float

96f03d8

Add test for tuple indexed with float in macro input

22e0242

Add test for errors triggered on parts of decomposed index

ec8ff74

dtolnay mentioned this pull request Mar 29, 2020

Tuple indexing should handle float tokens specially #28257

Closed

Centril removed the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 1, 2020

Centril removed I-nominated S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Apr 2, 2020

CAD97 mentioned this pull request Apr 4, 2020

Lexical specification? rust-lang/wg-grammar#3

Open

joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 14, 2020

petrochenkov mentioned this pull request Apr 19, 2020

Accept tuple.0.0 as tuple indexing (take 2) #71322

Merged

petrochenkov added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 19, 2020

dtolnay added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels Apr 19, 2020

dtolnay closed this Apr 30, 2020

dtolnay deleted the lex branch May 23, 2020 21:26

This was referenced Jul 11, 2023

Parse tuple types and expressions essential-contributions/pint#93

Merged

Support nested tuple indexing without parentheses t.0.0 essential-contributions/pint#100

Closed

dtolnay added A-parser Area: The lexing & parsing of Rust source code to an AST and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Jan 14, 2025

Accept tuple.0.0 as tuple indexing #70420

Accept tuple.0.0 as tuple indexing #70420

Uh oh!

Conversation

dtolnay commented Mar 26, 2020

Uh oh!

Centril Mar 26, 2020

Choose a reason for hiding this comment

Uh oh!

Centril commented Mar 26, 2020

Uh oh!

matklad commented Mar 26, 2020

Uh oh!

dtolnay commented Mar 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bjorn3 commented Mar 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dtolnay commented Mar 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dtolnay commented Mar 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petrochenkov commented Mar 26, 2020

Uh oh!

petrochenkov commented Mar 26, 2020

Uh oh!

Centril commented Mar 26, 2020

Uh oh!

Centril commented Apr 2, 2020

Uh oh!

Dylan-DPC-zz commented Apr 14, 2020

Uh oh!

dtolnay commented Apr 14, 2020

Uh oh!

petrochenkov commented Apr 14, 2020

Uh oh!

petrochenkov commented Apr 19, 2020

Uh oh!

dtolnay commented Apr 30, 2020

Uh oh!

Uh oh!

dtolnay commented Mar 26, 2020 •

edited

Loading

bjorn3 commented Mar 26, 2020 •

edited

Loading

dtolnay commented Mar 26, 2020 •

edited

Loading

dtolnay commented Mar 26, 2020 •

edited

Loading