Unable to parse expression `a as usize < b` #22644

defuz · 2015-02-21T20:03:03Z

It seems the parser tries to read a type where there isn't one:

fn main() {
    let a : u32 = 0;
    let b : usize = 0;

    a as usize > b; // ok
    a as usize < b; // error: expected one of `(`, `+`, `,`, `::`, `<`, or `>`, found `;`
}

I think it's rather strange that the operator > works, but the operator < doesn't in the same place.

rustc 1.0.0-nightly (522d09dfe 2015-02-19) (built 2015-02-20)


kmcallister · 2015-02-21T20:12:36Z

usize<T> is a syntactically valid (though nonsensical) type-expression. This ambiguity on < has plagued C++ for decades :(
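Until the parser handles this case, the comparison can be written so that `<` never directly follows the cast's type. A minimal sketch, using the types from the report; the function names are made up for illustration:

```rust
// Workaround 1: parenthesize the cast so the type ends before `<` is seen.
fn less_than(a: u32, b: usize) -> bool {
    (a as usize) < b
}

// Workaround 2: flip the comparison. `>` cannot start a type argument
// list, so `b > a as usize` parses as a comparison.
fn less_than_flipped(a: u32, b: usize) -> bool {
    b > a as usize
}

fn main() {
    println!("{}", less_than(1, 2));
    println!("{}", less_than_flipped(1, 2));
}
```

Both forms compute the same result; which one reads better is a matter of taste.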

kmcallister · 2015-02-21T20:13:26Z

Maybe we can at least detect the ambiguous cases and provide a hint.

defuz · 2015-02-21T20:42:16Z

At least Scala, Nemerle, Nim and Boo began to use square brackets for type parameters. Is there any reason why Rust cannot do the same? I think it would significantly simplify parsing and get rid of ambiguities.

kmcallister · 2015-02-21T20:49:44Z

That is my preference as well, but the discussion already happened and a decision was made (before my time, I believe).

huonw · 2015-02-21T23:04:13Z

We use [] for array indexing so there are still ambiguities.

defuz · 2015-02-22T12:38:45Z

@huonw, actually no, because unlike < and > characters, square brackets (and parens) always respect the pairing. The ambiguity lies in using < and > like brackets, although they also can be used separately.

huonw · 2015-02-22T12:59:15Z

I don't think that's totally the reason.

The token sequence x as T[y] has two options for parses with [] as the generic syntax: an index expression (x as T)[y] or a cast to a generic type x as (T[y]). This is essentially the same as what is occurring here, the token sequence x as T < y represents a prefix of two options for parses: a comparison (x as T) < y and a cast to a generic type x as (T<y ... >). You're wanting the former but the compiler is giving the latter.

Of course, it is probably rarer that one wishes to index a cast expression, so using [] for generics may reduce how often this ambiguity occurs in real code, but it definitely does not get rid of it.

In any case, the generic syntax is here to stay. There's been more than enough discussion about it (e.g. rust-lang/rfcs#148).

bombless · 2015-02-22T13:22:07Z

Is x as (T<y ...>) actually available here?
I guess we only accept x as (&T<y ...>) for now? (T is trait here)

If we don't accept x as (T<y ...>) as a legal expression (yet), I think we can parse it as OP wishes, and if we accept something like x as (T<y ...>) as a valid expression one day, we just write those parentheses out.

I'd love to write a PoC for this.
Correct me if I miss anything.

And if my idea's totally wrong, I guess I'll find out soon. (like in make check or even earlier stage)

defuz · 2015-02-22T14:28:39Z

The question of how to properly parse the sequence x as T[y ... is a question of operator precedence. Indexing is weaker than the as operator, so there is only one correct way to parse it: x as (T[y ...). On the other hand, the question of how to parse x as T<y ... is a question of what < is at all. It could be the start of generic type parameters or just a comparison operator. In this case the parser can't determine what < is, and that is the reason for the ambiguity.

bombless · 2015-02-22T14:44:28Z

At least I don't see any possibility that we change the syntax in 1.x.

dgrunwald · 2015-02-22T16:56:58Z

The ambiguity on < also exists in C++ and C#, and doesn't cause such problems in those languages. (although the way this syntax is disambiguated is dramatically different between C++ and C#)

I believe fixing this issue means we can also drop the :: when calling generic functions: f::<T>() becomes f<T>().
Note that with sufficient lookahead, most programs are not actually ambiguous, because RFC 558 prevents the use of both < and > in an expression.
Real ambiguities arise only with multiple or nested type arguments (and possibly also with UFCS syntax, I haven't yet looked at the details of that).
f(a<b,c>(d)) could be f((a<b), (c>d)) or f(a::<b,c>(d)).
a as b < c < d >> - e could be (a as b<c<d>>) - e or (a as b) < c < (d >> (-e)) (with python-style chained comparison)
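To make the first example concrete, here is a small sketch of both readings as they have to be spelled today. The helper names (`f2`, `make_pair`) and the values are assumptions chosen only for illustration:

```rust
// Hypothetical two-argument function, so that `f2(a<b,c>(d))` type-checks
// under the "two comparisons" reading.
fn f2(x: bool, y: bool) -> (bool, bool) {
    (x, y)
}

// Hypothetical generic function, so that the "generic call" reading has
// something to call. It ignores its argument and returns defaults.
fn make_pair<B: Default, C: Default>(_d: i32) -> (B, C) {
    (B::default(), C::default())
}

// Reading 1: in expression context `<` and `>` are comparison operators,
// so this is f2((a < b), (c > d)).
#[allow(unused_parens)]
fn two_comparisons(a: i32, b: i32, c: i32, d: i32) -> (bool, bool) {
    f2(a<b,c>(d))
}

// Reading 2: a generic call needs the `::` before `<` today.
fn generic_call(d: i32) -> (u8, u16) {
    make_pair::<u8, u16>(d)
}

fn main() {
    println!("{:?}", two_comparisons(1, 2, 3, 4));
    println!("{:?}", generic_call(4));
}
```

Dropping the `::` in the second function would make its token sequence indistinguishable from the first one without some disambiguation rule.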

I think introducing a C#-style disambiguation rule could possibly solve this issue with only very minor breaking changes. (these were also breaking changes in C# when generics were introduced in version 2.0, and affected very little code)
I have an idea on how to do this, but I didn't write a concrete proposal because people didn't seem to like parser rules that require arbitrary lookahead. And because it would make it difficult to extend Rust's type syntax later without introducing additional minor breaking changes due to the disambiguation.

kmcallister · 2015-02-22T20:34:01Z

> The ambiguity on < also exists in C++ and C#, and doesn't cause such problems in those languages.

I don't know about C#, but it's responsible for some of the ugliest special cases in C++ syntax, like

void f() {
    T::template g<int>();
}

bombless · 2015-02-22T21:13:56Z

C++ needs a symbol table for parsing.
And people feel okay about it just because C already needs a symbol table for parsing.

dgrunwald · 2015-02-22T22:15:27Z

I agree that we don't want to copy C++'s mistakes here.
The C# rules are much more sane: no weird "typename"/"template" keywords for disambiguating, no need for a symbol table during parsing.

The idea is: when the parser encounters a < that is ambiguous, it looks ahead in the token stream and attempts to parse the type argument list. If the type argument list is not valid, backtrack and parse the < as less-than operator. If the type argument list is valid, look at the next token after the matching > to determine whether to parse the < as type argument list or less-than operator:

- Next token signals end of expression, e.g. one of ; , ) ] }: use interpretation as type argument list.
- Next token is identifier or integer literal: use interpretation as less-than operator.
- Next token is { or (: use interpretation as type argument list (struct initializer / function call). (this is a breaking change for the current interpretation of f(a<b,c>(d)))
- For other tokens, we'd have to think about which interpretation makes more sense. In general, tokens in FIRST(Expression) should use the interpretation as less-than operator and tokens in FOLLOW(Expression) should use the interpretation as type argument list, but quite a few tokens are in both of those sets.

Note that if this heuristic is wrong, the user can always choose to force one or the other interpretation: Putting parentheses around the less-than operator (e.g. f((a<b), (c>d))) makes the interpretation as type argument list invalid and thus enforces the parse as less-than operator.
If a type argument list is desired, enforcing this depends on which context the type argument list is used in:

- for a struct initializer or function call, the { or ( heuristic always picks correctly.
- where type arguments are used at the end of an expression (in as expression, nullary struct, ...), the expression can be enclosed in parentheses so that > is followed by ) and the heuristic thus picks correctly.
We'd need to review all other possible places where a type argument list might appear. In general, it seems like a good idea to have the heuristic err on the side of picking the type argument list.

We can also exploit the fact that RFC 558 makes a < b > c invalid in expression context to always pick the type argument list interpretation (without looking at the next token after >) if the type argument list does not contain any comma and the > is not part of a >> token.

I think this heuristic will work well enough that stuff will "just work" in most cases.
In C# it's basically perfect and most C# programmers don't even realize there is an ambiguity -- but Rust syntax is a bit more flexible and may cause more problems in this regard.
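The lookahead rule described above can be sketched as a toy classifier. This is only an illustration under simplifying assumptions, not rustc's parser: tokens are whitespace-separated, a "type" is just an identifier with optional nested arguments, `>>` is not handled as a single token, and every name here (`Tok`, `LtMeaning`, `classify_lt`, ...) is made up. For the undecided "other tokens" case it arbitrarily picks the type argument list, following the remark about erring on that side.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { Ident, Int, Lt, Gt, Comma, Semi, LParen, RParen, LBrace, RBrace, LBracket, RBracket, Other }

#[derive(Debug, PartialEq)]
enum LtMeaning { TypeArgs, LessThan }

// Very small lexer: tokens must be separated by whitespace.
fn lex(src: &str) -> Vec<Tok> {
    src.split_whitespace()
        .map(|w| match w {
            "<" => Tok::Lt, ">" => Tok::Gt, "," => Tok::Comma, ";" => Tok::Semi,
            "(" => Tok::LParen, ")" => Tok::RParen,
            "{" => Tok::LBrace, "}" => Tok::RBrace,
            "[" => Tok::LBracket, "]" => Tok::RBracket,
            _ if w.chars().all(|c| c.is_ascii_digit()) => Tok::Int,
            _ if w.chars().all(|c| c.is_alphanumeric() || c == '_') => Tok::Ident,
            _ => Tok::Other,
        })
        .collect()
}

// Type := Ident [TypeArgs]. Returns the index just past the type.
fn scan_type(toks: &[Tok], i: usize) -> Option<usize> {
    match toks.get(i)? {
        Tok::Ident if toks.get(i + 1) == Some(&Tok::Lt) => scan_type_args(toks, i + 1),
        Tok::Ident => Some(i + 1),
        _ => None,
    }
}

// TypeArgs := `<` Type { `,` Type } `>`. Returns the index just past `>`.
fn scan_type_args(toks: &[Tok], lt: usize) -> Option<usize> {
    if toks.get(lt) != Some(&Tok::Lt) {
        return None;
    }
    let mut i = scan_type(toks, lt + 1)?;
    loop {
        match toks.get(i)? {
            Tok::Comma => i = scan_type(toks, i + 1)?,
            Tok::Gt => return Some(i + 1),
            _ => return None,
        }
    }
}

// Decide what the `<` at index `lt` means.
fn classify_lt(toks: &[Tok], lt: usize) -> LtMeaning {
    // No valid type argument list: backtrack and treat `<` as less-than.
    let after = match scan_type_args(toks, lt) {
        Some(i) => i,
        None => return LtMeaning::LessThan,
    };
    match toks.get(after) {
        // End of expression (or end of input).
        None | Some(Tok::Semi) | Some(Tok::Comma) | Some(Tok::RParen)
        | Some(Tok::RBracket) | Some(Tok::RBrace) => LtMeaning::TypeArgs,
        // Identifier or integer literal.
        Some(Tok::Ident) | Some(Tok::Int) => LtMeaning::LessThan,
        // Struct initializer or function call.
        Some(Tok::LBrace) | Some(Tok::LParen) => LtMeaning::TypeArgs,
        // Undecided in the proposal; this choice is an assumption.
        Some(_) => LtMeaning::TypeArgs,
    }
}

// Convenience wrapper: classify the first `<` in `src`, if there is one.
fn classify_first_lt(src: &str) -> Option<LtMeaning> {
    let toks = lex(src);
    let lt = toks.iter().position(|t| *t == Tok::Lt)?;
    Some(classify_lt(&toks, lt))
}

fn main() {
    for src in &["a as usize < b ;", "f ( a < b , c > ( d ) )", "f ( ( a < b ) , ( c > d ) )"] {
        println!("{:<30} {:?}", src, classify_first_lt(src));
    }
}
```

On the expression from this issue the scan fails at `;`, so the `<` falls back to less-than; on `f ( a < b , c > ( d ) )` the sketch picks the type argument list, which is the breaking change mentioned in the rule for `(`.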

Upsides:

- Fixes this issue
- Allows us to remove :: for generic function calls.

Downsides:

- Introduces arbitrary lookahead into the language
- Complicated rule, users may get confused where the heuristic fails.
  - Counterpoint: users are already confused where the current "heuristic" fails (see: this issue)
- Any future extension to type syntax would expand the set of valid type argument lists, and thus might change how existing expressions are interpreted. This may make it difficult to add new type-level syntax post-1.0 without breaking existing code.
  In particular, I have no idea how this would work with type-level integers.
  - Counterpoint: I don't know how those would work syntactically without this idea, either -- putting a constant expression where currently a type is expected seems troublesome in general.

SimonSapin · 2015-04-14T14:59:29Z

I think #20078 is the same underlying issue.

benaryorg · 2015-09-01T16:49:41Z

I wonder why in this example the type-argument interpretation is prioritized but in a line such as

thread_local!(static LOCAL:Cell<u32>=Cell::new(0));

the comparison operator is resolved first.

The above code yields:

src/main.rs:3:35: 3:37 error: no rules expected the token `>=`
src/main.rs:3 thread_local!(static TEST:Cell<u32>=Cell::new(0));
                                                ^~

arielb1 · 2015-09-01T17:11:01Z

@benaryorg

Because >= is eagerly tokenized
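A minimal sketch of the spacing that sidesteps the problem: with whitespace between `>` and `=`, the lexer cannot glue them into one `>=` token. The `read_local` helper is a made-up name, and whether a current compiler still rejects the unspaced form is not something this thread establishes.

```rust
use std::cell::Cell;

// Spaces around `=` keep `>` and `=` as two separate tokens.
thread_local!(static LOCAL: Cell<u32> = Cell::new(0));

// Hypothetical helper that reads the thread-local value.
fn read_local() -> u32 {
    LOCAL.with(|c| c.get())
}

fn main() {
    println!("{}", read_local());
}
```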

arielb1 · 2015-09-01T17:12:50Z

anyway, foo as Rc < fmt::Debug > is perfectly valid (and useful!) Rust. If we don't want to play games with infinite lookahead, this is how we go.
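For illustration, a sketch of such a cast in a form that compiles on current toolchains. The `dyn` keyword did not exist when this comment was written, so the bare `Rc < fmt::Debug >` above is the 2015 spelling; the function name is an assumption.

```rust
use std::fmt;
use std::rc::Rc;

// Casts an `Rc<i32>` to a trait object and formats it through `Debug`.
fn debug_string(n: i32) -> String {
    let foo = Rc::new(n);
    // The spaces are legal: after `as Rc`, the `<` opens type arguments.
    let obj = foo as Rc < dyn fmt::Debug >;
    format!("{:?}", obj)
}

fn main() {
    println!("{}", debug_string(5));
}
```

This is the case that a "comparison first" rule would break: here the `<` after the cast's type really does start a generic argument list.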

benaryorg · 2015-09-01T17:14:04Z

@arielb1 Thanks.

I guess I now have a real reason to change my coding style.

Regarding the other problem, wish you good luck fixing that….

bstrie · 2015-09-10T15:37:58Z

@dgrunwald We had this discussion back when RFC 558 was accepted and determined that such a scheme wouldn't allow us to get rid of ::<> entirely; it would only allow us to omit the :: for things that take a single type parameter (i.e. arity one) and the full ::<> form would be required for typaram lists of higher arity. The upside is that this probably constitutes the majority of ::<> usage, and so lots of code would still be cleaned up in practice. The downsides are that this would prevent us from realizing the original intent of RFC 558 (Python-style chained comparisons) while also introducing the notion of infinite lookahead into the parser (currently our lookahead is always bounded). I think it's still probably worth doing, but it was sufficiently controversial to postpone the discussion to post-1.0.

dgrunwald · 2015-09-10T17:11:08Z

@bstrie: The C#-style lookahead heuristic in my comment works for arity > 1; it just requires adding parentheses to disambiguate some rare cases (it never requires ::<>). It would be a breaking change, though I would be surprised if there's much (any?) real-world code that would be broken.

It doesn't require exploiting RFC 558 for the disambiguation. RFC 558 could theoretically be used to improve the heuristic (no need to look at the token following >), but only in the comma-free cases (arity 1 isn't sufficient, it could also be ambiguous if the single type argument is another generic type instantiation containing commas). But I don't think we should use it; adding an additional type argument to an existing generic call should always be valid.

The ugliness of ::<> was why I started looking into that part of the syntax in the first place (I kinda forgot about that when I wrote RFC 558 a few days later). If we add support for Python-style chained comparisons in the future, we should probably only add monotonic comparisons (allow a < b < c and a > b > c, but not a < b > c) to avoid conflicts with generic syntax.

The main downside of the C#-style heuristic is that future additions to the type-level syntax would be breaking changes. So I think this should be postponed until type-level integer syntax is worked out.

birkenfeld · 2016-05-02T12:11:50Z

Duplicate of #11962?

SimonSapin · 2016-05-02T12:24:21Z

Yes, though there is more discussion here.

Learn to parse `a as usize < b`

Parsing `a as usize > b` always works, but `a as usize < b` was a parsing error because the parser would think the `<` started a generic type argument for `usize`. The parser now attempts to parse as before, and if a `DiagnosticError` is returned, it tries to parse again as a type with no generic arguments. If this fails, it returns the original `DiagnosticError`.

Fix #22644.
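The retry strategy in that description can be sketched on a toy grammar. This is not the code from the pull request; the token representation, the error strings, and the function names are all assumptions made for illustration.

```rust
fn is_ident(t: &str) -> bool {
    !t.is_empty() && t.chars().all(|c| c.is_alphanumeric() || c == '_')
}

// Greedy parse of Type := Ident [ `<` Type { `,` Type } `>` ].
// Returns the index just past the type, or an error message.
fn parse_type(toks: &[&str], pos: usize) -> Result<usize, String> {
    let name = toks.get(pos).ok_or("expected type, found end of input")?;
    if !is_ident(name) {
        return Err(format!("expected type, found `{}`", name));
    }
    if toks.get(pos + 1) != Some(&"<") {
        return Ok(pos + 1);
    }
    let mut i = parse_type(toks, pos + 2)?;
    loop {
        match toks.get(i).copied() {
            Some(",") => i = parse_type(toks, i + 1)?,
            Some(">") => return Ok(i + 1),
            Some(t) => return Err(format!("expected `,` or `>`, found `{}`", t)),
            None => return Err("expected `,` or `>`, found end of input".to_string()),
        }
    }
}

// Parse the type after `as`. Returns (index past the type, whether generic
// arguments were consumed).
fn parse_cast_target(toks: &[&str], pos: usize) -> Result<(usize, bool), String> {
    match parse_type(toks, pos) {
        // First attempt: parse as before, generics included.
        Ok(end) => Ok((end, end > pos + 1)),
        // Retry: accept a bare type name and leave `<` for the caller to
        // treat as a comparison. If that fails too, keep the first error.
        Err(original) => match toks.get(pos) {
            Some(t) if is_ident(t) => Ok((pos + 1, false)),
            _ => Err(original),
        },
    }
}

fn main() {
    // `usize` starts at index 2 in both inputs.
    println!("{:?}", parse_cast_target(&["a", "as", "usize", "<", "b", ";"], 2));
    println!("{:?}", parse_cast_target(&["x", "as", "Vec", "<", "T", ">", ";"], 2));
}
```

In the first input the generic parse fails at `;`, so the retry stops after `usize` and the `<` is left over for a comparison; in the second the generic parse succeeds and nothing is retried.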

kmcallister added the A-parser (Area: The parsing of Rust source code to an AST) and A-diagnostics (Area: Messages for errors, warnings, and lints) labels Feb 21, 2015

bombless mentioned this issue Feb 23, 2015

Panic: "internal compiler error: ident only path should have been covered already" #22426

Closed

bombless mentioned this issue Apr 14, 2015

ch as u32 < 0x80 syntax misinterpreted as a type parameter #20078

Closed

pmarcelll mentioned this issue Apr 18, 2016

Generic parameters VS less then comparison parsing ambiguity #33078

Closed

Mark-Simulacrum mentioned this issue May 3, 2017

Confusing error message with < binop #11962

Closed

birkenfeld mentioned this issue May 4, 2017

'as', '<' and '*' can cause a parsing error #32852

Closed

estebank mentioned this issue Jun 10, 2017

Learn to parse a as usize < b #42578

Merged

bors closed this as completed in #42578 Jun 16, 2017

estebank mentioned this issue Oct 24, 2017

Detect = -> : typo in let bindings #45452

Merged

CodaFi mentioned this issue Jul 25, 2022

Parse failure with < after a cast destination type. swiftlang/swift#60146

Closed

notJoon mentioned this issue Jan 5, 2023

False positive for unused_parens in x as (T) < y #106413

Closed
