Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate magic numbers from expression precedence #133603

Merged
merged 5 commits into from
Dec 2, 2024

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Nov 28, 2024

Context: see #133140.

This PR continues on backporting Syn's expression precedence design into rustc. Rustc's design used mysterious integer quantities represented variously as i8 or usize (e.g. PREC_CLOSURE = -40i8), a special significance around 0 that is never named, and an extra PREC_FORCE_PAREN precedence level that does not correspond to any expression. Syn's design uses a C-like enum with variants that clearly correspond to specific sets of expression kinds.

This PR is a refactoring that has no intended behavior change on its own, but it unblocks other precedence work that rustc's precedence design was poorly suited to accommodate.

  • Asymmetrical precedence, so that a pretty-printer can tell (return 1) + 1 needs parens but 1 + return 1 does not.

  • Squashing the Closure and Jump cases into a single precedence level.

  • Numerous remaining false positives and false negatives in rustc pretty-printer's parenthesization of macro metavariables, for example in $e < rhs where $e is lhs as Thing<T>.

FYI @fmease — you don't need to review if rustbot picks someone else, but you mentioned being interested in the followup PRs.

@rustbot
Copy link
Collaborator

rustbot commented Nov 28, 2024

r? @lcnr

rustbot has assigned @lcnr.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 28, 2024
@rustbot
Copy link
Collaborator

rustbot commented Nov 28, 2024

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@rust-log-analyzer

This comment has been minimized.

@jieyouxu jieyouxu added the A-parser Area: The parsing of Rust source code to an AST label Nov 29, 2024
@rust-log-analyzer

This comment has been minimized.

@dtolnay dtolnay added the A-pretty Area: Pretty printing (including `-Z unpretty`) label Nov 29, 2024
Copy link
Contributor

@lcnr lcnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks correct to me and commits 3-7 are a clear improvement imo

Why do you think that entirely removing print_expr_maybe_paren is worth it? I feel like it's easier to read then the alternative. Removing it likely made the next commits easier, is that the main reason?

@bors
Copy link
Contributor

bors commented Nov 29, 2024

☔ The latest upstream changes (presumably #133588) made this pull request unmergeable. Please resolve the merge conflicts.

@dtolnay
Copy link
Member Author

dtolnay commented Nov 30, 2024

I moved commits 1-2 to #133655 with a longer explanation. I can drop those commits if you prefer to keep print_expr_maybe_paren, or we can defer it until after seeing some of the rest of this work like the asymmetrical precedence changes.

RalfJung added a commit to RalfJung/rust that referenced this pull request Nov 30, 2024
Eliminate print_expr_maybe_paren function from pretty printers

This PR is part of backporting Syn's expression precedence design into rustc. (See rust-lang#133603 for other work on this.)

In Syn, our version of `print_expr_cond_paren` is called `print_subexpression` and it is called from 19 places. Of those calls, 12 of them need a "custom" behavior for the `needs_paren` argument, whereas only 7 use a "standard" behavior resembling `print_subexpression($e, $e.precedence() < Precedence::$Variant, ...)`. In other words the behavior that rustc_ast_pretty's `print_expr_maybe_paren` implements is actually not what you want most of the time. The current usage you see in rustc is overuse.

<details>
<summary>Aside: am I confident about the correctness of Syn's parenthesization? Yes. Click for details.</summary>

---

The behavior is constrained by the following pair of tests which both run over every Rust source file of rustc and the standard library and tools and test suites:

- To rule out **false positives**: for every expression in every source file, print the expression, parse it back, and verify that not a single new parenthesis got added. Since these are expressions parsed from source code, not macro-generated syntax trees, we know they must never need automatic parenthesis insertion. Rustc's pretty printer does not pass this.

    Pseudocode: `assert(expr == parse(print(expr)))`

- To rule out **false negatives**: for every expression in every source file, replace every Expr::Paren node in the syntax tree with just its contents, i.e. stripping the parentheses but otherwise preserving the syntax tree structure. Then print the stripped expression performing parenthesis insertion wherever needed, and reparse it. Verify that the reparsed expression has identical structure to the original, despite there being no parentheses in the original prior to printing, i.e. all the right parentheses got re-inserted by the printer to preserve the expression's structure. Rustc's pretty printer does not pass this. See dtolnay/syn#1788 which reveals multiple rustc_ast_pretty bugs.

    Pseudocode: `assert(unparenthesize(expr) == unparenthesize(parse(print(unparenthesize(expr)))))`

---
</details>

If `print_expr_maybe_paren` is usually not correct, is there harm in keeping it for the minority of cases where it is correct? I think the answer is yes and Syn doesn't use any equivalent of this helper function. The problems with it are:

- Having both `print_expr_maybe_paren` and `print_expr_cond_paren` applies counterproductive inertia against moving from the first to the second. When looking at a call site like `print_expr_maybe_paren(e, Precedence::$Variant, ...)` with parentheses not being inserted where they should be, anyone's first inclination would be to solve the bug by tweaking $Variant because that is the only knob that visibly appears in the function call. For example to pass "prec + 1", like tweaking the code to conditionally pass `Precedence::Prefix` instead of `Precedence::Cast`.

    Experience in Syn shows this is (almost?) never what you want the person to do. In a call `print_expr_cond_paren(e, e.precedence() < ExprPrecedence::$Variant, ...)` almost always the best fix involves one of:

    - Changing `e.precedence()`, e.g. to `fixup.leading_precedence(e)` and `fixup.trailing_precedence(e)` in cases of asymmetrical precedence (`(return 1) + 1` vs `1 + return 1`).

    - Changing `<` to `<=`, to handle associativity and other grammar restrictions like chained comparisons (which rustc gets wrong today).

    - Adding `||` and/or `&&` clauses to the condition.

    By using these 3 better knobs instead of $Variant, it upholds the property that any time we talk about precedence, it is always the precedence of some actual expression that our code is actively manipulating, instead of a value standing in for some imaginary precedence level that would exist between two consecutive [real levels](https://doc.rust-lang.org/1.83.0/reference/expressions.html#expression-precedence). For example consider that "`Cast` + 1" might be `Prefix` today, but only until some new Rust syntax ends up adding a level between those.

- The `print_expr_maybe_paren` call sites look shorter, but they are not clearer. For myself, a function argument that says "does this subexpression need parenthesization" is a concrete thing that is easy to think about, while a function argument that is "what is the effective precedence level associated with this subexpression's placement inside its parent expression" is abstract and tricky to even state a precise meaning for. I expect that for someone less familiar with the pretty printer working on adding a new expression kind (like postfix match, recently), having every subexpression consistently printed using `print_expr_cond_paren` will be more beneficial, for the same reason, than having `print_expr_maybe_paren` available.

r? `@lcnr`
RalfJung added a commit to RalfJung/rust that referenced this pull request Nov 30, 2024
Eliminate print_expr_maybe_paren function from pretty printers

This PR is part of backporting Syn's expression precedence design into rustc. (See rust-lang#133603 for other work on this.)

In Syn, our version of `print_expr_cond_paren` is called `print_subexpression` and it is called from 19 places. Of those calls, 12 of them need a "custom" behavior for the `needs_paren` argument, whereas only 7 use a "standard" behavior resembling `print_subexpression($e, $e.precedence() < Precedence::$Variant, ...)`. In other words the behavior that rustc_ast_pretty's `print_expr_maybe_paren` implements is actually not what you want most of the time. The current usage you see in rustc is overuse.

<details>
<summary>Aside: am I confident about the correctness of Syn's parenthesization? Yes. Click for details.</summary>

---

The behavior is constrained by the following pair of tests which both run over every Rust source file of rustc and the standard library and tools and test suites:

- To rule out **false positives**: for every expression in every source file, print the expression, parse it back, and verify that not a single new parenthesis got added. Since these are expressions parsed from source code, not macro-generated syntax trees, we know they must never need automatic parenthesis insertion. Rustc's pretty printer does not pass this.

    Pseudocode: `assert(expr == parse(print(expr)))`

- To rule out **false negatives**: for every expression in every source file, replace every Expr::Paren node in the syntax tree with just its contents, i.e. stripping the parentheses but otherwise preserving the syntax tree structure. Then print the stripped expression performing parenthesis insertion wherever needed, and reparse it. Verify that the reparsed expression has identical structure to the original, despite there being no parentheses in the original prior to printing, i.e. all the right parentheses got re-inserted by the printer to preserve the expression's structure. Rustc's pretty printer does not pass this. See dtolnay/syn#1788 which reveals multiple rustc_ast_pretty bugs.

    Pseudocode: `assert(unparenthesize(expr) == unparenthesize(parse(print(unparenthesize(expr)))))`

---
</details>

If `print_expr_maybe_paren` is usually not correct, is there harm in keeping it for the minority of cases where it is correct? I think the answer is yes and Syn doesn't use any equivalent of this helper function. The problems with it are:

- Having both `print_expr_maybe_paren` and `print_expr_cond_paren` applies counterproductive inertia against moving from the first to the second. When looking at a call site like `print_expr_maybe_paren(e, Precedence::$Variant, ...)` with parentheses not being inserted where they should be, anyone's first inclination would be to solve the bug by tweaking $Variant because that is the only knob that visibly appears in the function call. For example to pass "prec + 1", like tweaking the code to conditionally pass `Precedence::Prefix` instead of `Precedence::Cast`.

    Experience in Syn shows this is (almost?) never what you want the person to do. In a call `print_expr_cond_paren(e, e.precedence() < ExprPrecedence::$Variant, ...)` almost always the best fix involves one of:

    - Changing `e.precedence()`, e.g. to `fixup.leading_precedence(e)` and `fixup.trailing_precedence(e)` in cases of asymmetrical precedence (`(return 1) + 1` vs `1 + return 1`).

    - Changing `<` to `<=`, to handle associativity and other grammar restrictions like chained comparisons (which rustc gets wrong today).

    - Adding `||` and/or `&&` clauses to the condition.

    By using these 3 better knobs instead of $Variant, it upholds the property that any time we talk about precedence, it is always the precedence of some actual expression that our code is actively manipulating, instead of a value standing in for some imaginary precedence level that would exist between two consecutive [real levels](https://doc.rust-lang.org/1.83.0/reference/expressions.html#expression-precedence). For example consider that "`Cast` + 1" might be `Prefix` today, but only until some new Rust syntax ends up adding a level between those.

- The `print_expr_maybe_paren` call sites look shorter, but they are not clearer. For myself, a function argument that says "does this subexpression need parenthesization" is a concrete thing that is easy to think about, while a function argument that is "what is the effective precedence level associated with this subexpression's placement inside its parent expression" is abstract and tricky to even state a precise meaning for. I expect that for someone less familiar with the pretty printer working on adding a new expression kind (like postfix match, recently), having every subexpression consistently printed using `print_expr_cond_paren` will be more beneficial, for the same reason, than having `print_expr_maybe_paren` available.

r? ``@lcnr``
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Nov 30, 2024
Rollup merge of rust-lang#133655 - dtolnay:maybeparen, r=lcnr

Eliminate print_expr_maybe_paren function from pretty printers

This PR is part of backporting Syn's expression precedence design into rustc. (See rust-lang#133603 for other work on this.)

In Syn, our version of `print_expr_cond_paren` is called `print_subexpression` and it is called from 19 places. Of those calls, 12 of them need a "custom" behavior for the `needs_paren` argument, whereas only 7 use a "standard" behavior resembling `print_subexpression($e, $e.precedence() < Precedence::$Variant, ...)`. In other words the behavior that rustc_ast_pretty's `print_expr_maybe_paren` implements is actually not what you want most of the time. The current usage you see in rustc is overuse.

<details>
<summary>Aside: am I confident about the correctness of Syn's parenthesization? Yes. Click for details.</summary>

---

The behavior is constrained by the following pair of tests which both run over every Rust source file of rustc and the standard library and tools and test suites:

- To rule out **false positives**: for every expression in every source file, print the expression, parse it back, and verify that not a single new parenthesis got added. Since these are expressions parsed from source code, not macro-generated syntax trees, we know they must never need automatic parenthesis insertion. Rustc's pretty printer does not pass this.

    Pseudocode: `assert(expr == parse(print(expr)))`

- To rule out **false negatives**: for every expression in every source file, replace every Expr::Paren node in the syntax tree with just its contents, i.e. stripping the parentheses but otherwise preserving the syntax tree structure. Then print the stripped expression performing parenthesis insertion wherever needed, and reparse it. Verify that the reparsed expression has identical structure to the original, despite there being no parentheses in the original prior to printing, i.e. all the right parentheses got re-inserted by the printer to preserve the expression's structure. Rustc's pretty printer does not pass this. See dtolnay/syn#1788 which reveals multiple rustc_ast_pretty bugs.

    Pseudocode: `assert(unparenthesize(expr) == unparenthesize(parse(print(unparenthesize(expr)))))`

---
</details>

If `print_expr_maybe_paren` is usually not correct, is there harm in keeping it for the minority of cases where it is correct? I think the answer is yes and Syn doesn't use any equivalent of this helper function. The problems with it are:

- Having both `print_expr_maybe_paren` and `print_expr_cond_paren` applies counterproductive inertia against moving from the first to the second. When looking at a call site like `print_expr_maybe_paren(e, Precedence::$Variant, ...)` with parentheses not being inserted where they should be, anyone's first inclination would be to solve the bug by tweaking $Variant because that is the only knob that visibly appears in the function call. For example to pass "prec + 1", like tweaking the code to conditionally pass `Precedence::Prefix` instead of `Precedence::Cast`.

    Experience in Syn shows this is (almost?) never what you want the person to do. In a call `print_expr_cond_paren(e, e.precedence() < ExprPrecedence::$Variant, ...)` almost always the best fix involves one of:

    - Changing `e.precedence()`, e.g. to `fixup.leading_precedence(e)` and `fixup.trailing_precedence(e)` in cases of asymmetrical precedence (`(return 1) + 1` vs `1 + return 1`).

    - Changing `<` to `<=`, to handle associativity and other grammar restrictions like chained comparisons (which rustc gets wrong today).

    - Adding `||` and/or `&&` clauses to the condition.

    By using these 3 better knobs instead of $Variant, it upholds the property that any time we talk about precedence, it is always the precedence of some actual expression that our code is actively manipulating, instead of a value standing in for some imaginary precedence level that would exist between two consecutive [real levels](https://doc.rust-lang.org/1.83.0/reference/expressions.html#expression-precedence). For example consider that "`Cast` + 1" might be `Prefix` today, but only until some new Rust syntax ends up adding a level between those.

- The `print_expr_maybe_paren` call sites look shorter, but they are not clearer. For myself, a function argument that says "does this subexpression need parenthesization" is a concrete thing that is easy to think about, while a function argument that is "what is the effective precedence level associated with this subexpression's placement inside its parent expression" is abstract and tricky to even state a precise meaning for. I expect that for someone less familiar with the pretty printer working on adding a new expression kind (like postfix match, recently), having every subexpression consistently printed using `print_expr_cond_paren` will be more beneficial, for the same reason, than having `print_expr_maybe_paren` available.

r? ``@lcnr``
@lcnr
Copy link
Contributor

lcnr commented Dec 2, 2024

@bors r+ rollup

ps: i'd have been totally happy to just merge this as a single PR after you gave the explanation in #133655, in case you felt like this wasn't the case

@bors
Copy link
Contributor

bors commented Dec 2, 2024

📌 Commit 7ced18f has been approved by lcnr

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 2, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 2, 2024
…llaumeGomez

Rollup of 13 pull requests

Successful merges:

 - rust-lang#133603 (Eliminate magic numbers from expression precedence)
 - rust-lang#133715 (rustdoc-json: Include safety of `static`s)
 - rust-lang#133721 (rustdoc-json: Add test for `impl Trait for dyn Trait`)
 - rust-lang#133725 (Remove `//@ compare-output-lines-by-subset`)
 - rust-lang#133730 (Add pretty-printer parenthesis insertion test)
 - rust-lang#133736 (Add `needs-target-has-atomic` directive)
 - rust-lang#133739 (Re-add myself to rotation)
 - rust-lang#133743 (Fix docs for `<[T]>::as_array`.)
 - rust-lang#133744 (Fix typo README.md)
 - rust-lang#133745 (Remove static HashSet for default IDs list)
 - rust-lang#133749 (mir validator: don't store mir phase)
 - rust-lang#133751 (remove `Ty::is_copy_modulo_regions`)
 - rust-lang#133757 (`impl Default for EarlyDiagCtxt`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 7dd0c83 into rust-lang:master Dec 2, 2024
6 checks passed
@rustbot rustbot added this to the 1.85.0 milestone Dec 2, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Dec 2, 2024
Rollup merge of rust-lang#133603 - dtolnay:precedence, r=lcnr

Eliminate magic numbers from expression precedence

Context: see rust-lang#133140.

This PR continues on backporting Syn's expression precedence design into rustc. Rustc's design used mysterious integer quantities represented variously as `i8` or `usize` (e.g. `PREC_CLOSURE = -40i8`), a special significance around `0` that is never named, and an extra `PREC_FORCE_PAREN` precedence level that does not correspond to any expression. Syn's design uses a C-like enum with variants that clearly correspond to specific sets of expression kinds.

This PR is a refactoring that has no intended behavior change on its own, but it unblocks other precedence work that rustc's precedence design was poorly suited to accommodate.

- Asymmetrical precedence, so that a pretty-printer can tell `(return 1) + 1` needs parens but `1 + return 1` does not.

- Squashing the `Closure` and `Jump` cases into a single precedence level.

- Numerous remaining false positives and false negatives in rustc pretty-printer's parenthesization of macro metavariables, for example in `$e < rhs` where $e is `lhs as Thing<T>`.

FYI `@fmease` &mdash; you don't need to review if rustbot picks someone else, but you mentioned being interested in the followup PRs.
flip1995 pushed a commit to flip1995/rust that referenced this pull request Dec 15, 2024
Eliminate magic numbers from expression precedence

Context: see rust-lang#133140.

This PR continues on backporting Syn's expression precedence design into rustc. Rustc's design used mysterious integer quantities represented variously as `i8` or `usize` (e.g. `PREC_CLOSURE = -40i8`), a special significance around `0` that is never named, and an extra `PREC_FORCE_PAREN` precedence level that does not correspond to any expression. Syn's design uses a C-like enum with variants that clearly correspond to specific sets of expression kinds.

This PR is a refactoring that has no intended behavior change on its own, but it unblocks other precedence work that rustc's precedence design was poorly suited to accommodate.

- Asymmetrical precedence, so that a pretty-printer can tell `(return 1) + 1` needs parens but `1 + return 1` does not.

- Squashing the `Closure` and `Jump` cases into a single precedence level.

- Numerous remaining false positives and false negatives in rustc pretty-printer's parenthesization of macro metavariables, for example in `$e < rhs` where $e is `lhs as Thing<T>`.

FYI `@fmease` &mdash; you don't need to review if rustbot picks someone else, but you mentioned being interested in the followup PRs.
@dtolnay dtolnay deleted the precedence branch December 16, 2024 17:33
@dtolnay dtolnay removed the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area: The parsing of Rust source code to an AST A-pretty Area: Pretty printing (including `-Z unpretty`) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants