Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce unnecessary escaping in proc_macro::Literal::character/string #95343

Merged
merged 2 commits into from
Apr 4, 2022

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Mar 26, 2022

I noticed that https://doc.rust-lang.org/proc_macro/struct.Literal.html#method.character is producing unreadable literals that make macro-expanded code unnecessarily hard to read. Since the proc macro server was using escape_unicode(), every char is escaped using \u{…} regardless of whether there is any need to do so. For example Literal::character('=') would previously produce '\u{3d}' which unnecessarily obscures the meaning when reading the macro-expanded code.

I've changed Literal::string also in this PR because str's Debug impl is also smarter than just calling escape_debug on every char. For example Literal::string("ferris's") would previously produce "ferris\'s" but will now produce "ferris's".

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Mar 26, 2022
@rust-highfive
Copy link
Collaborator

r? @lcnr

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 26, 2022
@petrochenkov
Copy link
Contributor

cc #60495

@lcnr
Copy link
Contributor

lcnr commented Mar 28, 2022

don't know the areas actually impacted by this change, so

r? @petrochenkov

@rust-highfive rust-highfive assigned petrochenkov and unassigned lcnr Mar 28, 2022
@petrochenkov
Copy link
Contributor

I'd like to use this opportunity to address #60495 (maybe not to make all the relevant code changes, but at least to make a decision).
Updated status of literal escaping in the compiler - #60495 (comment).

  • Question 1: Why we are doing the escaping at all?
  • Question 2: What are the requirements for this escaping?
    • Apparently the only strong requirement is that the roundtrip through the rustc parser's unescaping should work?
    • From this PR it looks like readability of the output is also a soft requirement.
      It means that we should probably prefer debug escaping everywhere, because escape_unicode is entirely unreadable, and escape_default is less readable.
  • Question 3 (unanswered): Why on earth Debug impl for str and escape_debug do different things?
    • Which one of these is the canonical debug escaping? I assume that here we should also make a choice and use it consistently?

@dtolnay
Could you comment on these questions?

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 3, 2022
@dtolnay
Copy link
Member Author

dtolnay commented Apr 3, 2022

The alternative is to produce a raw string token instead of doing escaping and producing a cooked string token.

I don't have a preference between these. They both come out equally readable in typical usage, from what I've seen.

The fact that proc macros and macro_rules do the opposite is a good catch. Here is what I tried:

// main.rs

macro_rules! decl {
    (#[doc = $tt:tt]) => {
        println!("decl macro: #[doc = {}]", stringify!($tt));
    };
}

fn main() {
    decl! {
        ///"
    }
    proc! {
        ///"
    }
}
// lib.rs

#[proc_macro]
pub fn proc(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    if let proc_macro::TokenTree::Group(group) = input.into_iter().nth(1).unwrap() {
        println!("proc macro: #[doc = {}]", group.stream().into_iter().nth(2).unwrap());
    }
    proc_macro::TokenStream::new()
}
proc macro: #[doc = "\""]
decl macro: #[doc = r#"""#]

I think it would be reasonable to make these two behave the same. It's not important to me which behavior we pick.


What are the requirements for this escaping?

I think you got it right in your response to this above.


Why on earth Debug impl for str and escape_debug do different things?

From what I can tell, this divergence was introduced by #83079.

The current behavior of Debug is the newer/better formatting, while escape_debug is the older. I think we should update escape_debug to change to the new behavior where ' does not get a backslash.

@petrochenkov
Copy link
Contributor

the only strong requirement is that the roundtrip through the rustc parser's unescaping should work

So one more alternative is to introduce a new escape_minimal_for_rust_literal doing only the escaping that is required for the roundtrip to work.
That's would be pretty much \, " or ' depending on the literal kind, and maybe also bare CR, need to check the lexer code.

At least it will be predictable in that case, unlike debug escaping.
It would also be readable enough because no unicode would be mangled (maybe with some whitespace parts being less readable than with debug escaping).

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 4, 2022
@petrochenkov
Copy link
Contributor

petrochenkov commented Apr 4, 2022

Ok, I think this is good to go.
I'll unify the remaining cases in the compiler myself, and then maybe try to migrate from debug escaping to escape_minimal_for_rust_literal after that.
@bors r+

@bors
Copy link
Contributor

bors commented Apr 4, 2022

📌 Commit f383134 has been approved by petrochenkov

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 4, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 4, 2022
Rollup of 7 pull requests

Successful merges:

 - rust-lang#92942 (stabilize windows_process_extensions_raw_arg)
 - rust-lang#94817 (Release notes for 1.60.0)
 - rust-lang#95343 (Reduce unnecessary escaping in proc_macro::Literal::character/string)
 - rust-lang#95431 (Stabilize total_cmp)
 - rust-lang#95438 (Add SyncUnsafeCell.)
 - rust-lang#95467 (Windows: Synchronize asynchronous pipe reads and writes)
 - rust-lang#95609 (Suggest borrowing when trying to coerce unsized type into `dyn Trait`)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 2d1496a into rust-lang:master Apr 4, 2022
@rustbot rustbot added this to the 1.62.0 milestone Apr 4, 2022
@dtolnay dtolnay deleted the literals branch April 13, 2022 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants