Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non_ascii_literal should not be machine-applicable to raw strings #9805

Open
alexjago opened this issue Nov 6, 2022 · 10 comments
Open

non_ascii_literal should not be machine-applicable to raw strings #9805

alexjago opened this issue Nov 6, 2022 · 10 comments
Labels
C-bug Category: Clippy is not doing the correct thing I-false-positive Issue: The lint was triggered on code it shouldn't have

Comments

@alexjago
Copy link

alexjago commented Nov 6, 2022

Summary

non_ascii_literal is machine-applied by substituting unicode escapes. However, those are incompatible with raw strings.

(And if like me you try using raw strings as formatting targets, you'll then get errors about invalid references to positional arguments.)

Reproducer

This code should trigger the lint:

format!(r#"a raw string with a range {}–{} (and an en dash)"#, 31, 0);

cargo clippy --fix reported the following:

The following errors were reported:
error: invalid reference to positional argument 2013 (there are (...) arguments)
   --> src/filename.rs:(line):(col)
    |
line |         format!(r#"a raw string with a range {}\u{2013}{} (and an en dash)"#, 31, 0);

Version

rustc 1.62.1 (e092d0b6b 2022-07-16)
binary: rustc
commit-hash: e092d0b6b43f2de967af0887873151bb1c0b18d3
commit-date: 2022-07-16
host: x86_64-apple-darwin
release: 1.62.1
LLVM version: 14.0.5

Additional Labels

@rustbot +label I-suggestion-causes-error I-false-positive

@alexjago alexjago added the C-bug Category: Clippy is not doing the correct thing label Nov 6, 2022
@xFrednet xFrednet added the I-false-positive Issue: The lint was triggered on code it shouldn't have label Nov 6, 2022
@xFrednet
Copy link
Member

xFrednet commented Nov 6, 2022

I believe avoiding the false positive for raw strings is enough, as there would be no suggestion in that case. 🙃

@rustbot label +good-first-issue

@rustbot rustbot added the good-first-issue These issues are a good way to get started with Clippy label Nov 6, 2022
@alexjago
Copy link
Author

alexjago commented Nov 6, 2022

That'd be the easiest fix but there's a few things going on as I see them:

First is whether or not to report for raw strings - maybe someone is trying to enforce an all-ASCII codebase with this lint.

Second, if it is still reported, there could be a hint along the lines of "Clippy can replace non-ASCII characters with substitutions, except for raw strings".

I suppose there's also a potential restriction lint which would convert raw strings to normal strings - in which case a subsequent fix could apply the unicode substitution.

I went looking and wow, this goes back to #85 and #299, looks like the original basis of this lint was trying to do something about zero-width spaces and visually similar unicode characters.

@xFrednet
Copy link
Member

xFrednet commented Nov 7, 2022

Good point, in that case it's probably the best to make the suggestion not machine applicable if it's a raw string. And add a note to the lint description. We could potentially also add a config value, but that might be overkill. Thank you for the comment :)

@alexjago
Copy link
Author

alexjago commented Nov 8, 2022

I've been having a read through https://github.com/rust-lang/rust-clippy/blob/master/clippy_lints/src/unicode.rs

The lint invisible_characters experiences the same issue (machine-applied fix on raw strings), except that it's deny-by-default.

As for unicode_not_nfc, it's OK (i.e. not escaped) without non_ascii_literal . (If both are active, then the suggested non_ascii_literal replacement is in NFC form - this is fine too.)

I've got a minimal repo up here: https://github.com/alexjago/clippy_unicode_tests

@tylerjw
Copy link
Contributor

tylerjw commented Jan 24, 2023

I tried getting started on working on this and wrote this function, used your examples in the testfile unicode.rs, however it doesn't seem to work.

fn is_raw_str(expr: &'_ Expr<'_>) -> bool {
    if let ExprKind::Lit(ref lit) = expr.kind {
        if let LitKind::Str(_, style) = lit.node {
            if matches!(style, StrStyle::Raw(_)) {
                return true;
            }
        }
    }
    false
}

Do you know why this function might not work on the example raw string literals?

@xFrednet
Copy link
Member

Hmmm, I'm sadly not an expert, when it comes to the format! macro. You could try attaching a #[clippy::dump] attribute (then run Clippy) and look at the output.

@tylerjw
Copy link
Contributor

tylerjw commented Jan 29, 2023

Your intuition is good here. Something about the format! macro splits this string into an Array object of Cooked string literals (not raw strings). Here is part of the output from #[clippy::dump]: playground link

kind: Array(
    [
        Expr {
            hir_id: HirId(DefId(0:3 ~ playground[89a1]::main).10),
            kind: Lit(
                Spanned {
                    node: Str(
                        "a raw string with a range ",
                        Cooked,
                    ),
                    span: src/main.rs:5:21: 5:74 (#0),
                },
            ),
            span: src/main.rs:5:21: 5:74 (#0),
        },
        Expr {
            hir_id: HirId(DefId(0:3 ~ playground[89a1]::main).11),
            kind: Lit(
                Spanned {
                    node: Str(
                        "–",
                        Cooked,
                    ),
                    span: src/main.rs:5:21: 5:74 (#0),
                },
            ),
            span: src/main.rs:5:21: 5:74 (#0),
        },
        Expr {
            hir_id: HirId(DefId(0:3 ~ playground[89a1]::main).12),
            kind: Lit(
                Spanned {
                    node: Str(
                        " (and an en dash)",
                        Cooked,
                    ),
                    span: src/main.rs:5:21: 5:74 (#0),
                },
            ),
            span: src/main.rs:5:21: 5:74 (#0),
        },
    ],
),

If I do this without the format link and try on a raw string literal itself, the code above works as expected. Do you have an opinion about what I should do next about this?

@xFrednet
Copy link
Member

Hey, @nyurik and @Alexendoo, I believe you're the best format! macro wizards in Clippy rn. Do you maybe have an idea how to fix this issue? Since the strings are Cooked, I'm not sure we can find a better way, then snipping the code and checking for a r#*" snippet 🤔

@nyurik
Copy link
Contributor

nyurik commented Jan 30, 2023

I'm more of an apprentice :) Note that just a few days ago rust-lang/rust#106745 was merged - which will require a significant rework of all the format-related lint code, so the above might no longer be relevant...

@Alexendoo Alexendoo removed the good-first-issue These issues are a good way to get started with Clippy label Jan 30, 2023
@Alexendoo
Copy link
Member

Tricky one, It looks like the lint does take a snippet to examine already, so checking if that starts with r would be a good way to bypass the issue of them being Cooked in the HIR

The suggestion to still lint raw strings but change the suggestion to mention it'd have to be converted to a normal string sounds good to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: Clippy is not doing the correct thing I-false-positive Issue: The lint was triggered on code it shouldn't have
Projects
None yet
Development

No branches or pull requests

6 participants