Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NFKC normalization bug when removing unused imports #12571

Merged
merged 3 commits into from
Jul 30, 2024

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Jul 29, 2024

Fixes #12570.

Summary

Our lexer eagerly applies NFKC normalization to NKFC-confusable unicode characters; this is done to match Python's semantics at runtime, where the interpreter views these characters as the same. When removing unused imports, however, we use libCST, which doesn't apply the same normalization (it would be incorrect for libCST to do so, since it's a CST rather than an AST). This can lead to a QualifiedName constructed from a libCST node not comparing equal to a QualifiedName constructed from a ruff_python_ast node that represents the same sapn of source code as the libCST node. The difference here between the two parsers is the root cause of #12570.

This PR therefore takes care to apply NFKC normalization when constructing QualifiedNames from libCST nodes, so that these QualifiedNames are always comparable to QualifiedNames constructed from ruff_python_ast nodes.

Test plan

cargo test

Copy link
Contributor

github-actions bot commented Jul 29, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/Chat_finetuning_data_prep.ipynb:6:18:25: Unparenthesized generator expression cannot be used here
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_gmail.ipynb:15:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_jira.ipynb:15:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_sharepoint_doc.ipynb:28:1:5: Simple statements must be separated by newlines or semicolons
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_sharepoint_text.ipynb:28:1:5: Simple statements must be separated by newlines or semicolons
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_middleware_azure_function.ipynb:37:1:13: Simple statements must be separated by newlines or semicolons

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/Chat_finetuning_data_prep.ipynb:6:18:25: Unparenthesized generator expression cannot be used here
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_gmail.ipynb:15:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_jira.ipynb:15:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_sharepoint_doc.ipynb:28:1:5: Simple statements must be separated by newlines or semicolons
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_sharepoint_text.ipynb:28:1:5: Simple statements must be separated by newlines or semicolons
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_middleware_azure_function.ipynb:37:1:13: Simple statements must be separated by newlines or semicolons

@AlexWaygood AlexWaygood marked this pull request as ready for review July 29, 2024 19:23
@MichaReiser
Copy link
Member

MichaReiser commented Jul 29, 2024

Thanks for looking into this bug. This looks interesting.

It would help me review if you could update the summary and include a short summary of what the issue was/how this PR fixes the issue.

@AlexWaygood
Copy link
Member Author

Thanks for looking into this bug. This looks interesting.

It would help me review if you could update the summary and include a short summary of what the issue was/how this PR fixes the issue.

Sorry, my bad. I provided an analysis of the bug in the issue, but forgot to copy it over in the PR summary. I'll do so first thing tomorrow.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the change correctly, the issue is that libCST doesn't perform nfkc normalization. Is that correct?

Sorry, my bad. I provided an analysis of the bug in the issue, but forgot to copy it over in the PR summary. I'll do so first thing tomorrow.

Linking to the analysis from the issue in the comment should be sufficient. I only want to avoid that we merge the PR with a stale summary (commit message)

crates/ruff_linter/src/fix/codemods.rs Outdated Show resolved Hide resolved
@AlexWaygood
Copy link
Member Author

If I understand the change correctly, the issue is that libCST doesn't perform nfkc normalization. Is that correct?

Yup, that's correct. Our lexer eagerly performs NFKC normalization to match Python's semantics, whereas libCST's lexer leaves NFKC confusables untouched. I don't think that's a bug in libCST, since it's a CST rather than an AST; but it is a difference that we need to account for when building QualifiedNames from libCST trees, since a QualifiedName represents semantic information, and a QualifiedName built from a libCST node should be comparable with a QualifiedName built from a ruff_python_ast node that's built from the same source.

@AlexWaygood AlexWaygood enabled auto-merge (squash) July 30, 2024 09:49
@AlexWaygood AlexWaygood added fixes Related to suggested fixes for violations linter Related to the linter labels Jul 30, 2024
@AlexWaygood AlexWaygood merged commit aaa56eb into main Jul 30, 2024
18 checks passed
@AlexWaygood AlexWaygood deleted the alex/f401-bug-2 branch July 30, 2024 09:54
Copy link

codspeed-hq bot commented Jul 30, 2024

CodSpeed Performance Report

Merging #12571 will improve performances by ×2

Comparing alex/f401-bug-2 (ffb7fd1) with alex/f401-bug-2 (ed8e5ed)

Summary

⚡ 1 improvements
✅ 32 untouched benchmarks

Benchmarks breakdown

Benchmark alex/f401-bug-2 alex/f401-bug-2 Change
red_knot_check_file[incremental] 534 µs 264.4 µs ×2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixes Related to suggested fixes for violations linter Related to the linter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fixing file with rule F401 cause infinite loop
2 participants