[`perflint`] Add `PERF403` #5313

qdegraaf · 2023-06-22T19:32:18Z

Summary

Adds PERF403 mirroring W8403 from https://github.com/tonybaloney/perflint

Implementation is similar to original/upstream implementation, with the exception of not yet checking whether the ExprSubscript value is a dictionary type value. I tried adding this but ended up doing a lot of

let scope = checker.semantic().scope();
if let Some(binding_id) = scope.get(subscript_value_id) {
    let binding = &checker.semantic().bindings[binding_id];
    if binding.kind.is_assignment() || binding.kind.is_named_expr_assignment() {
        if let Some(parent_id) = binding.source {
        etc....

again for what felt like very little added value.

Open to any feedback/input to do this more efficiently. If there is no more efficient way to do this, also let me know and I'll see if I can make some kind of helper or util for this, as I've bumped into this a few times now with various rules.

I've set the violation to only flag the first append call in a long if-else statement. Happy to change this to some other location or make it multiple violations if that makes more sense.

Test Plan

Fixtures were added based on perflint tests

Issue Links

Refers: #4789

github-actions · 2023-06-22T19:49:15Z

PR Check Results

Ecosystem

ℹ️ ecosystem check detected changes. (+21, -0, 0 error(s))

airflow (+13, -0)

+ airflow/dag_processing/manager.py:997:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/executors/kubernetes_executor_utils.py:137:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/exasol/hooks/exasol.py:66:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/google/cloud/hooks/bigquery.py:3063:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/google/cloud/transfers/gcs_to_bigquery.py:708:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/mysql/hooks/mysql.py:163:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/postgres/hooks/postgres.py:149:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/smtp/hooks/smtp.py:282:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/utils/email.py:215:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ dev/breeze/src/airflow_breeze/utils/run_utils.py:185:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ setup.py:524:9: PERF403 Use a dictionary comprehension instead of a for-loop
+ tests/test_utils/config.py:57:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ tests/test_utils/config.py:82:13: PERF403 Use a dictionary comprehension instead of a for-loop

bokeh (+3, -0)

+ src/bokeh/core/has_props.py:128:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ src/bokeh/core/property/wrappers.py:517:21: PERF403 Use a dictionary comprehension instead of a for-loop
+ src/bokeh/models/sources.py:283:13: PERF403 Use a dictionary comprehension instead of a for-loop

zulip (+5, -0)

+ analytics/views/stats.py:456:9: PERF403 Use a dictionary comprehension instead of a for-loop
+ analytics/views/stats.py:531:9: PERF403 Use a dictionary comprehension instead of a for-loop
+ zerver/lib/response.py:128:9: PERF403 Use a dictionary comprehension instead of a for-loop
+ zerver/views/auth.py:371:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ zerver/views/development/registration.py:28:9: PERF403 Use a dictionary comprehension instead of a for-loop

Rules changed: 1

Rule	Changes	Additions	Removals
PERF403	21	21	0

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     11.0±0.02ms     3.7 MB/sec    1.02     11.2±0.04ms     3.6 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.2±0.00ms     7.5 MB/sec    1.00      2.2±0.00ms     7.5 MB/sec
formatter/numpy/globals.py                 1.00    247.0±0.31µs    11.9 MB/sec    1.07   264.3±12.88µs    11.2 MB/sec
formatter/pydantic/types.py                1.00      4.8±0.01ms     5.4 MB/sec    1.02      4.8±0.01ms     5.3 MB/sec
linter/all-rules/large/dataset.py          1.00     15.2±0.03ms     2.7 MB/sec    1.02     15.5±0.02ms     2.6 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.9±0.00ms     4.3 MB/sec    1.01      3.9±0.01ms     4.2 MB/sec
linter/all-rules/numpy/globals.py          1.00    510.7±0.68µs     5.8 MB/sec    1.01    516.2±3.26µs     5.7 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.9±0.03ms     3.7 MB/sec    1.01      7.0±0.01ms     3.7 MB/sec
linter/default-rules/large/dataset.py      1.00      7.8±0.02ms     5.2 MB/sec    1.00      7.8±0.02ms     5.2 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.01   1676.4±6.23µs     9.9 MB/sec    1.00   1665.9±2.47µs    10.0 MB/sec
linter/default-rules/numpy/globals.py      1.00    187.1±0.29µs    15.8 MB/sec    1.00    187.1±2.36µs    15.8 MB/sec
linter/default-rules/pydantic/types.py     1.02      3.6±0.01ms     7.2 MB/sec    1.00      3.5±0.00ms     7.3 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     10.8±0.08ms     3.8 MB/sec    1.01     10.9±0.07ms     3.7 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.2±0.03ms     7.7 MB/sec    1.00      2.2±0.04ms     7.7 MB/sec
formatter/numpy/globals.py                 1.00    242.9±4.10µs    12.1 MB/sec    1.02   248.8±11.57µs    11.9 MB/sec
formatter/pydantic/types.py                1.00      4.7±0.06ms     5.5 MB/sec    1.01      4.7±0.05ms     5.4 MB/sec
linter/all-rules/large/dataset.py          1.00     15.2±0.09ms     2.7 MB/sec    1.00     15.2±0.21ms     2.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      4.0±0.04ms     4.2 MB/sec    1.00      4.0±0.04ms     4.2 MB/sec
linter/all-rules/numpy/globals.py          1.00    482.5±7.00µs     6.1 MB/sec    1.01    486.3±6.21µs     6.1 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.9±0.05ms     3.7 MB/sec    1.00      6.9±0.10ms     3.7 MB/sec
linter/default-rules/large/dataset.py      1.00      7.9±0.05ms     5.1 MB/sec    1.00      7.9±0.09ms     5.1 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1654.6±21.59µs    10.1 MB/sec    1.00  1654.6±14.50µs    10.1 MB/sec
linter/default-rules/numpy/globals.py      1.00    187.9±2.47µs    15.7 MB/sec    1.01    189.6±3.71µs    15.6 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.5±0.03ms     7.2 MB/sec    1.00      3.5±0.03ms     7.2 MB/sec

crates/ruff/src/rules/perflint/rules/slow_dict_creation.rs

crates/ruff/src/rules/perflint/rules/manual_dict_comprehension.rs

dhruvmanila · 2023-07-11T16:40:25Z

crates/ruff/resources/test/fixtures/perflint/PERF403.py

+    for idx, name in enumerate(fruit):
+        result[idx] = name  # PERF403


(I wonder similar to PERF402 why there isn't the separation for dictionary, I'm assuming the corresponding rule for dictionary doesn't exists upstream as well).

Yeah don't know why it's split into copy() and comprehension for lists but only one for dicts. My implementation was just a straight copy from https://github.com/tonybaloney/perflint/blob/main/perflint/comprehension_checker.py

As we are already deviating with the if conditional checks, we could add a separate rule PERF404 if that has value.

crates/ruff/src/rules/perflint/rules/manual_dict_comprehension.rs

charliermarsh · 2023-07-12T03:48:23Z

The delay on this one is that I'm worried about false positives. Most of the examples surfaced in the ecosystem CI checks seem to be false positives. I don't think we have a great way to solve this right now. The best we can do, probably, is check if the variable to which we're assigning was assigned the empty dictionary immediately prior to the loop. That's probably a necessary precondition to merging here just based on what I see in the ecosystem CI -- wdyt, @qdegraaf?

qdegraaf · 2023-07-12T08:10:35Z

The delay on this one is that I'm worried about false positives. Most of the examples surfaced in the ecosystem CI checks seem to be false positives. I don't think we have a great way to solve this right now. The best we can do, probably, is check if the variable to which we're assigning was assigned the empty dictionary immediately prior to the loop. That's probably a necessary precondition to merging here just based on what I see in the ecosystem CI -- wdyt, @qdegraaf?

@charliermarsh it's been on my TODO list to look at this one again since I got wind of PERF401 and PERF402 being quite false positive heavy as well, wanted to pick it up after the Dlint/ReDoS thing but will prioritise this over that now. I am away until after the weekend but plan after that is to:

Double check upstream implementation, and run that against the false positive snippets from the ecosystem CI to see if I missed an implementation detail
Add false positives from the ecosystem CI to fixtures
Expand checks (agree that best heuristic is probably to check if dict assignment happens right before the loop) until false positives don't flag

Happy to wait on merging until that's sorted and we are satisfied with the precision. If we can't get it to perform like we want I'll check in again then to see what we want to do with it. I'll set the PR to draft until I get further.

qdegraaf · 2023-07-19T19:58:25Z

@charliermarsh
Had a look at the upstream implementation and it appears perflint lets anything more complex than a simple if/else slide, but does not check anything with regards to the contents of the dict. It works similarly for PERF401 and PERF402, I can open a separate PR to address that if we want, however:

For false positives based on whether the dictionary is empty. perflint flags:

def foo():
    result = {"a": 1, "b": 2}
    fruit = ["apple", "pear", "orange"]
    for idx, name in enumerate(fruit):
        if idx % 2:
            result[idx] = name

as

************* Module scratch
scratch.py:4:4: W8403: Use a dictionary comprehension instead of a for-loop (use-dict-comprehension)
scratch.py:3:12: W8301: Use tuple instead of list for a non-mutated sequence (use-tuple-over-list)

------------------------------------------------------------------
Your code has been rated at 6.67/10 (previous run: 8.46/10, -1.79)

We can add the heuristic. But checking other PERF4XX we actually have similar issues, e.g.:

def f():
    items = [1, 2, 3, 4]
    result = [5, 6]
    for i in items:
        result.append(i * i)

flags:

************* Module scratch
scratch.py:4:4: W8402: Use a list copy instead of a for-loop (use-list-copy)
scratch.py:2:12: W8301: Use tuple instead of list for a non-mutated sequence (use-tuple-over-list)

------------------------------------------------------------------
Your code has been rated at 6.00/10 (previous run: 6.67/10, -0.67)

So we'd have to add the heuristic to all 4XX rules and I wouldn't exactly call it a clean fix ready to spread to multiple rules. If we don't have a way to efficiently determine whether a given list or dict is empty when checking the violation , and considering the upstream implementation is more false-positive heavy than we maybe initially thought, is it worth considering not porting the 4XX rules at all? I should have checked the coverage of the original plug-in a bit more thoroughly before implementing. Happy to add the heuristic and fix up the three rules but just want to check if it's worth the effort to add a rough heuristic in order to patch up an already rough rule.

charliermarsh · 2023-07-21T03:25:43Z

I think this is okay to flag:

def f():
    items = [1, 2, 3, 4]
    result = [5, 6]
    for i in items:
        result.append(i * i)

Because it can be expressed as result.extend(i * i for i in items).

What if the heuristic was: check if the variable was assigned to a dictionary literal prior to the loop?

qdegraaf · 2023-07-21T07:28:30Z

Ah yeah, of course, check. Lot of context switches muddying up the mind recently.

I'll implement that heuristic. Two questions:

Shall I also still open a PR to exclude if/elif/else statements from PERF401/PERF402 (and exclude them here of course) to match upstream?
Do we still want the heuristic to be in the line before the for-loop to avoid potential weirdness like:

d = {}
d[1] = 2
for x in some_other_list:
    d[x] = x

?

If so, is there an example of a rule where we do something similar, I can take inspiration from? I assume it involves scanning the parent scope for the binding and then checking that assignment. But unsure what the most efficient way is to get the relative position of that assignment (resolving the range to line number somehow?)

MichaReiser reviewed Jun 23, 2023

View reviewed changes

charliermarsh self-requested a review June 29, 2023 01:04

dhruvmanila approved these changes Jul 11, 2023

View reviewed changes

qdegraaf marked this pull request as draft July 12, 2023 08:10

qdegraaf and others added 7 commits July 19, 2023 21:26

Add PERF403

1f10c6a

cargo fmt

c565b64

Fix PR comments

2a14dcf

Rebase, merge

a244aa0

Remove foo.py

e885c5e

Revert deletion of unrecognized_version_info

2d6d489

Align names, fix rebase errors, rerun tests with new message

4d28577

qdegraaf closed this by deleting the head repository Jul 27, 2023

qdegraaf mentioned this pull request Jul 27, 2023

[perflint] Add PERF403 #6132

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[`perflint`] Add `PERF403` #5313

[`perflint`] Add `PERF403` #5313

qdegraaf commented Jun 22, 2023

github-actions bot commented Jun 22, 2023 •

edited

Loading

dhruvmanila Jul 11, 2023

qdegraaf Jul 11, 2023

charliermarsh commented Jul 12, 2023

qdegraaf commented Jul 12, 2023 •

edited

Loading

qdegraaf commented Jul 19, 2023 •

edited

Loading

charliermarsh commented Jul 21, 2023

qdegraaf commented Jul 21, 2023 •

edited

Loading

		for idx, name in enumerate(fruit):
		result[idx] = name # PERF403

[perflint] Add PERF403 #5313

[perflint] Add PERF403 #5313

Conversation

qdegraaf commented Jun 22, 2023

Summary

Test Plan

Issue Links

github-actions bot commented Jun 22, 2023 • edited Loading

PR Check Results

Ecosystem

Benchmark

Linux

Windows

dhruvmanila Jul 11, 2023

Choose a reason for hiding this comment

qdegraaf Jul 11, 2023

Choose a reason for hiding this comment

charliermarsh commented Jul 12, 2023

qdegraaf commented Jul 12, 2023 • edited Loading

qdegraaf commented Jul 19, 2023 • edited Loading

charliermarsh commented Jul 21, 2023

qdegraaf commented Jul 21, 2023 • edited Loading

[`perflint`] Add `PERF403` #5313

[`perflint`] Add `PERF403` #5313

github-actions bot commented Jun 22, 2023 •

edited

Loading

qdegraaf commented Jul 12, 2023 •

edited

Loading

qdegraaf commented Jul 19, 2023 •

edited

Loading

qdegraaf commented Jul 21, 2023 •

edited

Loading