Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perflint] Add PERF403 #5313

Closed
wants to merge 7 commits into from
Closed

[perflint] Add PERF403 #5313

wants to merge 7 commits into from

Conversation

qdegraaf
Copy link
Contributor

Summary

Adds PERF403 mirroring W8403 from https://github.com/tonybaloney/perflint

Implementation is similar to original/upstream implementation, with the exception of not yet checking whether the ExprSubscript value is a dictionary type value. I tried adding this but ended up doing a lot of

let scope = checker.semantic().scope();
if let Some(binding_id) = scope.get(subscript_value_id) {
    let binding = &checker.semantic().bindings[binding_id];
    if binding.kind.is_assignment() || binding.kind.is_named_expr_assignment() {
        if let Some(parent_id) = binding.source {
        etc....

again for what felt like very little added value.

Open to any feedback/input to do this more efficiently. If there is no more efficient way to do this, also let me know and I'll see if I can make some kind of helper or util for this, as I've bumped into this a few times now with various rules.

I've set the violation to only flag the first append call in a long if-else statement. Happy to change this to some other location or make it multiple violations if that makes more sense.

Test Plan

Fixtures were added based on perflint tests

Issue Links

Refers: #4789

@github-actions
Copy link
Contributor

github-actions bot commented Jun 22, 2023

PR Check Results

Ecosystem

ℹ️ ecosystem check detected changes. (+21, -0, 0 error(s))

airflow (+13, -0)

+ airflow/dag_processing/manager.py:997:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/executors/kubernetes_executor_utils.py:137:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/exasol/hooks/exasol.py:66:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/google/cloud/hooks/bigquery.py:3063:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/google/cloud/transfers/gcs_to_bigquery.py:708:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/mysql/hooks/mysql.py:163:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/postgres/hooks/postgres.py:149:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/providers/smtp/hooks/smtp.py:282:17: PERF403 Use a dictionary comprehension instead of a for-loop
+ airflow/utils/email.py:215:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ dev/breeze/src/airflow_breeze/utils/run_utils.py:185:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ setup.py:524:9: PERF403 Use a dictionary comprehension instead of a for-loop
+ tests/test_utils/config.py:57:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ tests/test_utils/config.py:82:13: PERF403 Use a dictionary comprehension instead of a for-loop

bokeh (+3, -0)

+ src/bokeh/core/has_props.py:128:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ src/bokeh/core/property/wrappers.py:517:21: PERF403 Use a dictionary comprehension instead of a for-loop
+ src/bokeh/models/sources.py:283:13: PERF403 Use a dictionary comprehension instead of a for-loop

zulip (+5, -0)

+ analytics/views/stats.py:456:9: PERF403 Use a dictionary comprehension instead of a for-loop
+ analytics/views/stats.py:531:9: PERF403 Use a dictionary comprehension instead of a for-loop
+ zerver/lib/response.py:128:9: PERF403 Use a dictionary comprehension instead of a for-loop
+ zerver/views/auth.py:371:13: PERF403 Use a dictionary comprehension instead of a for-loop
+ zerver/views/development/registration.py:28:9: PERF403 Use a dictionary comprehension instead of a for-loop

Rules changed: 1
Rule Changes Additions Removals
PERF403 21 21 0

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     11.0±0.02ms     3.7 MB/sec    1.02     11.2±0.04ms     3.6 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.2±0.00ms     7.5 MB/sec    1.00      2.2±0.00ms     7.5 MB/sec
formatter/numpy/globals.py                 1.00    247.0±0.31µs    11.9 MB/sec    1.07   264.3±12.88µs    11.2 MB/sec
formatter/pydantic/types.py                1.00      4.8±0.01ms     5.4 MB/sec    1.02      4.8±0.01ms     5.3 MB/sec
linter/all-rules/large/dataset.py          1.00     15.2±0.03ms     2.7 MB/sec    1.02     15.5±0.02ms     2.6 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.9±0.00ms     4.3 MB/sec    1.01      3.9±0.01ms     4.2 MB/sec
linter/all-rules/numpy/globals.py          1.00    510.7±0.68µs     5.8 MB/sec    1.01    516.2±3.26µs     5.7 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.9±0.03ms     3.7 MB/sec    1.01      7.0±0.01ms     3.7 MB/sec
linter/default-rules/large/dataset.py      1.00      7.8±0.02ms     5.2 MB/sec    1.00      7.8±0.02ms     5.2 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.01   1676.4±6.23µs     9.9 MB/sec    1.00   1665.9±2.47µs    10.0 MB/sec
linter/default-rules/numpy/globals.py      1.00    187.1±0.29µs    15.8 MB/sec    1.00    187.1±2.36µs    15.8 MB/sec
linter/default-rules/pydantic/types.py     1.02      3.6±0.01ms     7.2 MB/sec    1.00      3.5±0.00ms     7.3 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     10.8±0.08ms     3.8 MB/sec    1.01     10.9±0.07ms     3.7 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.2±0.03ms     7.7 MB/sec    1.00      2.2±0.04ms     7.7 MB/sec
formatter/numpy/globals.py                 1.00    242.9±4.10µs    12.1 MB/sec    1.02   248.8±11.57µs    11.9 MB/sec
formatter/pydantic/types.py                1.00      4.7±0.06ms     5.5 MB/sec    1.01      4.7±0.05ms     5.4 MB/sec
linter/all-rules/large/dataset.py          1.00     15.2±0.09ms     2.7 MB/sec    1.00     15.2±0.21ms     2.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      4.0±0.04ms     4.2 MB/sec    1.00      4.0±0.04ms     4.2 MB/sec
linter/all-rules/numpy/globals.py          1.00    482.5±7.00µs     6.1 MB/sec    1.01    486.3±6.21µs     6.1 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.9±0.05ms     3.7 MB/sec    1.00      6.9±0.10ms     3.7 MB/sec
linter/default-rules/large/dataset.py      1.00      7.9±0.05ms     5.1 MB/sec    1.00      7.9±0.09ms     5.1 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1654.6±21.59µs    10.1 MB/sec    1.00  1654.6±14.50µs    10.1 MB/sec
linter/default-rules/numpy/globals.py      1.00    187.9±2.47µs    15.7 MB/sec    1.01    189.6±3.71µs    15.6 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.5±0.03ms     7.2 MB/sec    1.00      3.5±0.03ms     7.2 MB/sec

@charliermarsh charliermarsh self-requested a review June 29, 2023 01:04
Comment on lines +4 to +5
for idx, name in enumerate(fruit):
result[idx] = name # PERF403
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I wonder similar to PERF402 why there isn't the separation for dictionary, I'm assuming the corresponding rule for dictionary doesn't exists upstream as well).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah don't know why it's split into copy() and comprehension for lists but only one for dicts. My implementation was just a straight copy from https://github.com/tonybaloney/perflint/blob/main/perflint/comprehension_checker.py

As we are already deviating with the if conditional checks, we could add a separate rule PERF404 if that has value.

@charliermarsh
Copy link
Member

The delay on this one is that I'm worried about false positives. Most of the examples surfaced in the ecosystem CI checks seem to be false positives. I don't think we have a great way to solve this right now. The best we can do, probably, is check if the variable to which we're assigning was assigned the empty dictionary immediately prior to the loop. That's probably a necessary precondition to merging here just based on what I see in the ecosystem CI -- wdyt, @qdegraaf?

@qdegraaf
Copy link
Contributor Author

qdegraaf commented Jul 12, 2023

The delay on this one is that I'm worried about false positives. Most of the examples surfaced in the ecosystem CI checks seem to be false positives. I don't think we have a great way to solve this right now. The best we can do, probably, is check if the variable to which we're assigning was assigned the empty dictionary immediately prior to the loop. That's probably a necessary precondition to merging here just based on what I see in the ecosystem CI -- wdyt, @qdegraaf?

@charliermarsh it's been on my TODO list to look at this one again since I got wind of PERF401 and PERF402 being quite false positive heavy as well, wanted to pick it up after the Dlint/ReDoS thing but will prioritise this over that now. I am away until after the weekend but plan after that is to:

  • Double check upstream implementation, and run that against the false positive snippets from the ecosystem CI to see if I missed an implementation detail
  • Add false positives from the ecosystem CI to fixtures
  • Expand checks (agree that best heuristic is probably to check if dict assignment happens right before the loop) until false positives don't flag

Happy to wait on merging until that's sorted and we are satisfied with the precision. If we can't get it to perform like we want I'll check in again then to see what we want to do with it. I'll set the PR to draft until I get further.

@qdegraaf qdegraaf marked this pull request as draft July 12, 2023 08:10
@qdegraaf
Copy link
Contributor Author

qdegraaf commented Jul 19, 2023

@charliermarsh
Had a look at the upstream implementation and it appears perflint lets anything more complex than a simple if/else slide, but does not check anything with regards to the contents of the dict. It works similarly for PERF401 and PERF402, I can open a separate PR to address that if we want, however:

For false positives based on whether the dictionary is empty. perflint flags:

def foo():
    result = {"a": 1, "b": 2}
    fruit = ["apple", "pear", "orange"]
    for idx, name in enumerate(fruit):
        if idx % 2:
            result[idx] = name

as

************* Module scratch
scratch.py:4:4: W8403: Use a dictionary comprehension instead of a for-loop (use-dict-comprehension)
scratch.py:3:12: W8301: Use tuple instead of list for a non-mutated sequence (use-tuple-over-list)

------------------------------------------------------------------
Your code has been rated at 6.67/10 (previous run: 8.46/10, -1.79)

We can add the heuristic. But checking other PERF4XX we actually have similar issues, e.g.:

def f():
    items = [1, 2, 3, 4]
    result = [5, 6]
    for i in items:
        result.append(i * i) 

flags:

************* Module scratch
scratch.py:4:4: W8402: Use a list copy instead of a for-loop (use-list-copy)
scratch.py:2:12: W8301: Use tuple instead of list for a non-mutated sequence (use-tuple-over-list)

------------------------------------------------------------------
Your code has been rated at 6.00/10 (previous run: 6.67/10, -0.67)

So we'd have to add the heuristic to all 4XX rules and I wouldn't exactly call it a clean fix ready to spread to multiple rules. If we don't have a way to efficiently determine whether a given list or dict is empty when checking the violation , and considering the upstream implementation is more false-positive heavy than we maybe initially thought, is it worth considering not porting the 4XX rules at all? I should have checked the coverage of the original plug-in a bit more thoroughly before implementing. Happy to add the heuristic and fix up the three rules but just want to check if it's worth the effort to add a rough heuristic in order to patch up an already rough rule.

@charliermarsh
Copy link
Member

I think this is okay to flag:

def f():
    items = [1, 2, 3, 4]
    result = [5, 6]
    for i in items:
        result.append(i * i) 

Because it can be expressed as result.extend(i * i for i in items).

What if the heuristic was: check if the variable was assigned to a dictionary literal prior to the loop?

@qdegraaf
Copy link
Contributor Author

qdegraaf commented Jul 21, 2023

Ah yeah, of course, check. Lot of context switches muddying up the mind recently.

I'll implement that heuristic. Two questions:

  • Shall I also still open a PR to exclude if/elif/else statements from PERF401/PERF402 (and exclude them here of course) to match upstream?
  • Do we still want the heuristic to be in the line before the for-loop to avoid potential weirdness like:
d = {}
d[1] = 2
for x in some_other_list:
    d[x] = x

?

If so, is there an example of a rule where we do something similar, I can take inspiration from? I assume it involves scanning the parent scope for the binding and then checking that assignment. But unsure what the most efficient way is to get the relative position of that assignment (resolving the range to line number somehow?)

@qdegraaf qdegraaf closed this by deleting the head repository Jul 27, 2023
@qdegraaf qdegraaf mentioned this pull request Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants