Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use --additional-flags='check-untyped-defs' when running mypy_primer #9433

Merged
merged 1 commit into from
Dec 31, 2022

Conversation

AlexWaygood
Copy link
Member

This was suggested by @Avasam in hauntsaninja/mypy_primer#64, and by @Akuli in hauntsaninja/mypy_primer#66. It could conceivably yield much more interesting diffs for some PRs. For example, comtypes makes heavy use of pywin32, so it shows up a lot in primer diffs for PRs affecting our pywin32 stubs. But the diffs usually aren't very interesting, because the vast majority of comtypes's functions are untyped, so mypy just ignores them.

I think this is worth experimenting with, and if it doesn't work for whatever reason, we can always revert it.

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@JelleZijlstra JelleZijlstra merged commit f794cdd into main Dec 31, 2022
@JelleZijlstra JelleZijlstra deleted the primer-workflow branch December 31, 2022 19:01
@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Jan 1, 2023

Definitely worth experimenting with! I'm not sure it'll do too much for us over what stubtest can do, since most of the things in untyped functions will be Any.

@AlexWaygood
Copy link
Member Author

It looks like this might come at the cost of a performance regression (which is understandable, since we're asking mypy to check a significantly larger quantity of code in some cases) — it took 15 minutes for the four primer jobs to complete on #9442, whereas we usually got the primer comment in 8-9 minutes prior to this PR being merged.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Jan 2, 2023

Three options I can think of:

  1. Accept that primer will take a little longer now, and leave things be. Mypy 1.0 will have a number of optimisations that may partially mitigate this on their own anyway.
  2. Think of ways we can mitigate the performance regression (more sharding? Mypy uses 5 shards in CI whereas we only use 4 here).
  3. Revert the PR.

@AlexWaygood
Copy link
Member Author

Another 15-minute primer run: https://github.com/python/typeshed/actions/runs/3824023959

@Avasam
Copy link
Sponsor Collaborator

Avasam commented Jan 2, 2023

That's definitely a significant increase in time spent.

  • mypy_primer is bound to keep growing with more projects, so sooner or later more sharding might be beneficial. (although your tests shows in this case that's not a direct fix).
  • I'd love to the the difference with 1.0 of course. Could we have a run (maybe locally) with mypy master to compare?
  • Is the time increase consistent per project? Like, would redistributing the weighs with check-untyped-defs in mind help at all?
    • Here are the biggest offenders from the run shared (everything over 2 minutes), in order:
      • manticore took 395.61s + 397.76s
      • pandas took 155.67s + 157.83s
      • sympy took 127.22s + 128.01s
      • core took 92.67s + 93.75s
      • graphql-core took 80.45s + 79.91s
      • spark took 58.58s + 57.86s
    • Splitting off just manticore itself in its own shard would re-balance everything else
      • Manticore's checked code and weigh changes drastically if you're checking for untyped defs. (I assume a large part of the codebase is untyped)
      • At the cost of possible extra maintenance pain, weighs could be with and without untyped-defs.
    • Could mypy_primer self-report the run-time of each project in a way that's easy to copy to make weighs (new projects and changes in them) more maintainable? (or even semi-automated).
      • I'm spitballing here, but weighs could be cached so that custom primer runs and subsequent runs can automatically adjusts (a CI would have to save the cache as an artefact).

@AlexWaygood
Copy link
Member Author

  • I'd love to the the difference with 1.0 of course. Could we have a run (maybe locally) with mypy master to compare?

It's easy to point mypy_primer at the mypy master branch, but a little trickier to get a good performance comparison, since you need a mypyc-compiled version of mypy in order to get accurate numbers. Having said that, it looks like compiled wheels are regularly uploaded here, though I've never tried using them: https://github.com/mypyc/mypy_mypyc-wheels.

It's also worth noting that, while there have been a lot of optimisations recently merged, the single most significant potential optimisation is still in the pipeline: python/mypy#14150.

  • Manticore's checked code and weigh changes drastically if you're checking for untyped defs. (I assume a large part of the codebase is untyped)

I briefly glanced at the source code, and that indeed seems to be the case

At the cost of possible extra maintenance pain, weighs could be with and without untyped-defs.

(I'm not a mypy_primer maintainer, so this is just my 2 cents): Another option that might be less of a maintenance burden might be to add a --skip-project option. That might also mean that we could run mypy_primer on 3.11, if we think it's important to do so (we'd just need to skip Materialize: #9336).

@Avasam
Copy link
Sponsor Collaborator

Avasam commented Jan 2, 2023

add a --skip-project option

project selection:
  -k PROJECT_SELECTOR, --project-selector PROJECT_SELECTOR
                        regex to filter projects (matches against location)

I haven't tried, but I feel it should be possible to do that already with the regex route.
Something like -k ^(?!https://github.com/MaterializeInc/materialize$) ?
(or -k ^(?!https://github.com/MaterializeInc/materialize).+?$ if primer doesn't like the match)

@AlexWaygood
Copy link
Member Author

add a --skip-project option

project selection:
  -k PROJECT_SELECTOR, --project-selector PROJECT_SELECTOR
                        regex to filter projects (matches against location)

I haven't tried, but I feel it should be possible to do that already with the regex route. Something like -k ^(?!https://github.com/MaterializeInc/materialize$) ? (or -k ^(?!https://github.com/MaterializeInc/materialize).+?$ if primer doesn't like the match)

Ah, serves me right for not reading the docs closely enough :)

In that case, it would be interesting to see if we can get back the speed increase just by skipping manticore.

Though as @hauntsaninja says, it still remains to be seen whether using --check-untyped-defs will actually give us significantly better results on any PRs. So the best course of action might still be just to revert this PR.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Jan 3, 2023

I think manticore has been fairly helpful in typeshed CI: https://github.com/python/typeshed/pulls?q=is%3Apr+manticore (PR count isn't super trustworthy, but it's indicative). There's a significant chance that having manticore in CI will be more helpful than having --check-untyped-defs.

That said, I took a close look, and should be easy to fix the perf issue that manticore is running into.

For more forward looking things, I'm disinclined to pursue more complicated sharding strategies than the one I recently implemented. I'm happy to pursue better project selection options for mypy_primer if it makes some typeshed use case happy, although I'm hoping not to grow mypy_primer's project list enough that this becomes necessary.

@JelleZijlstra
Copy link
Member

I wonder if it's feasible to limit mypy-primer runs for third-party stub changes to only projects that use the library, either directly or indirectly. Perhaps we can somehow get the transitive dependencies of each mypy-primer project using setup.py/PyPI/whatever, then check if the stub package we're changing is in that set. If not, we can skip type-checking the project in mypy-primer.

@Avasam
Copy link
Sponsor Collaborator

Avasam commented Jan 3, 2023

I like Jelle's idea in theory. It should be fully doable within typeshed too. And would reduce typeshed's already long mypy_primer runs. If its feasible, it'd be useful even outside this PR.

I just don't know how reliable "get the transitive dependencies of each mypy-primer project using setup.py/PyPI/whatever" is. And the added overhead of doing that (even with some sort of caching).

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Jan 3, 2023

Yeah, this has been discussed before. It's a good idea, but there are a number of finicky details to consider, and it wouldn't help PRs targeting stdlib (which last time I checked was still a majority of typeshed PRs). Detecting dependencies also has the advantage that we could spit out a relevant project count, which may help calibrate reviewers for how meaningful a "no changes found" is.

@Avasam
Copy link
Sponsor Collaborator

Avasam commented Jan 3, 2023

Then maybe this idea deserves its own issue and discussion thread.

@AlexWaygood
Copy link
Member Author

Given that mypy 1.0 may still be a few weeks away from being released, here's a PR to revert in the meantime:

We can take another look when mypy 1.0 comes out and see how the performance compares.

AlexWaygood added a commit that referenced this pull request Jan 3, 2023
…py_primer" (#9451)

Revert "Use `--additional-flags='check-untyped-defs'` when running mypy_primer (#9433)"

This reverts commit f794cdd.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants