-
-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Persistent caching of inference results #1145
Comments
FWIW I'm fully aware that this is no easy undertaking. However it should at least be documented as a feature request and talked about 😄 |
I'll add too, AFAICT |
Thank you for opening the issue, I agree with everything you said. I would add that making astroid performance better would save gigawatt of electricity. What I'm saying it that AWS should sponsor this issue 😄 |
In the meantime I think I'll play around with running |
I've been playing around with this in my free time and am getting close to a really good Proof-of-Concept/MvP. Hopefully by next weekend I'll have a PR to show! |
Cross-posting it here to preserve it for later discussions. -- A more useful approach IMO would be to try and cache the inference results for each file although it still needs to be investigated to what extend that is possible. Originally posted by @cdce8p in #1194 (review) |
(Perhaps the title should be edited to reflect the actual work needing done) Cross-posting my reply:
|
@joshua-cannon-techlabs said:
@cdce8p said:
This sounds like we need some quantification, which My experience (when introducing pylint-dev/pylint#3498 and #3473) was more in line with @joshua-cannon-techlabs in that there's a lot of time spent in upfront file-work, which seems to be multiplied by the number of jobs being run - which suggests that under So, the implication is that a potential fast-win is to generate the cache/AST on the main-thread and distribute that (as execplictly read-only) to the sub-procs. An even faster win might be to move the multiprocessing to the astroid loop in That said, I very much agree with @cdce8p that a simple cache isn't the be-all and end-all. The cache has to match the use-case, which in this case would be both of the following:
My AST know-how is rusty but IIRC you should be able to generate a dependency tree that can be dirtied based on changes to the tree. Then a partial solution to (1.) is that the work to be done is contingent on that dirty-state. The majority of checkers could adhere to that dirty state but some wouldn't, I imagine (edge cases etc.). You should then also get, for free, the ability to better multi-thread tasks that operate on that tree, which partially solves (2.). Considering |
Steps to reproduce
For fairly large repositories, using inference can be incredibly slow. Running
pylint
with a rule which leverages inference can take multiple seconds, while running a rule without inference is almost immediate.From profiling, most of the time taken is traversing the file's dependencies and parsing them as well. This information is cached (see
astroid.manager.AtroidManager.brain
), however the cache is in-memory and therefore doesn't last between runs. This is likely due to the fact thatastroid
classes can't be easily serialized (pickling them is unfortunately not a drop-in solution).Ideally, there is an option to enable persistent caching which is file-timestamp aware. If a particular module hasn't been changed,
astroid
could pull from the cache (exactly like Python's caching of bytecode).Of course, a lot of thought has to be put into caching, but the benefits would be on the order of dev-years if not dev-decades IMO.
Current behavior
pylint
on a single file in a large repo withassert-on-tuple
as the only rule: Maybe ~0.5s on a bad daypylint
on the same file withabstract-class-instantiated
as the only rule: Roughly 7.4spylint
twice in the same python process withabstract-class-instantiated
as the only rule: Also roughly 7.4sExpected behavior
Running
pylint
multiple times on the same file ideally doesn't take the same amount of time each time.python -c "from astroid import __pkginfo__; print(__pkginfo__.version)"
output2.4.2
The text was updated successfully, but these errors were encountered: