DO NOT MERGE - WIP: Add ability to serialize/deserialize astroid.Module (and friends) #1194

thejcannon · 2021-10-01T19:50:11Z

Steps

For new features or bug fixes, add a ChangeLog entry describing what your PR does.
Write a good description on what the PR does.

Description

This PR adds the ability to serialize and deserialize astroid.Module instances (and, by association, every other kind of instance they may reference) to and from JSON.
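
To make the idea concrete, here is a minimal round-trip sketch. It uses the standard library's `ast` nodes as a stand-in for astroid nodes, and the `dump_node`/`load_node` helpers and the `"_type"` key are hypothetical names chosen for illustration. None of this is the actual API of the `_persistence` module.

```python
import ast
import json


def dump_node(node):
    """Turn an ast node (or list/primitive) into JSON-compatible data."""
    if isinstance(node, ast.AST):
        data = {"_type": type(node).__name__}
        for name, value in ast.iter_fields(node):
            data[name] = dump_node(value)
        return data
    if isinstance(node, list):
        return [dump_node(item) for item in node]
    # Primitives such as str, int, or None are stored as-is.
    return node


def load_node(data):
    """Rebuild ast nodes from the structure produced by dump_node."""
    if isinstance(data, dict):
        cls = getattr(ast, data["_type"], None)
        # Only known node classes are instantiated; anything else is rejected.
        if not (isinstance(cls, type) and issubclass(cls, ast.AST)):
            raise ValueError(f"unknown node type: {data['_type']!r}")
        fields = {k: load_node(v) for k, v in data.items() if k != "_type"}
        return cls(**fields)
    if isinstance(data, list):
        return [load_node(item) for item in data]
    return data


tree = ast.parse("x = 1 + 2\n")
text = json.dumps(dump_node(tree))
rebuilt = load_node(json.loads(text))
```

In this sketch `ast.dump(rebuilt)` matches `ast.dump(tree)`. Line and column attributes are not preserved, and constants that JSON cannot represent (bytes, complex numbers) would need special handling.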

Why persistence?

Persisting processed modules is useful primarily when parsing a large number of files, or files with many interdependencies. As long as deserializing from the cache is quicker than parsing the files, it's a net win.

(I'll do some profiling with pylint on our codebase of about 1k files when this is more polished)

Why JSON, why not YAML, XML, or `pickle`?

JSON was chosen because it is compact (compared with YAML, XML, or similar) and lets us choose what information to store and how. Although pickling might be simpler to implement, it is inherently insecure: a bad actor only needs to tamper with the pickled astroid data on disk to compromise the next parse. With JSON, since we control the data, the actor would have to infect both the cache and the Python environment the next parse runs in.
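
The difference can be shown with a small, harmless demonstration. The `Malicious` class is hypothetical and only illustrates the general behavior of `pickle` and `json`, not anything specific to astroid: unpickling calls whatever callable the payload names, while `json.loads` only ever produces plain data.

```python
import json
import pickle


class Malicious:
    """Stand-in for a tampered cache entry (hypothetical)."""

    def __reduce__(self):
        # pickle.loads will call eval(...) with this argument.
        return (eval, ("'attacker code ran: ' + str(6 * 7)",))


payload = pickle.dumps(Malicious())
# Loading the payload executes the expression chosen by the attacker.
pickle_result = pickle.loads(payload)

# The JSON equivalent is inert: it is just a dict of strings.
json_result = json.loads('{"__reduce__": "eval", "args": "6 * 7"}')
```

Here `pickle_result` is the string built by the attacker's expression, whereas `json_result` is an ordinary dict that does nothing until our own loader decides how to interpret it.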

And since we aren't inventing anything new here, it's worth noting that mypy's cache is also JSON 😉

Changes

The bulk of the relevant changes are localized to the _persistence module, which was implemented specifically to reduce the number of changes needed in the relevant astroid classes. Additional changes were made either to serialize/deserialize in a special way or to enable the default way.

A future change will update the module cache to leverage this new feature.

Testing

TODO

Type of Changes

	Type
	🐛 Bug fix
✓	✨ New feature
	🔨 Refactoring
	📜 Docs

Related Issue

Related #1145


thejcannon

Not all tests pass right now. The latest errors come from inference not working (and since inference is quite the beast, I thought I'd publish this PR as-is for initial feedback).

thejcannon · 2021-10-01T19:52:29Z

tests/conftest.py

+
+def pytest_addoption(parser):
+    parser.addoption(
+        "--test-roundtrip-persistence", type=bool, default=True,


Eventually, I think we'll add a tox section to re-run all tests with this enabled. For now it's just enabled so I can test easily.
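
A side note on the option definition above: pytest documents `addoption` as accepting the same attributes as argparse's `add_argument`, so a stand-alone `argparse` parser is used here as a stand-in. With `type=bool`, any non-empty string (including `"False"`) converts to `True`, so the option cannot be switched off from the command line. The `store_true` variant below is one possible alternative, shown as a sketch rather than as what the PR should do.

```python
import argparse

# Same shape as the option in the diff: type=bool with a True default.
bool_parser = argparse.ArgumentParser()
bool_parser.add_argument("--test-roundtrip-persistence", type=bool, default=True)
bool_args = bool_parser.parse_args(["--test-roundtrip-persistence", "False"])
# bool("False") is True because the string is non-empty.
surprising_value = bool_args.test_roundtrip_persistence

# Alternative sketch: a plain flag that is off unless it is passed.
flag_parser = argparse.ArgumentParser()
flag_parser.add_argument("--test-roundtrip-persistence", action="store_true")
flag_off = flag_parser.parse_args([]).test_roundtrip_persistence
flag_on = flag_parser.parse_args(
    ["--test-roundtrip-persistence"]
).test_roundtrip_persistence
```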

thejcannon · 2021-10-01T21:54:02Z

I'll use this PR to update progress and dump thoughts as I work through the remaining problems.

The current test is failing AFAICT due to _explicit_inference not persisting after serialization. I assume this should come from the inference_tip cache? 🤔


Pierre-Sassoulas

Thank you for opening this. Sorry, I don't have time for an in-depth review right now. I agree with the need to improve the caching, though. Could you be more specific about the advantages of JSON compared to pickle, and about what makes the persistence necessary? Is it for very big code bases, or for low-RAM machines where the cache would be bigger than RAM? Also, could you add typing to the newly added functions, please?


Pierre-Sassoulas · 2021-10-03T06:08:20Z

astroid/nodes/scoped_nodes.py

@@ -3045,3 +3103,12 @@ def _get_assign_nodes(self):
            child_node._get_assign_nodes() for child_node in self.body
        )
        return list(itertools.chain.from_iterable(children_assign_nodes))
+
+    def __dump__(self, dumper):
+        data = super().__dump__(dumper)


Is this a cultural reference to Dumb and Dumber ;) ?

Haha no, but we can pretend it is.
I was originally passing the refmap around but thought that was kind of dirty, so I wanted to pass a function that did the dumping and remembered the refmap, so "dumper" was the first word that came to mind 😂
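
For readers following along, the pattern described here could look roughly like the sketch below. The `Dumper` and `Node` classes and the `"$ref"` key are hypothetical illustrations, not the code in this PR: the dumper is a callable that owns the refmap, so each `__dump__` method only needs to call `dumper(...)` on whatever it references, and cycles such as parent pointers terminate.

```python
import json


class Dumper:
    """Callable that dumps objects and remembers the ones already seen."""

    def __init__(self):
        self.refmap = {}   # id(obj) -> reference number
        self.objects = {}  # reference number (as str) -> dumped data

    def __call__(self, obj):
        if obj is None or isinstance(obj, (str, int, float, bool)):
            return obj
        if isinstance(obj, list):
            return [self(item) for item in obj]
        key = id(obj)
        if key not in self.refmap:
            ref = len(self.refmap)
            # Register before recursing so that cycles stop here.
            self.refmap[key] = ref
            self.objects[str(ref)] = obj.__dump__(self)
        return {"$ref": self.refmap[key]}


class Node:
    """Tiny tree node with a parent pointer (creates a reference cycle)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def __dump__(self, dumper):
        return {
            "class": type(self).__name__,
            "name": self.name,
            "parent": dumper(self.parent),
            "children": dumper(self.children),
        }


root = Node("module")
child = Node("function", parent=root)
dumper = Dumper()
top = dumper(root)
text = json.dumps({"root": top, "objects": dumper.objects})
```

Every object is dumped once into `dumper.objects`, and all other mentions of it become small `{"$ref": n}` entries, which keeps the output valid JSON despite the parent/child cycle.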


thejcannon · 2021-10-20T21:18:40Z

@Pierre-Sassoulas I got tests to pass. I think the next step is to break this monolithic change into smaller ones.
Most of the PRs will be cleanup/refactoring, then we can make this one the "persistence" PR.


thejcannon · 2021-10-20T21:28:15Z

#1215
#1216


cdce8p

I'm not convinced we'll see any noticeable improvement with this approach. As far as I'm aware, most time is spent during the actual inference. Parsing the AST, by contrast, is fairly quick. So much so that I'd bet we won't see measurable changes when constructing it from cache instead.

A more useful approach IMO would be to try to cache the inference results for each file, although it still needs to be investigated to what extent that is possible.

thejcannon · 2021-11-06T14:44:16Z

I agree the AST parsing caching itself wouldn't be sufficient (hence why there isn't any code that actually persists the AST to disk yet). I assumed this would be the first step that would eventually lead to inference caching (since the inference results would need to be persistable).

Closing, though, as I won't have much free time anymore to walk this along. It can be used as a starting point for that future effort, though 🎉

thejcannon and others added 8 commits September 26, 2021 12:39

Initial Implementation

7b85297

Fix test

c051201

conftest

be76973

Let sget it started

3d244e9

More bug fixes

bfc7d30

Dumb bus

cddb15a

comment

051f57d

[pre-commit.ci] auto fixes from pre-commit.com hooks

a93ffe6

for more information, see https://pre-commit.ci

thejcannon commented Oct 1, 2021


thejcannon and others added 5 commits October 2, 2021 06:59

Less bugs

e8f3516

typos

b449f69

Lookin good

fe90f38

Merge branch 'persistence' of https://github.com/thejcannon/astroid i…

2d4989b

…nto persistence

[pre-commit.ci] auto fixes from pre-commit.com hooks

c7a4baa

for more information, see https://pre-commit.ci

Pierre-Sassoulas reviewed Oct 3, 2021


thejcannon changed the title ~~Persistence~~ DO NOT MERGE - WIP: Persistence Oct 3, 2021

cdce8p marked this pull request as draft October 3, 2021 12:14

thejcannon changed the title ~~DO NOT MERGE - WIP: Persistence~~ DO NOT MERGE - WIP: Add ability to serialize/deserialize astroid.Module (and friends) Oct 3, 2021

thejcannon added 2 commits October 3, 2021 13:18

move to rebuilder

58cbd5b

Merge branch 'persistence' of https://github.com/thejcannon/astroid i…

8762433

…nto persistence

Pierre-Sassoulas added the Work in progress label Oct 10, 2021

thejcannon added 4 commits October 18, 2021 15:57

checkin

28e3319

only enums

a93f61c

tests pass!

9cac56e

add new test

90c3417

thejcannon force-pushed the persistence branch from 2232b52 to 90c3417 Compare October 20, 2021 21:18

thejcannon added 2 commits October 20, 2021 16:21

no debug

867ec8f

Merge branch 'main' into persistence

b7071d9

[pre-commit.ci] auto fixes from pre-commit.com hooks

5a62311

for more information, see https://pre-commit.ci

thejcannon added 2 commits October 20, 2021 16:29

nodebug

fc58ec6

Merge branch 'persistence' of https://github.com/thejcannon/astroid i…

b9da2e5

…nto persistence

This was referenced Oct 20, 2021

Add test for __members__ #1216

Merged

Ensure the *_fields attributes of __init__/postinit are cohesive #1218

Closed

cdce8p reviewed Oct 24, 2021


thejcannon closed this Nov 4, 2021

thejcannon deleted the persistence branch November 4, 2021 21:37

thejcannon restored the persistence branch November 4, 2021 21:37

cdce8p mentioned this pull request Nov 6, 2021

Feature: Persistent caching of inference results #1145

Open
