Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify graph in get_topological_weights. #10574

Merged
merged 13 commits into from
Nov 20, 2021
Merged

Simplify graph in get_topological_weights. #10574

merged 13 commits into from
Nov 20, 2021

Conversation

mauritsvanrees
Copy link
Contributor

Fixes #10557 where the resolver spends too much time calculating the weights.

Also, do not let get_installation_order calculate these weights at all when there is nothing left to install.

I had to change one test in test_resolver.py because some weights have changed. The test was meant to address this comment and I think the new weights with new order are okay. Translated to that comment, the new order is that B is installed before D, but that should not matter, because A is still installed first, before C, which is the point of the comment.

This is my first PR for pip.
I wonder: should this PR be done in https://github.com/sarugaku/resolvelib first?

Meanwhile, let me share the output of a few runs with extra print statements.

Installing Zope:

$ bin/python -m pip install Zope -c https://dist.plone.org/release/pip-resolver-test/constraints.txt
...
Expected node count: 80
simplified graph to 66
simplified graph to 46
simplified graph to 32
simplified graph to 25
simplified graph to 23
simplified graph to 19
simplified graph to 17
simplified graph to 14
simplified graph to 8
simplified graph to 3
simplified graph to 2
simplified graph to 1

Same with Products.CMFPlone (a much larger superset of Zope):

Expected node count: 238
simplified graph to 209
simplified graph to 169
simplified graph to 151
simplified graph to 144
simplified graph to 140
simplified graph to 127
simplified graph to 116
simplified graph to 110
simplified graph to 103
simplified graph to 95
simplified graph to 91
simplified graph to 89
simplified graph to 79
simplified graph to 70
simplified graph to 65
simplified graph to 49
simplified graph to 40
simplified graph to 36
simplified graph to 35

So 35 packages in the graph is the maximum simplification. The original visit function is called on the remaining graph.

Same with Plone (a slightly larger superset of Products.CMFPlone):

Expected node count: 253
simplified graph to 221
simplified graph to 178
simplified graph to 159
simplified graph to 152
simplified graph to 148
simplified graph to 135
simplified graph to 124
simplified graph to 118
simplified graph to 111
simplified graph to 103
simplified graph to 99
simplified graph to 97
simplified graph to 87
simplified graph to 77
simplified graph to 72
simplified graph to 55
simplified graph to 46
simplified graph to 42
simplified graph to 41

In the last case, extra packages needed to be installed, and this order was calculated:

Installing collected packages: PyJWT, certifi, urllib3, idna, chardet, requests, plone.cachepurging,
plone.rest, Products.CMFPlacefulWorkflow, plone.app.iterate, plone.restapi, plone.app.upgrade,
plone.app.caching, Plone

That seems right. For example certify is installed before requests, and Plone is installed last.

Fixes #10557 where the resolver spends too much time calculating the weights.

Also, do not let `get_installation_order` calculate these weights at all when there is nothing left to install.
@uranusjr
Copy link
Member

I wonder: should this PR be done in sarugaku/resolvelib first?

Why? This does not touch anything in resolvelib. pip/_internal/resolution/resolvelib is not a part of resolvelib, only an interface to it. The resolvelib code is vendored in pip/_vendor/resolvelib, and this PR does not touch it at all.

@mauritsvanrees
Copy link
Contributor Author

Why? This does not touch anything in resolvelib.

Ah, okay, good.

I do see lots of test failures now locally, so it seems I have some more work to do. But the idea can be reviewed.

@mauritsvanrees
Copy link
Contributor Author

Local failures were due to parallel tox runs, I think.
Mypy fixed.

I am not used to mypy and did not have pre-commit installed yet.
@mauritsvanrees
Copy link
Contributor Author

For good measure I tried with slightly older Plone constraints (https://dist.plone.org/release/5.2.5/constraints.txt) where I know there are still cyclic dependencies, and my code worked there as well. You need Python 3.8 or earlier for this.

Since this is my first PR, two GHA workflows are waiting for approval.

I am happy to squash the commits if that is preferred.

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tests pass (I think they will)

After minor refactoring, the condition was the wrong way around.
@mauritsvanrees
Copy link
Contributor Author

The mypy part of the lint checking fails, see actions:

src/pip/_internal/resolution/resolvelib/resolver.py:225: error: No overload
variant of "next" matches argument types "Iterable[Optional[str]]", "None" 
[call-overload]
                if next(graph.iter_children(key), None) is None:
                   ^
src/pip/_internal/resolution/resolvelib/resolver.py:225: note: Possible overload variant:
src/pip/_internal/resolution/resolvelib/resolver.py:225: note:     def [_T, _VT] next(__i, Iterator[_T], default: _VT) -> Union[_T, _VT]
src/pip/_internal/resolution/resolvelib/resolver.py:225: note:     <1 more non-matching overload not shown>
Found 1 error in 1 file (checked 283 source files)

Any idea how to solve that? I have not worked with mypy yet. I tried without the default=None argument, adding a try/except StopIteration, but that led to a similar error No overload variant of "next" matches argument type "Iterable[Optional[str]]".

I tried adding a method to DirectedGraph in src/pip/_vendor/resolvelib/structs.py:

def is_leaf(self, key):
    return not self._forwards[key]

and then calling graph.is_leaf(key) instead of the next / iter_children line, but that just leads to a different mypy error locally:

"DirectedGraph[Optional[str]]" has no attribute "is_leaf"  [attr-defined]

Okay, I went with the slightly more verbose code that @jbylund suggested, and now mypy is happy.

Can you restart the workflow please?

@pfmoore
Copy link
Member

pfmoore commented Oct 25, 2021

FWIW, graph.iter_children(key) probably returns an Iterable, not an Iterator. But next needs an iterator - next(iter(graph.iter_children(key)), None) would probably have worked. But what you've done is fine, too.

@jbylund
Copy link
Contributor

jbylund commented Oct 25, 2021

I could see a future reader being confused about if there's somehow a copy of the graph due to the comments. My inclination would be to phrase this as an iterative (while can simplify graph) rather than the recursive, but that's just because of my perception that people find that a little easier to reason about. Curious to hear other takes on recursive v iterative here.

Copy link
Member

@DiddiLeija DiddiLeija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea looks excellent for me!

@mauritsvanrees
Copy link
Contributor Author

mauritsvanrees commented Nov 16, 2021

There seems agreement that my PR is fine. There is only the comment from @jbylund who wonders if it would be better to do simplify_graph as a while loop instead of an recursive function call. I could do that and it would look like this:

while True:
    ... call most of my simplify_graph function...
    if not leaves:
        # We are done simplifying. Recursive function has a 'return' here.
        break
    ... call the rest of my simplify_graph function...

If I do this, would it help this PR along towards merging?

Anything else I can do? Should I squash to one commit? Should I rebase on the main branch?

@uranusjr
Copy link
Member

Converting traversal to a loop is likely a good idea; it’s not terribly likely we’d go over Python’s recursing limit (Python dependencies are generally not that deep; we’re not JavaScript), but you never know.

I can’t really speak to whether that’d help get this merged though. I want more people to take a look at the logic if possible, but if no-one raises any issues this will go in the next release. Note that we don’t expect to release anything until January 2022 anyway so it doesn’t really matter that much if this is merged very soon.

@mauritsvanrees
Copy link
Contributor Author

Thanks for the info. Okay, I have turned the recursive function into a loop.
Can you restart the workflow checks?

news/10557.bugfix.rst Outdated Show resolved Hide resolved
mauritsvanrees and others added 2 commits November 17, 2021 10:49
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
@uranusjr uranusjr added this to the 22.0 milestone Nov 17, 2021
weight = len(graph) - 1
for leaf in leaves:
weights[leaf] = weight
# Remove the leaves from the copy of the graph, making the copy simpler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly misleading comment, as there is no longer a copy of the graph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you are right. You mentioned this earlier, but I misunderstood, thinking your comment was outdated, referring to an older version of my PR.
I have fixed the comment.

Copy link
Contributor

@jbylund jbylund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you if you feel like the docstring of get_installation_order could be updated, since I don't think it's a great explanation of what's going on in the method. But looks good, thanks for the contribution.

@mauritsvanrees
Copy link
Contributor Author

Up to you if you feel like the docstring of get_installation_order could be updated, since I don't think it's a great explanation of what's going on in the method.

I have updated the docstring of get_installation_order and get_topological_weights.

But looks good, thanks for the contribution.

Thank you, and thanks for the review!

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great and is super smart!

@pradyunsg pradyunsg merged commit ba979fd into pypa:main Nov 20, 2021
@pradyunsg
Copy link
Member

Thanks @mauritsvanrees! ^.^

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resolver spends too much time calculating "weights" to parents
6 participants