Addition of new graph traversal tools #3686

ramirezfranciscof · 2019-12-18T14:08:28Z

This PR incorporates two new tools for graph traversal:

The AiiDA Graph Explorer (AGE) package of classes: a general tool for performing complex graph traversal operations on an AiiDA 'expanded graph' (in which both AiiDA nodes and groups are considered graph nodes) by defining 'rules' from generic querybuilder instances. Given an initial set of nodes (which has to be placed inside generic 'baskets'), these queries are applied successively, each on top of the results of the previous one, repeating the whole cycle the desired number of times (which can be 'until no new nodes are added').
The traverse_graph function: a simplified interface for using the AGE to search for AiiDA nodes and links with a reduced set of customizable rules. Unlike the AGE, which retains essentially the same versatility as the querybuilder and thus allows for very complex traversals, traverse_graph only lets you specify which link types may be traversed and in which direction, and all of them are traversed in each iteration. The advantage is that one no longer needs to manually define and handle baskets and querybuilders.

The function traverse_graph (which uses the AGE as its search engine) is now used by the delete, export and graph visualization procedures. For deletion, the delete_nodes function now calls traverse_graph instead of doing its own search, although the procedure is similar to how it was performed before. For export, on the other hand, every node added to the set used to be queried separately for ancestors and descendants, whereas now the whole set of newly found nodes is queried for more nodes at once (all of this happens inside the retrieve_linked_nodes function). Finally, the graph class methods recurse_descendants and recurse_ancestors used to work by calling add_incoming and add_outgoing many times, each of which loaded nodes to get their pks. Now these methods are all independent and each makes its own call to traverse_graph, where pks are obtained directly from the query projection.

This closes #3331

ramirezfranciscof · 2019-12-18T14:13:47Z

(I specifically tagged @CasperWA because he had requested to review this once it was finished, but all reviews are most welcome)

ltalirz

thanks @ramirezfranciscof for pushing forward with this!
I've had a quick read through a couple of files, some minor comments here and there.

Perhaps @chrisjsewell would like to give a brief look at what is happening in graph.py ?

aiida/common/links.py

aiida/tools/graph/age_entities.py

aiida/tools/importexport/dbexport/utils.py

aiida/tools/graph/graph_traversers.py

aiida/tools/graph/age_rules.py

chrisjsewell · 2019-12-18T18:43:58Z

Perhaps @chrisjsewell would like to give a brief look at what is happening in graph.py ?

For the graph visualisation, obviously the principal thing is that test_graph.py still passes. But you also need to check whether visualising_graphs.ipynb still runs or requires any changes.
If there are any changes, these also need to be translated to visualising_graphs.rst.

zooks97 · 2019-12-18T19:58:32Z

Perhaps @chrisjsewell would like to give a brief look at what is happening in graph.py ?

For the graph visualisation, obviously the principal thing is that test_graph.py still passes. But you also need to check whether visualising_graphs.ipynb still runs or requires any changes.
If there are any changes, these also need to be translated to visualising_graphs.rst.

All of the tests in test_graph.py do pass, but I have not tested visualising_graphs.ipynb. I purposefully did not change the interface for the Graph class or any of its methods to maintain full backward compatibility (except that providing a print function does not print the graph nodes and links as they are traversed), so I expect that there should be no issues with this notebook or the documentation that it produces.

ltalirz · 2019-12-19T07:49:14Z

@ramirezfranciscof Are you working on updating the branch? Do you need help?
Moving forward with this would be quite important, since it addresses a major usability issue.

ramirezfranciscof · 2019-12-19T09:05:23Z

@ltalirz Yes, I will now apply the requested changes. If it is OK with you, I will modify the respective existing commits with the corresponding changes, so I will force push the modifications.

ramirezfranciscof · 2019-12-19T11:40:20Z

@ltalirz I applied all corrections except for the comment on the links (waiting for the OK) and the export_tree, which you said should go in a different PR (and which I also didn't quite understand). Please check, especially whether the keyset property (to replace get_keys) was done correctly (I would have liked to use self.keyset = None in the __init__, but pylint complains if _keyset is not initialized explicitly there =S).

ramirezfranciscof · 2019-12-19T11:46:11Z

OK, the docs build is now failing because it doesn't recognize a type (`make html` worked locally, but I didn't check plain `make`). How can I specify in the type docstring that something can have several types and/or that it can also be None?

greschd · 2019-12-19T11:52:34Z

How can I specify in the type docstring that something can have several types and/or that it can also be None?

I think you can use the typing module: typing.Union[str, int] means "str or int", and typing.Optional[str] for "str or None".

ramirezfranciscof · 2019-12-19T11:57:22Z

@greschd But you mean in the docstring? Like this?

"""
(...)
    :type max_iterations:  typing.Optional[int]
    :param max_iterations:
        The number of iterations to apply the set of rules (a value of 'None' will
        iterate until no new nodes are added).
(...)
"""

Because this is not working for me: it just renders 'typing.Optional[int]' literally.

greschd · 2019-12-19T12:36:16Z

Ah, no.. apparently that does not work 🙄

The following works, though:

:type max_iterations:  int or None

If we start adding type hints, we can consider the following sphinx plugin to avoid duplicating the info: https://pypi.org/project/sphinx-autodoc-typehints/
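For reference, a minimal sketch of the field-list convention that ended up working, placed next to the equivalent type hints that a plugin such as sphinx-autodoc-typehints could render instead. The function `validate_max_iterations` is hypothetical and exists only to carry the docstring; it is not part of the PR.

```python
from typing import Optional


def validate_max_iterations(max_iterations: Optional[int] = None) -> Optional[int]:
    """Check the iteration limit of a traversal (hypothetical helper).

    :type max_iterations: int or None
    :param max_iterations:
        The number of iterations to apply the set of rules (a value of 'None' will
        iterate until no new nodes are added).
    :rtype: int or None
    """
    # None is a valid value meaning "iterate until no new nodes are added".
    if max_iterations is None:
        return None
    if not isinstance(max_iterations, int) or max_iterations < 1:
        raise ValueError('max_iterations must be a positive integer or None')
    return max_iterations
```

With type hints in the signature, the `:type:` and `:rtype:` lines duplicate information, which is the duplication the plugin is meant to avoid.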

ramirezfranciscof · 2019-12-19T13:46:03Z

Thanks @greschd ! That seemed to have worked.

aiida/tools/graph/graph_traversers.py

ltalirz

Thanks @ramirezfranciscof

@ltalirz I applied all corrections except for the comment in the links (waiting for the ok)

Made suggestion here

Please check, especially whether the keyset property (to replace get_keys) was done correctly

Looks fine.

(I would have liked to use self.keyset = None in the init but pylint complains if _keyset is not initialized explicitly there =S).

Also this is correct - the constructor should initialize the underlying variable.

@sphuber @CasperWA From my side this looks good and brings enormous speedup to the export.
Who still wants to have a look before this goes in?

sphuber · 2019-12-19T14:44:42Z

Thanks guys, I would like to give it a look still and will do so today.

ltalirz · 2019-12-19T14:46:46Z

Just for reference, here again the results from the test I did.

Test set: Exporting 67k groups with 5 nodes each (which expands to 1.4M nodes with provenance)

"Old implementation"[1]: ~450 minutes
AGE implementation: 6 minutes

[1] This is on a modified version of the progress bar PR, where I already removed slow-down from the progress bar.
It took 5h 15min to retrieve the linked nodes for 70% of the nodes, then I canceled. I extrapolated from there for the total time.
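The footnote's extrapolation is linear: 5h 15min is 315 minutes for 70% of the nodes. A quick check of the numbers, assuming the retrieval time scales linearly with the fraction of nodes processed (`extrapolate_total_minutes` is just an illustrative helper):

```python
def extrapolate_total_minutes(elapsed_minutes: int, percent_done: int) -> float:
    """Linearly extrapolate the total runtime from a partial measurement."""
    return elapsed_minutes * 100 / percent_done


old_total = extrapolate_total_minutes(5 * 60 + 15, 70)  # 315 min for 70% of the nodes
age_total = 6  # minutes, as measured with the AGE implementation

print(f"old: ~{old_total:.0f} min, AGE: {age_total} min, speedup: ~{old_total / age_total:.0f}x")
```

This prints `old: ~450 min, AGE: 6 min, speedup: ~75x`, consistent with the figures quoted above.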

ramirezfranciscof · 2019-12-19T14:54:21Z

(I would have liked to use self.keyset = None in the init but pylint complains if _keyset is not initialized explicitly there =S).

Also this is correct - the constructor should initialize the underlying variable.

I'm a little confused by this. Part of the idea of a setter is to avoid duplicating checks, parsing, etc., correct? The typical example is a property Temp whose setter checks that it is set to a value > -273. If the __init__ has to set _Temp itself, then I have to repeat the check there. Sure, if it is just hardcoded to a value, you know what to use, but what if the initializer takes an input for it? You either have to perform the checks yourself, or initialize _Temp to an arbitrary valid value just to appease pylint (or, more generally, the guideline) and then call the setter to set the real value. I don't know, it somehow feels cleaner if the hidden attribute _Temp only appears inside the property methods.

sphuber · 2019-12-19T15:00:42Z

I'm a little confused by this. Part of the idea of the setter is to avoid duplication of checks, parsings, etc correct? The typical example being a property Temp and checking that it is being set to some value > -273. If the __init__ has to set _Temp itself, then I have to repeat the check in there. Sure, in the case this is just hardcoded to a value, you know what to use, but what if the initializer takes an input for this?

This may be just a bug in pylint. If you are initializing a class instance attribute through a setter property in the constructor, then that is perfectly fine and indeed the right practice. You probably mean that pylint complains about something like "warning: class attribute initialized outside-constructor" or something to that effect? In that case it is simply a false positive; I think pylint does not realize that you are in fact initializing it.

ltalirz · 2019-12-19T15:05:05Z

@ramirezfranciscof I think you make a valid point.
What I meant to say was: It is good practice to explicitly initialize instance variables in the constructor, since this is easy to read and you are less likely to run into cases where the variable is not defined.

I agree that there are also cases where it makes sense to use the setter already for initialization, in particular if you are passing along constructor arguments to the variable.

Here, you are setting it to None, however, so I think both solutions are fine.

ramirezfranciscof · 2019-12-19T15:07:02Z

This may be just a bug in pylint. If you are initializing a class instance attribute through a setter property in the constructor, then that is perfectly fine and indeed the right practice. You probably mean that pylint complains about something like "warning: class attribute initialized outside-constructor" or something to that effect? In that case it is simply a false positive; I think pylint does not realize that you are in fact initializing it.

Yes, this is indeed the case. All I found regarding this problem was this issue on the pylint GitHub.
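To make the pattern under discussion concrete, here is a minimal sketch of a validated property that is initialized through its own setter in the constructor, so the check lives in one place. The class and attribute names are illustrative (the `Temp` example from the thread, not the actual `keyset` code). Because `_temperature` is only assigned inside the setter, pylint may report `attribute-defined-outside-init` (W0201); the sketch silences it locally on that line.

```python
class Thermometer:
    """Illustrative class whose temperature must stay above absolute zero."""

    ABSOLUTE_ZERO = -273.15

    def __init__(self, temperature=None):
        # Route the initial value through the setter so the validation is not duplicated.
        self.temperature = temperature

    @property
    def temperature(self):
        return self._temperature

    @temperature.setter
    def temperature(self, value):
        if value is not None and value < self.ABSOLUTE_ZERO:
            raise ValueError(f'temperature must be above {self.ABSOLUTE_ZERO}, got {value}')
        self._temperature = value  # pylint: disable=attribute-defined-outside-init
```

The alternative suggested above (assigning `self._temperature = None` explicitly in `__init__` before calling the setter) avoids the warning at the cost of one extra line.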

sphuber

My compliments to the AGE team, this is some great code! This should really clean up a lot of the code base and, as @ltalirz has noticed, make it a lot faster too. I still have some minor requests, but these should be relatively easy to address.

aiida/tools/visualization/graph.py

aiida/tools/graph/age_entities.py

CasperWA

I tried to submit my review - and GitHub failed with a Unicorn! ...

It seems it has saved all my review changes, but this message was not cached, so I'll try again.... hmm...

Very good job on this massive PR!

I have been quite meticulous in my comments, considering changes to comments and variable names and more ... Don't worry about it, but do try and consider my comments fairly.

Be careful not to add Python2-specific code (e.g., imports from __future__).
Be aware that pylint is only trying to help, not annoy - although it definitely can be annoying. It is only trying to suggest that your code may be better - which is not always the case, sometimes you need certain things to be the way they are, but at least it should make you reconsider your code.

Concerning the export utility functions, we've discussed this offline, so I'm looking forward to seeing the update from that.

Otherwise, good job! And thanks for the hard work on implementing this in AiiDA.

aiida/backends/tests/tools/graph/test_age.py

aiida/tools/visualization/graph.py

ramirezfranciscof · 2019-12-20T11:32:08Z

Thanks to @ltalirz @sphuber and @CasperWA for the reviews! I know reviewing such a big PR (+2.5K lines) can be a PITB so I appreciate it. I will now work on these and as I modify things in my local code I will mark the comments resolved to better keep track of stuff (just letting you know in case you notice this before actually seeing any modifications applied and you wonder why).

csadorf · 2019-12-20T17:08:38Z

@ramirezfranciscof Unfortunately I won't be able to review this before the holidays anymore. Feel free to re-request review on Jan 6 if it's not done by then.

ramirezfranciscof · 2019-12-27T11:12:17Z

I have incorporated almost all of the requested changes; a few "unresolved" comments remain on which I am still waiting for confirmation or feedback. I must warn that, due to some of the comments and discussions with @sphuber, I have made some important changes to the traverse_graph function:

It now knows nothing of TraverseRules, as it directly receives the links to traverse forward and backward.
Export and delete functionalities now do not use traverse_graph directly, but go through get_nodes_delete and get_nodes_links_export instead: these functions receive the optional flags and pass them, together with the appropriate GraphTraversalRule, to a common verification and translation function, parse_traversal_rules, and then call traverse_graph with the resulting options. The get_xxx functions return different things.
Visualization does use traverse_graph and parse_traversal_rules directly and, I must admit, somewhat unnecessarily: in some sections, for example, the links are received, parsed into traversal rules and then de-parsed into links again. Although this is not ideal, I didn't want to get into it now, because I realized I was starting to rewrite the whole graph class and didn't want to delay this PR any further (or go out of scope). Perhaps this would be better suited for a later PR that refactors and cleans up the graph class?
I didn't add tests for the get_xxx functions because their functionality is currently covered by the export and delete tests (since the checks and such previously lived in those functions). Eventually the tests should be reorganized, but again I didn't want to invest too much time in this now, since the functionality is being tested, just not in the ideal place.
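The verification and translation step described in the second point above can be sketched as follows. This is a hypothetical, self-contained illustration: the `Rule` tuple, the rule names and `parse_rules` are made up for the example, loosely following the 'rule_name_dir' = True/False convention. It is not the actual `parse_traversal_rules` or `GraphTraversalRules` code.

```python
from typing import Dict, List, NamedTuple, Tuple


class Rule(NamedTuple):
    """Hypothetical rule: link type to follow, direction, and whether users may change it."""

    link_type: str
    direction: str  # 'forward' or 'backward'
    toggleable: bool
    default: bool


# Hypothetical rule table; the real names and defaults live in aiida.common.links.
EXAMPLE_RULES: Dict[str, Rule] = {
    'create_forward': Rule('create', 'forward', toggleable=False, default=True),
    'call_forward': Rule('call', 'forward', toggleable=False, default=True),
    'input_forward': Rule('input', 'forward', toggleable=False, default=True),
    'create_backward': Rule('create', 'backward', toggleable=True, default=False),
}


def parse_rules(rule_table: Dict[str, Rule],
                **user_flags: bool) -> Tuple[List[str], List[str], Dict[str, bool]]:
    """Validate user flags, apply defaults and split the rules into forward/backward links."""
    unknown = set(user_flags) - set(rule_table)
    if unknown:
        raise ValueError(f'unknown traversal rules: {sorted(unknown)}')

    links_forward: List[str] = []
    links_backward: List[str] = []
    applied: Dict[str, bool] = {}
    for name, rule in rule_table.items():
        value = user_flags.get(name, rule.default)
        # Non-toggleable rules may only be "set" to their default value.
        if value != rule.default and not rule.toggleable:
            raise ValueError(f'rule {name!r} cannot be toggled')
        applied[name] = value
        if value:
            target = links_forward if rule.direction == 'forward' else links_backward
            target.append(rule.link_type)
    return links_forward, links_backward, applied
```

For example, `parse_rules(EXAMPLE_RULES, create_backward=True)` returns `['create', 'call', 'input']` as forward links, `['create']` as backward links, and a dict recording every rule as applied.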

Let me know what you think, and whether you consider it important to do now any of the things I said I was leaving for a separate PR (tests and/or graph class).
(BTW: I am also not particularly married to the name of any of these functions, so if anybody has a better idea please do share.)

ramirezfranciscof · 2019-12-27T11:36:24Z

Ah, also, I get a weird error when compiling the documentation: I cannot add set to the supported types because it won't be recognized (hence some of the docstrings state in the parameter description that sets are accepted, instead of adding set to the :type xxx: field). I looked into this problem and it might be a bug in the Python documentation? Feedback is welcome.

sphuber

Almost there, but still some small things and questions about design

aiida/tools/graph/graph_traversers.py

aiida/tools/visualization/graph.py

aiida/tools/graph/age_rules.py

aiida/tools/graph/age_entities.py

aiida/tools/graph/graph_traversers.py

CasperWA

Looking great. Though I still have some comments and suggested changes.

aiida/backends/tests/tools/graph/test_age.py

aiida/backends/tests/tools/graph/test_graph_traversers.py

aiida/tools/graph/age_entities.py

aiida/tools/graph/age_rules.py

CasperWA · 2020-01-09T14:18:53Z

aiida/tools/importexport/dbexport/__init__.py

+    to_be_exported = traverse_output['nodes']
+    graph_traversal_rules = traverse_output['rules']
+
+    # I create a utility dictionary for mapping pk to uuid.


This is done already some other place in this function.
Please try to realign this so we're not doing it several times.

So it's fine to do it here, but then it shouldn't be done again later.

But maybe it is fine to do it like this, and then I'll try to patch the export function with a separate PR later.

The AiiDA Graph Explorer (AGE) is a general-purpose tool for traversing AiiDA graphs. It treats AiiDA nodes and groups (and eventually even computers and users) as 'graph nodes' of an 'expanded graph', and generalizes the exploration of that graph. The 'rules' that indicate how to traverse this graph are configured with generic querybuilder instances (i.e. with information about the connections, but without specific initial nodes/groups and without any projections). The initial set of nodes/groups is provided directly to the rule, which then performs successive applications of the query, each on top of the results of the previous one. This cycle is repeated a specified number of times, which can also be set to 'until no new nodes are found'. The current implementation works with the following (public) classes:

* Basket: generic container class that can store sets of nodes, groups, node-node edges (AiiDA links) and group-node edges. These are the objects that the rule objects receive and return.
* UpdateRule: initialized with a querybuilder instance (and optionally a maximum number of iterations and the option to track edges), it can then be run with an initial set of nodes to obtain the accumulated result of the traversal procedure described by the iterations of the query.
* ReplaceRule: same as the UpdateRule, except that at the end of the procedure the returned basket contains not the accumulation of the traversal steps, but only the nodes obtained during the last step. This rule is not compatible with the 'until no new nodes are found' termination criterion.
* RuleSequence: concatenates the application of different rules (it basically works like an UpdateRule that iterates over a chain of rules instead of a single querybuilder instance).
* RuleSaveWalkers and RuleSetWalkers: rules that can be included in the chain of rules given to a RuleSequence to save a given state of the current basket (Save), which can later be used to overwrite the contents of that working basket (Set). This is useful when two operations need to be done 'in parallel' (i.e. on the same set of nodes) instead of applying the second one to the results of the first.

Co-Authored-By: ramirezfranciscof <ramirezfranciscof@users.noreply.github.com>
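The difference between accumulating (UpdateRule) and replacing (ReplaceRule) the contents of the basket can be illustrated with a toy, pure-Python sketch. Here a plain adjacency dict stands in for the querybuilder instance and a set of integers stands in for the basket; `query_step` and `apply_rule` are hypothetical and are not the AGE API.

```python
from typing import Dict, Optional, Set

# Toy graph standing in for the database: node -> nodes reachable in one query step.
GRAPH: Dict[int, Set[int]] = {1: {2, 3}, 2: {4}, 3: {4}, 4: {5}, 5: set()}


def query_step(nodes: Set[int]) -> Set[int]:
    """Stand-in for one application of the querybuilder to a whole set of nodes."""
    return set().union(*(GRAPH[node] for node in nodes))


def apply_rule(start: Set[int], max_iterations: Optional[int] = None,
               replace: bool = False) -> Set[int]:
    """Toy UpdateRule (accumulate) / ReplaceRule (keep only the last step)."""
    if replace and max_iterations is None:
        # Mirrors the restriction on ReplaceRule: it needs a fixed number of iterations.
        raise ValueError('a replace rule needs an explicit max_iterations')
    accumulated, frontier, iteration = set(start), set(start), 0
    while frontier and (max_iterations is None or iteration < max_iterations):
        found = query_step(frontier)
        if replace:
            frontier = found  # the basket is overwritten by the latest results
        else:
            frontier = found - accumulated  # only expand what is new
            accumulated |= frontier
        iteration += 1
    return frontier if replace else accumulated


print(sorted(apply_rule({1})))  # update until no new nodes: [1, 2, 3, 4, 5]
print(sorted(apply_rule({1}, max_iterations=2, replace=True)))  # last step only: [4]
```

Expanding only the newly found nodes in the update branch is what makes the 'until no new nodes are found' criterion terminate on graphs with shared descendants, such as node 4 here.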

The function traverse_graph works as a simplified interface to the AGE that also removes the need to manually handle the basket and the querybuilder instance:

* The price to pay for hiding the basket is that this function can only be used with sets of nodes and links (so, no groups).
* The price to pay for hiding the querybuilder is that complex traversal procedures can no longer be specified: the user simply defines which links can be traversed forwards and which backwards, and these criteria are then applied in every iteration. (So one could not, in a single call, search only for the calc nodes called by the work nodes called by an initial workflow node, as one would also obtain the calc nodes directly called by that initial workflow.)

Besides the starting nodes (pks) and links, the user can also provide the maximum number of iterations (by default None, which means 'until no new nodes are found') and a boolean that indicates whether the links (edges) should be returned.

Additionally, two other interfaces are included for ease of use when deleting and exporting. These functions only take the starting set of pks and the rules provided by the user (as 'rule_name_dir' = False/True), and can automatically check whether each rule is toggleable, set defaults (using aiida.common.links.GraphTraversalRules), and parse the ruleset into two lists with the links for forward and backward traversal. They return a dictionary containing the 'nodes' list, the 'links' list (if requested, else `None`) and a dict with the way in which all the rules were applied (using the format 'rule_name' = True/False).

Co-Authored-By: Leonid Kahle <leonid.kahle@epfl.ch>
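A minimal sketch of the traverse_graph semantics described above, working on an in-memory list of links instead of the database. The `traverse` function, the `Link` tuple and the link labels are hypothetical, not the AiiDA API. The example reproduces the limitation from the second bullet: following call links forward from a workflow W1 returns both the calculation it calls directly and the one called by its sub-workflow W2.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Set


class Link(NamedTuple):
    """Hypothetical stand-in for an AiiDA link between two node pks."""

    source: int
    target: int
    link_type: str


def traverse(starting_pks: Set[int], links: List[Link], links_forward: Sequence[str] = (),
             links_backward: Sequence[str] = (), max_iterations: Optional[int] = None,
             get_links: bool = False) -> Dict[str, object]:
    """Grow the starting set by following the allowed link types, all of them in every iteration."""
    nodes, frontier, traversed, iteration = set(starting_pks), set(starting_pks), set(), 0
    while frontier and (max_iterations is None or iteration < max_iterations):
        new_nodes = set()
        for link in links:
            if link.link_type in links_forward and link.source in frontier:
                new_nodes.add(link.target)
                traversed.add(link)
            if link.link_type in links_backward and link.target in frontier:
                new_nodes.add(link.source)
                traversed.add(link)
        frontier = new_nodes - nodes  # only newly found nodes are expanded next time
        nodes |= frontier
        iteration += 1
    return {'nodes': nodes, 'links': traversed if get_links else None}


# W1 (pk 1) calls sub-workflow W2 (pk 2) and calculation C1 (pk 3); W2 calls C2 (pk 4); C1 creates D (pk 5).
LINKS = [Link(1, 2, 'call_work'), Link(1, 3, 'call_calc'), Link(2, 4, 'call_calc'), Link(3, 5, 'create')]

result = traverse({1}, LINKS, links_forward=['call_work', 'call_calc'])
print(sorted(result['nodes']))  # [1, 2, 3, 4]: C1 is included alongside C2
```

Selecting only C2 would require distinguishing the first step (call_work) from the second (call_calc), which is exactly what a custom AGE rule sequence allows and this simplified interface does not.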

The node deletion function now uses the get_nodes_delete function (built on the traverse_graph interface, which uses the AGE as its main engine) to collect the extra nodes that are needed to keep the provenance consistent. The procedure is not very different from the one that was initially implemented, so no significant performance improvement is expected, but this is an important first step toward homogenizing graph traversal throughout the code.

The export function now uses the export counterpart of get_nodes_delete (also built on the traverse_graph interface, which uses the AGE as its main engine) to collect the extra nodes that are needed to keep the provenance consistent. More specifically, this is done by the 'retrieve_linked_nodes' function. Whereas previously a separate query was performed for each new node added in the previous query step, the new implementation performs a single query for all the nodes that were added in the previous step. These changes are therefore not only an important first step toward homogenizing graph traversal throughout the code, but should also speed up the export procedure.
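The gain comes from the number of round trips to the database. In the toy comparison below, a counter stands in for each executed query (`CountingBackend`, `query_children` and the tree-shaped graph are hypothetical). It shows that expanding node by node costs one query per node visited, whereas expanding the whole set of newly added nodes at once costs one query per iteration:

```python
from typing import Dict, List, Set

# Toy provenance graph: a binary tree with nodes 1..127 (node n links to 2n and 2n + 1).
GRAPH: Dict[int, List[int]] = {n: [2 * n, 2 * n + 1] for n in range(1, 64)}


class CountingBackend:
    """Stand-in for the database that counts how many queries are executed."""

    def __init__(self) -> None:
        self.queries = 0

    def query_children(self, pks: Set[int]) -> Set[int]:
        self.queries += 1
        return {child for pk in pks for child in GRAPH.get(pk, [])}


def expand_per_node(backend: CountingBackend, start: Set[int]) -> Set[int]:
    """Old strategy: one query for every newly added node."""
    found, pending = set(start), list(start)
    while pending:
        new = backend.query_children({pending.pop()}) - found
        found |= new
        pending.extend(new)
    return found


def expand_batched(backend: CountingBackend, start: Set[int]) -> Set[int]:
    """AGE-like strategy: one query for the whole set of nodes added in the previous step."""
    found, frontier = set(start), set(start)
    while frontier:
        frontier = backend.query_children(frontier) - found
        found |= frontier
    return found


per_node, batched = CountingBackend(), CountingBackend()
assert expand_per_node(per_node, {1}) == expand_batched(batched, {1})
print(per_node.queries, batched.queries)  # 127 7
```

Both strategies find the same 127 nodes, but the batched one issues 7 queries (one per level of the tree) instead of 127. In a real database each query carries fixed overhead, which is why this restructuring can matter far more than the per-query cost.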

The graph visualization feature now uses the traverse_graph function (with the AGE as its main engine) to collect the nodes to be visualized. This was implemented in the methods of the graph class: previously, `recurse_descendants` and `recurse_ancestors` worked by calling `add_incoming` and `add_outgoing` many times, which in turn had to load nodes during the procedure. Now these methods are all independent and each calls traverse_graph, so the information is obtained directly from the query projections and no nodes are loaded. These changes are therefore not only an important first step toward homogenizing graph traversal throughout the code, but should also speed up the visualization procedure.

Agreed in private that everything was addressed

ramirezfranciscof requested a review from CasperWA December 18, 2019 14:08

ltalirz reviewed Dec 18, 2019

CasperWA requested a review from csadorf December 19, 2019 11:15

ramirezfranciscof force-pushed the agepr branch from 2334972 to 04969ba December 19, 2019 11:36

ramirezfranciscof force-pushed the agepr branch from 04969ba to 7761ed7 December 19, 2019 13:39

ltalirz reviewed Dec 19, 2019

aiida/tools/graph/graph_traversers.py (outdated, resolved)

ltalirz self-requested a review December 19, 2019 14:41

ltalirz reviewed Dec 19, 2019

sphuber requested changes Dec 20, 2019

CasperWA suggested changes Dec 20, 2019

csadorf removed their request for review December 20, 2019 17:07

ramirezfranciscof force-pushed the agepr branch from 7761ed7 to 8c8bf98 December 27, 2019 10:51

ramirezfranciscof force-pushed the agepr branch from 8c8bf98 to b03afa1 December 27, 2019 11:29

ramirezfranciscof requested review from sphuber and CasperWA January 6, 2020 13:16

sphuber requested changes Jan 7, 2020

ramirezfranciscof force-pushed the agepr branch from b03afa1 to e11df98 January 8, 2020 18:06

ramirezfranciscof requested a review from sphuber January 9, 2020 08:44

CasperWA previously requested changes Jan 9, 2020

ramirezfranciscof force-pushed the agepr branch from e11df98 to 534f82e January 9, 2020 18:14

lekah and others added 5 commits January 10, 2020 10:54

ramirezfranciscof force-pushed the agepr branch from 534f82e to 8ec079c January 10, 2020 09:54

sphuber approved these changes Jan 10, 2020

ramirezfranciscof merged commit 47cfe34 into aiidateam:develop Jan 10, 2020

sphuber mentioned this pull request Jan 10, 2020

[WIP] Generic Graph Traverser #3418 (closed)

ramirezfranciscof deleted the agepr branch January 13, 2020 11:38

lekah mentioned this pull request Apr 1, 2020

Graph traversals using QueryBuilder #3535 (closed)
