-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addition of new graph traversal tools #3686
Conversation
(I specifically tagged @CasperWA because he had requested to review this once it was finished, but all reviews are most welcome) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @ramirezfranciscof for pushing forward with this!
I've had a quick read through a couple of files, some minor comments here and there.
Perhaps @chrisjsewell would like to give a brief look at what is happening in graph.py
?
For the graph visualisation, obviously the principle thing is that test_graph.py still passes. But also you need to check whether visualising_graphs.ipynb still runs or requires any changes. |
All of the tests in test_graph.py do pass, but I have not tested visualising_graphs.ipynb. I purposefully did not change the interface for the Graph class or any of its methods to maintain full backward compatibility (except that providing a print function does not print the graph nodes and links as they are traversed), so I expect that there should be no issues with this notebook or the documentation that it produces. |
@ramirezfranciscof Are you working on updating the branch? Do you need help? |
@ltalirz Yes, I will now apply the requested changes. If it is ok with you, I will make modify the respective existing commits with the corresponding change, so I will force push the modifications. |
2334972
to
04969ba
Compare
@ltalirz I applied all corrections except for the comment in the links (waiting for the ok) and the |
Ok, docs are now failing because it doesn't recognize a type ( |
I think you can use the |
@greschd But you mean in the docstring? Like this?
Because this is not working for me, its just rendering literally 'typing.Optional[int]' |
Ah, no.. apparently that does not work 🙄 The following works, though:
If we start adding type hints, we can consider the following sphinx plugin to avoid duplicating the info: https://pypi.org/project/sphinx-autodoc-typehints/ |
04969ba
to
7761ed7
Compare
Thanks @greschd ! That seemed to have worked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ramirezfranciscof
@ltalirz I applied all corrections except for the comment in the links (waiting for the ok)
Made suggestion here
Please check, specially if the keyset property (to replace get_keys) was done correctly
Looks fine.
(I would have liked to use self.keyset = None in the init but pylint complains if _keyset is not initialized explicitly there =S).
Also this is correct - the constructor should initialize the underlying variable.
@sphuber @CasperWA From my side this looks good and brings enormous speedup to the export.
Who still wants to have a look before this goes in?
Thanks guys, I would like to give it a look still and will do so today. |
Just for reference, here again the results from the test I did. Test set: Exporting 67k groups with 5 nodes each (which expands to 1.4M nodes with provenance) "Old implementation"[1]: ~450 minutes [1] This is on a modified version of the progress bar PR, where I already removed slow-down from the progress bar. |
I'm a little confused by this. Part of the idea of the setter is to avoid duplication of checks, parsings, etc correct? The typical example being a property |
This may be just a bug in |
@ramirezfranciscof I think you make a valid point. I agree that there are also cases, where it makes sense to use the setter already for initialization, in particular if you were passing along arguments of the constructor to the variable. Here, you are setting it to |
Yes, this is indeed the case. All I found regarding this problem was this issue in the pylint github. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My compliments to the AGE team, this is some great code! This should really clean a lot of the code base and as @ltalirz has noticed make it a lot faster too. I still have some minor requests but these should be relatively easy to address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to submit my review - and GitHub failed with a Unicorn! ...
It seems it has saved all my review changes, but this message was not cached, so I'll try again.... hmm...
Very good job on this massive PR!
I have been quite meticulous in my comments, considering changes to comments and variable names and more ... Don't worry about it, but do try and consider my comments fairly.
Be careful not to add Python2-specific code (e.g., imports from __future__
).
Be aware that pylint is only trying to help, not annoy - although it definitely can be annoying. It is only trying to suggest that your code may be better - which is not always the case, sometimes you need certain things to be the way they are, but at least it should make you reconsider your code.
Concerning the export utility functions, we've discussed this offline, so I'm looking forward to seeing the update from that.
Otherwise, good job! And thanks for the hard work on implementing this in AiiDA.
Thanks to @ltalirz @sphuber and @CasperWA for the reviews! I know reviewing such a big PR (+2.5K lines) can be a PITB so I appreciate it. I will now work on these and as I modify things in my local code I will mark the comments resolved to better keep track of stuff (just letting you know in case you notice this before actually seeing any modifications applied and you wonder why). |
@ramirezfranciscof Unfortunately I won't be able to review this before the holidays anymore. Feel free to re-request review on Jan 6 if it's not done by then. |
7761ed7
to
8c8bf98
Compare
I have incorporated almost all of the changes requested; a few "unresolved" comments remain on which I am still waiting for confirmation or some kind of feedback. I must warn that due to some of the comments and discussions with @sphuber , I have made some important changes in the
Let me know what yo think and if you think it would be important to do now any of the things I mentioned that I was leaving to tackle later in a different PR (tests and/or graph class). |
8c8bf98
to
b03afa1
Compare
Ah, also, I get this weird error when compiling the documentation, where I cannot add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there, but still some small things and questions about design
b03afa1
to
e11df98
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great. Though I still have some comments and suggested changes.
to_be_exported = traverse_output['nodes'] | ||
graph_traversal_rules = traverse_output['rules'] | ||
|
||
# I create a utility dictionary for mapping pk to uuid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done already some other place in this function.
Please try to realign this so we're not doing it several times.
So it's fine to do it here, but then it shouldn't be done again later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But maybe it is fine to do it like this, and then I'll try to patch the export function with a separate PR later.
e11df98
to
534f82e
Compare
The AiiDA Graph Explorer (AGE) is a general purpose tool to perform graph traversal of AiiDA graphs. It considers AiiDA nodes and groups (eventually even computers and users) as if they were both 'graph nodes' of an 'expanded graph', and generalizes the exploration of said graph. The 'rules' that indicate how to traverse this graph are configured by using generic querybuilder instances (i.e. with information about the connections but without specific initial nodes/groups and without any projections). The initial set of nodes/groups is provided directly to the rule, which then will perform successive applications of the query, each on top of the results of the previous one. This cycle is repeated for a specified number of time, which can be specified to be 'until no new nodes are found'. The current implementation works with the following (public) classes: * Basket: generic container class that can store sets of nodes, groups, node-node edges (aiida links) and group-node edges. These are the objects that the rule-objects receive and return. * UpdateRule: initialized with a querybuilder instance (and optionally a max number of iterations and the option to track edges), it can then be run with an initial set of nodes to obtain the result of the accumulated traversal procedure described by the iterations of the query. * ReplaceRule: same as the update rule, except that at the end of the procedure the returned basket contains not the accumulation of the traversal steps but only the nodes obtained during the last step. This is rule is not compatible with the 'until no new nodes are found' end iteration criteria. * RuleSequence: this can concatenate the application of different rules (it basically works like an UpdateRule that iterates over a chain of rules instead of a single querybuilder instance). * RuleSaveWalkers and RuleSetWalkers: rules that can be provided in a chain of rules given to a RuleSequence to save a given state of the current basket (Save) that can later be used to overwrite the content of said working basket (Set). This is useful in the case where one might need to do two operations 'in parallel' (i.e. on the same set of nodes) instead of doing the second on the results of the first one. Co-Authored-By: ramirezfranciscof <ramirezfranciscof@users.noreply.github.com>
The function traverse_graph works as a simplified interface to interact with the AGE that also removes the need to manually handle the basket and the querybuilder instance: * The price to pay for hiding the basket is that this function can only be used with sets of nodes and links (so, no groups). * The price to pay for hiding the querybuilder is that complex traversal procedures can no longer be specified, the user simply defines which links can be traversed forwards and which backwards, and this criteria is then applied in every iteration (so one could not, in a single call, search only for all called calc nodes of the called work nodes of an initial workflow node, as one will also obtain the calc nodes directly called by that initial workflow). Besides the starting nodes (pks) and links, the user can also provide the number of max iterations desired (which by default is None, which means 'until no new nodes are found') and a boolean that indicates if the links (edges) should be returned. Additionally, two other interfaces are included for ease of use when deleting and exporting. These functions only take the starting set of pks and the rules provided by the user (as 'rule_name_dir' = False/True) and can automatically check if the rule is toggable, set defaults (using aiida.common.links.GraphTraversalRules), and also parse the ruleset into two lists with the links for forward and backward traversal. They will return a dictionary containing the 'nodes' list, the 'links' list (if this was requested, else this will contain `None`) and a dict with the way in which all the rules were applied (using the following format: 'rule_name' = True/False). Co-Authored-By: Leonid Kahle <leonid.kahle@epfl.ch>
The node deletion function now uses the get_nodes_delete function (with the traverse_graph underlying interface using AGE as main engine) to collect the extra nodes that are needed to keep a consistent provenance. The procedure is not very different than the one that was initially implemented so no significant performance improvement is expected, but this is an important first step to homogenize graph traversal throughout the whole code.
The export function now uses the get_nodes_delete function (with the traverse_graph underlying interface using AGE as the main engine) to collect the extra nodes that are needed to keep a consistent provenance. This is performed, more specifically, by the 'retrieve_linked_nodes' function. Whereas previously a different query was performed for each new node added in the previous query step, this new implementation should do a single new query for all the nodes that were added in the previous query step. So these changes are not only important as a first step to homogenize graph traversal throughout the whole code: an improvement in the export procedure is expected as well.
The graph visualization feature now uses the traverse_graph function (with AGE as the main engine) to collect the requested nodes to be visualized. This was implemented in the methods of the graph class: previously, `recurse_descendants` and `recurse_ancestors` used to work by calling `add_incoming` and `add_outgoing` many times, which in turn have to load nodes during the procedure. Now these are all independent and they all call the traverse_graph function, so the information is obtained directly from the query projections and no nodes are loaded. So these changes are not only important as a first step to homogenize graph traversal throughout the whole code: an improvement in the visualization procedure is expected as well.
534f82e
to
8ec079c
Compare
Agreed in private that everything was addressed
This PR incorporates two new tools for graph traversal:
The AiiDA Graph Explorer (AGE) package of classes: this is a general tool that allows to do complex graph traversal operations in an AiiDA 'expanded graph' (in which both the AiiDA nodes and groups are considered to be graph nodes) by defining 'rules' from generic querybuilder instances. Then, given an initial set of nodes (that have to be set inside generic 'baskets'), these queries will be successively applied on top of the results of the previous one, repeating the whole cycle the desired number of times (which could be 'until no new nodes are added').
The
traverse_graph
function: this is a simplified interface to use the AGE to search for AiiDA nodes and links using a reduced set of customizable rules. Contrary to the AGE, which retains basically the same versatility of the querybuilder and thus allows for very complex traversals, withtraverse_graph
you can only specify which type of links will be allowed to be traversed and in which direction, and these are all traversed in each iteration. The advantage is that one no longer needs to manually define and handle baskets and querybuilders.The function
traverse_graph
(which uses AGE as its search engine) is now used by the delete, the export and the graph visualization procedures. For the first one, thedelete_nodes
function now uses this function instead of doing its own search but the procedure is similar to how it was performed before. On the other hand, for the export procedure every node added to the set was queried separately for ancestors and descendants, whereas now the whole sets of new nodes found are queried for more nodes (all of this happens inside theretrieve_linked_nodes
function). Finally, the graph class methodsrecurse_descendants
andrecurse_ancestors
used to work by calling theadd_incoming
andadd_outgoing
many times, inside of which nodes were loaded to get their pks. Now these are all independent and each has its own call totraverse_graph
, where pks are obtained directly from the query projection.This closes #3331