Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in verdi process show where not all called processes are shown #2962

Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Jun 4, 2019

Fixes #2868

The utility get_node_info was called by verdi process show to format
a multiline string of the given process node, including all the inputs
and called processes. The function used the LinkManager.nested to
reconstruct a nested representation of the relevant set of neighboring
nodes. This is useful for the inputs and outputs that can in fact be
nested and have unique labels, however, for called links, the link
labels are not necessarily unique and so the dictionary returned by the
nested method could only represent a single called process with the same
link label. To fix this we use a different formatter that simply takes
the list of neighboring nodes, which then does not care about label
uniqueness.

Since the nested method was silently overwriting duplicate keys, which
caused this bug, its behavior is adapted to raise a KeyError in this
situation.

@sphuber sphuber requested review from ltalirz and giovannipizzi June 4, 2019 13:45
@coveralls
Copy link

coveralls commented Jun 4, 2019

Coverage Status

Coverage increased (+0.007%) to 72.914% when pulling d1fa180 on sphuber:fix_2868_verdi_process_show_duplicate_links into 34188d8 on aiidateam:develop.

@ltalirz
Copy link
Member

ltalirz commented Jun 4, 2019

To fix this, we have to generate unique labels if duplicates exist.

If link names are not unique, would this not rather indicate that a list should be returned instead of a dictionary? What if someone calls a link CALL_123 or FILE_1?

Of course, this depends a bit where this is intended to be used.

Concerning the method 1-line description:

Return the matched nodes in the original nested form by expanding the collapsed link labels.

This is assuming a lot of knowledge about the context (original form, collapsed link labels, ...). Could this be rewritten in a more intuitive form (perhaps with an additional sentence in the description?)

@sphuber
Copy link
Contributor Author

sphuber commented Jun 4, 2019

If link names are not unique, would this not rather indicate that a list should be returned instead of a dictionary? What if someone calls a link CALL_123?

This would be fine, even if there is another link with CALL_123 then the second one will simply be changed to CALL_123 <pk>. This is not a new "problem" of AiiDA. For example when doing node.outputs. and hitting tab, where node is a StructureData that has been used many times as input with the label structure. This happens all the time, but since all the labels are identical, you cannot address any specific one unambiguously. This is why in places where this occurs, the old solution was to suffix the label with a pk.

Here we have a similar situation. The LinkManager is what is returned by calling Node.get_incoming or Node.get_outgoing and provides utility functions to return subsets of the collection of neighboring nodes matched by the initial method. The nested one returns the collection in the form of a nested dictionary, where imploded link labels that represent nested namespaces, are exploded. If you pass an input for example in {'namespace': {'test': Node}}, its link label becomes namespace__test. This exploded dictionary is useful in places where I need that nested structure again instead of the flat link labels. Currently there is two places

  1. To populate the restart process builder from a completed ProcessNode.
  2. To display a nested representation of node links in verdi process show, e.g.:
Property       Value
-------------  ------------------------------------
type           CalcJobNode
pk             294746
uuid           2ad0110f-08ac-4ba5-85e1-0e4fc75839c3
label
description
ctime          2019-06-04 13:05:16.614230+02:00
mtime          2019-06-04 13:35:38.218381+02:00
process state  Finished
exit status    0
computer       [2] daint-gpu

Called by        PK  Type
-----------  ------  -------------
CALL_CALC    294740  WorkChainNode

Inputs      PK      Type
----------  ------  -------------
pseudos
    Nb      58      UpfData
    K       45      UpfData
    Cl      87      UpfData
code        293421  Code
kpoints     294743  KpointsData
parameters  294744  Dict
settings    294745  Dict
structure   305     StructureData

Note that the pseudos are nested, because it is a namespace.

This is assuming a lot of knowledge (original form, collapsed link labels, ...).

Yes, but I don't think I can explain the data structure and design of AiiDA in every docstring.
In this case I would then have to explain port namespaces, how they are mapped onto flat links etc.
At some point one needs to know a bit about the internal design to understand certain methods.
But maybe you have a suggestion how this can be made clear in relatively little text.

@ltalirz
Copy link
Member

ltalirz commented Jun 4, 2019

And here a suggestion for the docstring:

"""Construct dictionary of nodes that mirrors the nesting of link namespaces.

While the actual link labels flatten the nested structure of input/output namespaces, this function returns a dictionary of nodes that mirrors the nesting of link namespaces.
This is useful e.g. to group links together that belong to the same namespace.

The utility `get_node_info` was called by `verdi process show` to format
a multiline string of the given process node, including all the inputs
and called processes. The function used the `LinkManager.nested` to
reconstruct a nested representation of the relevant set of neighboring
nodes. This is useful for the inputs and outputs that can in fact be
nested and have unique labels, however, for called links, the link
labels are not necessarily unique and so the dictionary returned by the
nested method could only represent a single called process with the same
link label. To fix this we use a different formatter that simply takes
the list of neighboring nodes, which then does not care about label
uniqueness.

Since the `nested` method was silently overwriting duplicate keys, which
caused this bug, its behavior is adapted to raise a `KeyError` in this
situation.
@sphuber sphuber force-pushed the fix_2868_verdi_process_show_duplicate_links branch from b156501 to 5ba688a Compare June 4, 2019 17:06
@sphuber sphuber changed the title Generate ad-hoc unique labels in LinkManager.nested Fix bug in verdi process show where not all called processes are shown Jun 4, 2019
@sphuber
Copy link
Contributor Author

sphuber commented Jun 4, 2019

@ltalirz OK I have updated the solution as discussed. Instead of adapting nested it now actually raises a KeyError in case of a conflict. I have also updated the docstring based on your suggestion but reworded. The get_node_info now uses a separate approach for the called processes formatting

Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @sphuber
looks good!

@sphuber sphuber merged commit 5ff132e into aiidateam:develop Jun 4, 2019
@sphuber sphuber deleted the fix_2868_verdi_process_show_duplicate_links branch June 4, 2019 18:05
@sphuber
Copy link
Contributor Author

sphuber commented Jun 4, 2019

Thanks @ltalirz for the useful discussion, the eventual solution is far better than the original one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Output of verdi process show is broken when duplicate link labels
3 participants