Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process functions: Restore support for dynamic nested input namespaces #5808

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Nov 30, 2022

Fixes #5806

This behavior was supported before v2.0.4 but was accidentally removed by
putting more strict type validation on inputs passed to process
functions. In d2323d4, a bug was fixed
where it was possible to pass non-storable inputs to dynamic port
namespaces. For example, the following was allowed:

@calcfunction
def sum(**kwargs):
    return sum(kwargs.values())

sum(**{'a': 1, 'b': 2})

However, since the values in the kwargs are not Data nodes (and this
was before automatic input serialization was added for process functions)
and so they could not be stored in the provenance graph. This would
therefore cause accidental provenance loss.

This bug was fixed in the aforementioned commit by explicitly setting
valid_type=Data on the inputs namespace of the Process base class.

The fix, however, inadvertently also removed support for arbitrarily
nested keyword arguments in process functions. While this would work
before, it would now raise a ValidationError saying that the dynamic
namespace received a dict instead of a Data node. The cause is that
the PortNamespace.validate_dynamic_ports did not recurse down nested
namespaces, but only validated top-level values.

This behavior is fixed in plumpy==0.21.3 where now the validation
method will recursively go down any nested namespaces for a dynamic port
namespace and validate as long as all leaf-values match the valid_type
of the PortNamespace.

@sphuber sphuber requested a review from unkcpz November 30, 2022 20:37
@sphuber
Copy link
Contributor Author

sphuber commented Nov 30, 2022

@unkcpz as a quick test, I simply overrode the PortNamespace.validate_dynamic_ports method to add the correct behavior. But I think this fix is not specific to process functions but all processes and so really should probably be in plumpy.

Note that essentially this bug also exists for normal processes like WorkChains. If you create a dynamic namespace, you can pass any inputs, even if they are not storable and the process will validate. Example:

from aiida.engine import WorkChain, run_get_node

class NestedWorkChain(WorkChain):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input_namespace('nested', dynamic=True)
        spec.outline(cls.do_run)

    def do_run(self):
        pas

_, node = run_get_node(NestedWorkChain, **{'nested': {'namespace': {'int': 1}}})

assert node.get_incoming().nested()

The following fails, because non of the inputs are storable, but the process validates and so the inputs of the node are empty. I am thinking that we should not allow this behavior and require that leaf inputs in dynamic namespaces are storable, i.e. Data instances, unless the namespace is explicitly marked as non_db=True. This might break existing implementations out there though. But then again, maybe this would be good, since they are losing (potentially unintentionally) provenance.

@sphuber
Copy link
Contributor Author

sphuber commented Nov 30, 2022

This is the alternative PR in plumpy if we decide that the change should be implemented there instead: aiidateam/plumpy#255

Decision was made to put this in plumpy and the change was released with plumpy==0.21.3.

@sphuber sphuber force-pushed the fix/5806/process-functions-nested-input-namespaces branch from c3d3a6f to d7752d0 Compare December 6, 2022 11:48
This behavior was supported before v2.0.4 but was accidentally removed by
putting more strict type validation on inputs passed to process
functions. In d2323d4, a bug was fixed
where it was possible to pass non-storable inputs to dynamic port
namespaces. For example, the following was allowed:

    @calcfunction
    def sum(**kwargs):
        return sum(kwargs.values())

    sum(**{'a': 1, 'b': 2})

However, since the values in the `kwargs` are not `Data` nodes (and this
was before automatic input serialization was added for process functions)
and so they could not be stored in the provenance graph. This would
therefore cause accidental provenance loss.

This bug was fixed in the aforementioned commit by explicitly setting
`valid_type=Data` on the inputs namespace of the `Process` base class.

The fix, however, inadvertently also removed support for arbitrarily
nested keyword arguments in process functions. While this would work
before, it would now raise a `ValidationError` saying that the dynamic
namespace received a `dict` instead of a `Data` node. The cause is that
the `PortNamespace.validate_dynamic_ports` did not recurse down nested
namespaces, but only validated top-level values.

This behavior is fixed in `plumpy==0.21.3` where now the validation
method will recursively go down any nested namespaces for a dynamic port
namespace and validate as long as all leaf-values match the `valid_type`
of the `PortNamespace`.
@sphuber sphuber force-pushed the fix/5806/process-functions-nested-input-namespaces branch from d7752d0 to 4c4ab5a Compare December 7, 2022 20:20
Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! looks all good to me. The install test failed since the conda forge of plumpy is not ready I guess?

@sphuber
Copy link
Contributor Author

sphuber commented Dec 7, 2022

Thanks! looks all good to me. The install test failed since the conda forge of plumpy is not ready I guess?

Exactly, I will take care of that now before merging

@sphuber sphuber merged commit a86e6fb into aiidateam:main Dec 8, 2022
@sphuber sphuber deleted the fix/5806/process-functions-nested-input-namespaces branch December 8, 2022 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restore support for dynamic nested namespaces in inputs for process functions
2 participants