Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise when returning an unstored Data from a WorkflowNode #2747

Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Apr 12, 2019

Fixes #2675

This will happen if one tries to return a Data node from within the
body of a WorkChain or workfunction, which means that they probably
created a new node based on its inputs or data returned by one of the
processes it calls. However, this is strictly forbidden as the
provenance of the new node will be lost. Given that beginning users are
likely to make this mistake, instead of issuing a warning we explicitly
forbid this behavior and raise in the link validation.

super(WorkflowNode, self).validate_outgoing(target, link_type, link_label)
if link_type is LinkType.RETURN and not target.is_stored:
raise ValueError(
'trying to return an unstored Data node from a workflow is not allowed because it will cause the '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already good. What about something like:
"Workflow '{}' tried returning an unstored Data node. This likely means new Data is being created inside the workflow. In order to preserve data provenance, use a calcfunction to create this data, and return its output from the workflow.".format(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question (just curious):
Why is the Node the right place to check this? Shouldn't it be the process making sure it's returning the right things?

And one minor detail: in the PR description replace "Data node" with "unstored Data node"

@sphuber
Copy link
Contributor Author

sphuber commented Apr 12, 2019

My thirst through was to override the out method for the workchain, but then that would not catch workfunctions. There I would have to do it for the FunctionProcess but since that is shared with calcfunction I would have to make a switch on the node type, which is ugly. Since it is then ultimately the node type that decides anyway, might as well put there in one place. This has the added benefit that this also prevent sneaky pete's from calling my_illegally_created_node.add_incoming(self, link_type=LinkType.RETURN) in the work chain 😉 which would not be caught by the check in the overridden out

@sphuber sphuber force-pushed the fix_2675_forbid_workchain_return_unstored branch 3 times, most recently from 3e58c4d to 599d55b Compare April 12, 2019 21:57
@coveralls
Copy link

coveralls commented Apr 12, 2019

Coverage Status

Coverage increased (+0.004%) to 70.49% when pulling fcddd15 on sphuber:fix_2675_forbid_workchain_return_unstored into cc38745 on aiidateam:develop.

Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @sphuber !

super(WorkflowNode, self).validate_outgoing(target, link_type, link_label)
if link_type is LinkType.RETURN and not target.is_stored:
raise ValueError(
'Workflow<{}> tried returning an unstored `Data` node. This likely means new `Data` is being created '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more of the name of the workflow (which should tell people where in the code they need to change things).
Happy to re-approve if you still want to change this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adapted it to print the process label and also added two more tests on workchain and workfunction level, to make sure it also works through there

This will happen if one tries to return a `Data` node from within the
body of a `WorkChain` or `workfunction`, which means that they probably
created a new node based on its inputs or data returned by one of the
processes it calls. However, this is strictly forbidden as the
provenance of the new node will be lost. Given that beginning users are
likely to make this mistake, instead of issuing a warning we explicitly
forbid this behavior and raise in the link validation.
@sphuber sphuber force-pushed the fix_2675_forbid_workchain_return_unstored branch from 599d55b to fcddd15 Compare April 13, 2019 06:38
@sphuber sphuber merged commit d0ab2b8 into aiidateam:develop Apr 13, 2019
@sphuber sphuber deleted the fix_2675_forbid_workchain_return_unstored branch April 13, 2019 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn users when they are creating new Data in a WorkChain
3 participants