Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditionals #4

Merged
merged 16 commits into from
Feb 21, 2020
Merged

Conditionals #4

merged 16 commits into from
Feb 21, 2020

Conversation

ghost
Copy link

@ghost ghost commented Nov 22, 2019

Additions to specification and conformance tests for conditionals feature.

@ghost ghost requested review from stain, tetron and mr-c November 22, 2019 20:51
@ghost ghost marked this pull request as ready for review November 27, 2019 16:21
Kaushik Ghose added 3 commits December 4, 2019 14:50
This has been moved over from the old CWL repository.
The history has been discarded.
@ghost
Copy link
Author

ghost commented Dec 5, 2019

The following conversation has been copied over from the original PR at common-workflow-language/common-workflow-language#862

@ghost
Copy link
Author

ghost commented Dec 5, 2019

@jmchilton on Oct 17, 2019

Thanks for laying this all out - I think the document is super useful and I appreciate all the effort. I think what is here makes some good conceptual sense outside the context of scatter/merge. I'm nervous about thinking through scatter/merge - but that is probably a deficiency with my ability to keep everything in my head. That said, if this included an explicit discussion of scatter and merging and how things work out with examples that would be really helpful.

@ghost
Copy link
Author

ghost commented Dec 5, 2019

@jmchilton on Oct 18, 2019

I've slept on this and I've decided this should be simpler and more general.

all outputs of a skipped step that are otherwise not handled are converted to the CWL null type.

This feels right - I like this! Given this I don't think we really should distinguish between a skipped output defaulted to null and an explicit null output. My argument here is essentially "who cares?" - my default position is this is complexity neither the user nor the implementer really needs. If we don't need it, lets try without it.

One of the design goals with this was to avoid thinking about state - we aren’t exposing a ‘skipped’ states in CWL documents or to CWL authors at all. This again was good, and I think my proposal here to eliminate the concept of branch selection and replace it with just dispatching on nulls goes farther in that direction.

We are eliminating ideas we expose to the user I think - they don’t really need to think about whether a previous step was skipped or just null. This reduces the cognitive load. And if 99.98% of the time CWL authors won’t care about the distinction between an explicit null and a skipped implicit null (my opinion), we should eliminate the cognitive load and implementation complexity associated with it.

In addition to simplifying this for implementers and users, I think the result is constructs that are more general and reusable. This is good. For instance:

source:
  first_that_ran:
    - step1/out1
    - in1

becomes:

source:
  first_non_null:
    - step1/out1
    - in1

I think this name is more clear and that feels to me like evidence the cognitive load is smaller. This also becomes a more general purpose tool that has uses outside of conditionals. We could implement optional exclusive parameters more cleanly using this, dispatching on filtered arrays, etc.. We’re finally taking steps down the road toward replacing common expressions with small utility patterns.

Another benefit of this new approach here is first_non_null and when can now be implemented and added independently to the spec. They aren’t needed for each other and are useful independent of each other.

Update:

Talked with @tetron and @kaushik-work a bit more about this - a few notes.

  • With this change, this would be pretty much what @tetron originally proposed.
  • This data flow pattern and these null handling tools would work just as well for an extension that would allow scatters with failures to replace the failed outputs with null.
  • Both skip and failure handling could benefit from allowing tool outputs to specify default values. Default outputs would also probably be a clean solution to the potential edge case of wanting to treat the implicit skipped output null and an explicit output null (mentioned above).

@ghost
Copy link
Author

ghost commented Dec 5, 2019

@kaushik-work on Oct 18, 2019

I like the idea of stripping away as much as possible. first_non_null and remove_nulls would do the job, and do it in an easy to understand manner. The crucial point here is what use cases we are impeding by doing this. You have convinced me that null outputs are likely a tiny minor (mis-)use case.

@ghost
Copy link
Author

ghost commented Dec 5, 2019

@stain on Oct 21, 2019

I also like this late edition from @jmchilton - it is very important to state that it is first_non_null rather than first_that_ran as that would imply a temporal order. In fact this is exactly how we used to do this in Taverna 1 where we picked the value that first (in time) appeared on the input - which could give unpredictable outputs in the case where multiple upstreams ran.

We found it was was easy for users to get multiple conditionals wrong - e.g. they would have two steps, one with if a <1 and another if a>1 - and not cover a==1. Or what we saw more (because we had "fail if.." rather than "run if"), they would do if not a>1 and if not a<1 and both accidentally get triggered on a==1 - but in unpredictable order.

Add multiple inputs to evaluate and it's not easy to see the conditionals that can trigger at same time.

So the name is important - I think we want to have in pre-defined order as an important requirement here.

Now back to the proposal from @jmchilton :

If someone really wanted the ability to return null that would not trigger fallover, they can change their types to array and return [null] or null - basically the existing array optional hack.

Actually I think this is an advantage, as it means the step itself can decide to fail, even after its execution, e.g. a valueFrom can decide the value was not good enough and refuse to deliver its output (give null) instead, triggering the next non-null value to be preferred.

It does mean unnecessary execution of the additional upstream fallback steps (we have not in the spec said that an engine is free to not execute steps that do not contribute to outputs - they may be doing house keeping).

@ghost
Copy link
Author

ghost commented Dec 5, 2019

@kaushik-work on Oct 29, 2019

@jmchilton , @stain, @tetron @mr-c I've been thinking about the syntax:

source:
  first_non_null:
    - step1/out1
    - in1

And now it is stuck in my head that this operator is really of the same kind as the link_merge operator. The way the syntax is denoted here is nice, but a little arbitrary. Why is link_merge sitting outside? Also, this means source can now be a string or an object, which is not the worst thing, but a little irregular.

I'd like to go back to the earlier syntax, which was congruent with how we had link_merge. So:

source:
    - step1/out1
    - step2/out1
    - step3/out1
    - step4/out1

pickValue: all_non_null
linkMerge: merge_flattened

A future extension can have fail_filter and so on.
I feel what we are doing here is applying different operators to this list of inputs and producing another list or a scalar.

In terms of operator precedence, we have well defined rules and they are (including this proposal)

pickValue -> linkMerge -> default -> valueFrom

linkMerge -> pickValue -> default -> valueFrom (as agreed on CWL meet of 2019.11.05)

@tetron tetron merged commit db3b07c into master Feb 21, 2020
@tetron
Copy link
Member

tetron commented Feb 21, 2020

I'm going to go ahead and merge this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant