Conditionals #4

ghost · 2019-11-22T20:51:49Z

Additions to specification and conformance tests for conditionals feature.

This puts the basic validations and error conditions through their paces

…l-v1.2 into conditionals

This has been moved over from the old CWL repository. The history has been discarded.

ghost · 2019-12-05T15:07:06Z

The following conversation has been copied over from the original PR at common-workflow-language/common-workflow-language#862

ghost · 2019-12-05T15:07:39Z

@jmchilton on Oct 17, 2019

Thanks for laying this all out - I think the document is super useful and I appreciate all the effort. I think what is here makes some good conceptual sense outside the context of scatter/merge. I'm nervous about thinking through scatter/merge - but that is probably a deficiency with my ability to keep everything in my head. That said, if this included an explicit discussion of scatter and merging and how things work out with examples that would be really helpful.

ghost · 2019-12-05T15:08:19Z

@jmchilton on Oct 18, 2019

I've slept on this and I've decided this should be simpler and more general.

all outputs of a skipped step that are otherwise not handled are converted to the CWL null type.

This feels right - I like this! Given this, I don't think we should distinguish between a skipped output defaulted to null and an explicit null output. My argument here is essentially "who cares?" - my default position is that this is complexity neither the user nor the implementer really needs. If we don't need it, let's try without it.

One of the design goals with this was to avoid thinking about state - we aren’t exposing a ‘skipped’ state in CWL documents or to CWL authors at all. This again was good, and I think my proposal here to eliminate the concept of branch selection and replace it with just dispatching on nulls goes farther in that direction.

We are eliminating ideas we expose to the user I think - they don’t really need to think about whether a previous step was skipped or just null. This reduces the cognitive load. And if 99.98% of the time CWL authors won’t care about the distinction between an explicit null and a skipped implicit null (my opinion), we should eliminate the cognitive load and implementation complexity associated with it.
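A minimal Python sketch of the rule under discussion, not taken from the spec or from any CWL runner: when a step's condition is false, every one of its outputs becomes null (`None` here), and nothing downstream can tell that apart from an explicit null. The names `run_step`, `tool`, `when`, and `count_chars` are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict, List, Optional


def run_step(
    inputs: Dict[str, Any],
    output_names: List[str],
    tool: Callable[[Dict[str, Any]], Dict[str, Any]],
    when: Optional[Callable[[Dict[str, Any]], bool]] = None,
) -> Dict[str, Any]:
    """Run `tool`, or return None for every output if `when` is false."""
    if when is not None and not when(inputs):
        # Skipped: downstream consumers see plain nulls, not a "skipped" state.
        return {name: None for name in output_names}
    return tool(inputs)


def count_chars(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical tool with a single output named out1."""
    return {"out1": len(inputs["text"])}


def is_enabled(inputs: Dict[str, Any]) -> bool:
    return inputs["enabled"]


skipped = run_step({"text": "abc", "enabled": False}, ["out1"], count_chars, is_enabled)
ran = run_step({"text": "abc", "enabled": True}, ["out1"], count_chars, is_enabled)
```

Here `skipped` carries a null for `out1`, exactly what a tool that ran and explicitly returned null would produce.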

In addition to simplifying this for implementers and users, I think the result is constructs that are more general and reusable. This is good. For instance:

source:
  first_that_ran:
    - step1/out1
    - in1

becomes:

source:
  first_non_null:
    - step1/out1
    - in1

I think this name is clearer, and that feels to me like evidence the cognitive load is smaller. This also becomes a more general-purpose tool that has uses outside of conditionals. We could implement optional exclusive parameters more cleanly using this, dispatch on filtered arrays, etc. We’re finally taking steps down the road toward replacing common expressions with small utility patterns.

Another benefit of this new approach is that first_non_null and when can now be implemented and added to the spec independently. Neither needs the other, and each is useful on its own.
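As a rough illustration of why the operator is general purpose, first_non_null can be sketched in a few lines of Python. This is an assumption-laden sketch, not the spec's definition: the thread does not settle what should happen when every source is null, so this version simply returns `None` in that case.

```python
from typing import Any, Optional, Sequence


def first_non_null(values: Sequence[Any]) -> Optional[Any]:
    """Return the first value, in list order, that is not None."""
    for value in values:
        if value is not None:
            return value
    # Assumption: all-null input yields None rather than an error.
    return None


# step1 was skipped (or returned null), so the workflow input in1 is used.
from_fallback = first_non_null([None, "value of in1"])

# step1 produced a value, so it wins because it is listed first.
from_step = first_non_null(["value of step1/out1", "value of in1"])

# Only null is skipped; falsy values such as 0 still count as non-null.
falsy_kept = first_non_null([0, 5])
```

Nothing in the function refers to steps or conditions, which is what lets it serve other null-handling cases as well.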

Update:

Talked with @tetron and @kaushik-work a bit more about this - a few notes.

With this change, this would be pretty much what @tetron originally proposed.
This data flow pattern and these null handling tools would work just as well for an extension that would allow scatters with failures to replace the failed outputs with null.
Both skip and failure handling could benefit from allowing tool outputs to specify default values. Default outputs would also probably be a clean solution to the potential edge case of wanting to treat the implicit skipped output null and an explicit output null differently (mentioned above).

ghost · 2019-12-05T15:09:12Z

@kaushik-work on Oct 18, 2019

I like the idea of stripping away as much as possible. first_non_null and remove_nulls would do the job, and do it in an easy to understand manner. The crucial point here is what use cases we are impeding by doing this. You have convinced me that null outputs are likely a tiny minor (mis-)use case.

ghost · 2019-12-05T15:10:09Z

@stain on Oct 21, 2019

I also like this late addition from @jmchilton - it is very important to state that it is first_non_null rather than first_that_ran, as the latter would imply a temporal order. In fact this is exactly how we used to do this in Taverna 1, where we picked the value that first (in time) appeared on the input - which could give unpredictable outputs in the case where multiple upstreams ran.

We found it was easy for users to get multiple conditionals wrong - e.g. they would have two steps, one with if a<1 and another with if a>1 - and not cover a==1. Or, what we saw more often (because we had "fail if.." rather than "run if"), they would do if not a>1 and if not a<1, and both would accidentally get triggered on a==1 - but in unpredictable order.

Add multiple inputs to evaluate and it's not easy to see which conditionals can trigger at the same time.

So the name is important - I think we want "in pre-defined order" to be an important requirement here.
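The two mistakes described above can be reproduced with a small Python sketch. The branch names `low` and `high` and the helper `triggered` are hypothetical; the point is only that one pair of conditions leaves a == 1 uncovered while the other fires both branches on it, so a selection rule based on declared order is the only predictable one.

```python
from typing import Callable, Dict, List

Condition = Callable[[float], bool]


def triggered(conditions: Dict[str, Condition], a: float) -> List[str]:
    """Return the names of all branches whose condition holds, in declared order."""
    return [name for name, cond in conditions.items() if cond(a)]


# "run if" style: a == 1 is covered by neither branch.
gap = {"low": lambda a: a < 1, "high": lambda a: a > 1}

# negated style: a == 1 satisfies both branches.
overlap = {"low": lambda a: not (a > 1), "high": lambda a: not (a < 1)}

none_ran = triggered(gap, 1)
both_ran = triggered(overlap, 1)

# Picking by declared order is deterministic even when both branches fire.
chosen = both_ran[0]
```

With a timing-based rule, `chosen` would depend on which branch finished first; with a declared order it is always the first listed branch.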

Now back to the proposal from @jmchilton :

If someone really wanted the ability to return null that would not trigger fallover, they can change their types to array and return [null] or null - basically the existing array optional hack.

Actually I think this is an advantage, as it means the step itself can decide to fail, even after its execution, e.g. a valueFrom can decide the value was not good enough and refuse to deliver its output (give null) instead, triggering the next non-null value to be preferred.
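Both points can be sketched in Python under the same assumed first_non_null semantics as above (first value in list order that is not null). The `quality_gate` function is a hypothetical stand-in for a valueFrom expression that rejects a poor result; it is not part of any proposal in this thread.

```python
from typing import Any, Optional, Sequence


def first_non_null(values: Sequence[Any]) -> Optional[Any]:
    """Assumed semantics: first value in list order that is not None."""
    return next((v for v in values if v is not None), None)


def quality_gate(score: float, threshold: float = 0.9) -> Optional[float]:
    """Hypothetical valueFrom-style check: refuse to deliver a weak result."""
    return score if score >= threshold else None


# A plain null triggers fallover to the next source.
plain = first_non_null([None, "fallback"])

# Wrapping the null in an array makes the value non-null, so no fallover.
wrapped = first_non_null([[None], "fallback"])

# A step that ran but rejects its own output also triggers fallover.
rejected = first_non_null([quality_gate(0.5), "fallback"])
accepted = first_non_null([quality_gate(0.95), "fallback"])
```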

It does mean unnecessary execution of the additional upstream fallback steps (we have not said in the spec that an engine is free to not execute steps that do not contribute to outputs - they may be doing housekeeping).

ghost · 2019-12-05T15:11:10Z

@kaushik-work on Oct 29, 2019

@jmchilton, @stain, @tetron, @mr-c I've been thinking about the syntax:

source:
  first_non_null:
    - step1/out1
    - in1

And now it is stuck in my head that this operator is really of the same kind as the link_merge operator. The way the syntax is denoted here is nice, but a little arbitrary. Why is link_merge sitting outside? Also, this means source can now be a string or an object, which is not the worst thing, but a little irregular.

I'd like to go back to the earlier syntax, which was congruent with how we had link_merge. So:

source:
  - step1/out1
  - step2/out1
  - step3/out1
  - step4/out1
pickValue: all_non_null
linkMerge: merge_flattened

A future extension can have fail_filter and so on.
I feel what we are doing here is applying different operators to this list of inputs and producing another list or a scalar.

In terms of operator precedence, we have well-defined rules, and they are (including this proposal):

~~pickValue -> linkMerge -> default -> valueFrom~~

linkMerge -> pickValue -> default -> valueFrom (as agreed at the CWL meeting of 2019-11-05)
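The agreed order can be read as a small pipeline over the list of sources. The Python sketch below is an illustration only: `resolve_input` and its parameters are hypothetical names, and the treatment of a bare null as a single item during flattening is an assumption rather than something stated in this thread.

```python
from typing import Any, Callable, List, Optional


def merge_flattened(values: List[Any]) -> List[Any]:
    """Concatenate list-valued sources; append single values as one item each."""
    flat: List[Any] = []
    for v in values:
        if isinstance(v, list):
            flat.extend(v)
        else:
            flat.append(v)  # Assumption: a bare null is kept as one item.
    return flat


def all_non_null(values: List[Any]) -> List[Any]:
    """Drop every null from the list."""
    return [v for v in values if v is not None]


def resolve_input(
    sources: List[Any],
    link_merge: Optional[Callable[[List[Any]], Any]] = None,
    pick_value: Optional[Callable[[Any], Any]] = None,
    default: Any = None,
    value_from: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Apply linkMerge -> pickValue -> default -> valueFrom, in that order."""
    value: Any = sources
    if link_merge is not None:
        value = link_merge(value)
    if pick_value is not None:
        value = pick_value(value)
    if value is None and default is not None:
        value = default
    if value_from is not None:
        value = value_from(value)
    return value


# Three upstream steps; the second was skipped, the first emitted a null item.
sources = [[1, None], None, [2, 3]]

agreed = resolve_input(sources, merge_flattened, all_non_null, default=[], value_from=sum)

# The struck-out order filters first, so the nested null survives flattening.
struck_out = merge_flattened(all_non_null(sources))
```

With the agreed order the nulls are removed after flattening, so `agreed` is the sum of 1, 2 and 3; with the struck-out order a null is still present in `struck_out` and a later `sum` would fail.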

tetron · 2020-02-21T21:03:24Z

I'm going to go ahead and merge this.

Kaushik Ghose added 2 commits November 22, 2019 14:24

Add specifications for Conditionals

0812aab

Add initial set of tests

5c23d70

ghost requested review from stain, tetron and mr-c November 22, 2019 20:51

Kaushik Ghose and others added 7 commits November 25, 2019 10:12

Add basic tests for pickValue

341b6d1

This puts the basic validations and error conditions through their paces

Further tests to examine pickValue

a2bc580

Add basic tests for scatter

563f05c

Add nested cross product scatter

8226a86

Non-boolean when values should fail

67d6cd6

Fix typo in job file name

a9ef06a

Add MultipleInputFeatureRequirement to tests that need it.

236e414

ghost marked this pull request as ready for review November 27, 2019 16:21

Kaushik Ghose added 3 commits December 4, 2019 14:50

Add test with merge_flattened and pickValue

b4bfe31

Merge branch 'conditionals' of github-sbg:common-workflow-language/cw…

7649a66

…l-v1.2 into conditionals

Add conditionals design documents

af466a1

This has been moved over from the old CWL repository. The history has been discarded.

ghost mentioned this pull request Dec 5, 2019

Write down reasoning behind conditionals decision common-workflow-language/common-workflow-language#862

Closed

jmchilton mentioned this pull request Dec 10, 2019

Even Richer Workflow Inputs galaxyproject/galaxy#9086

Merged

Kaushik Ghose added 2 commits December 17, 2019 14:20

Fix, clarify and expand first_non_null tests

b2a89fe

Fix test with conditional on non-scattered variable

2b00549

illusional mentioned this pull request Jan 23, 2020

Conditionals in Janis PMCC-BioinformaticsCore/janis-core#5

Closed

tetron added 2 commits February 20, 2020 17:14

Import conditionals tests into main tests (requires updated cwltest)

ba7549e

Work on conditionals text

a045dfc

tetron merged commit db3b07c into master Feb 21, 2020

tetron deleted the conditionals branch February 21, 2020 21:03

illusional mentioned this pull request Feb 27, 2020

Conditionals in Janis PMCC-BioinformaticsCore/janis#9

Closed

GlassOfWhiskey mentioned this pull request Apr 17, 2023

need conformance test for workflow output + linkMerge: merge_flattened common-workflow-language/common-workflow-language#795

Open
