Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Even Richer Workflow Inputs #9086

Merged
merged 16 commits into from
Dec 17, 2019

Conversation

jmchilton
Copy link
Member

@jmchilton jmchilton commented Dec 10, 2019

Builds on #9073 (Richer Workflow Inputs).

Adds:

  • Ability to specify formats for data and collection inputs. Currently collections have no way to restrict formats - and data parameters will be restricted automatically but workflow authors may be able to provide more precise, correct data types than Galaxy can (or does) infer. Formats may be specified as a single value or a list of values in the corresponding workflow format.
  • Ability to add "restrictions" on text data inputs - so they appear as static select boxes from supplied key/values, values determined dynamically (this commit includes Implement 'select' Input Parameter in Workflow Editor #8264), or a text box with suggestions. These changes include changing the workflow editor logic so that text input can connect to select inputs.

I had wanted to use Repeats for these, but they don't work with the workflow module framework right now and it wasn't obvious how to adjust things so they would. Instead users supply these as common separated values in a text box. Since we're not persisting ideas related to the client plumbing into the database - there should be nothing to prevent us from re-working the interface to use different client form stuff in the future as long as we're on board with supporting the follow Format 2 syntax elements:

inputs:
  fasta_input:
     type: data
     format: fasta
inputs:
  sequences:
     type: data
     format:
      - fasta
      - fastqsanger
inputs:
  sequencer:
     type: text
     restrictions:
      - iseq
      - miniseq
      - miseq
      - nextseq
      - nanopore
inputs:
  sequencer:
     type: text
     restrictions:
      - value: iseq
        label: iSeq
      - value: miniseq
        label: MiniSeq
      - value: miseq
        label: MiSeq
      - value: nextseq
        label: NextSeq
      - value: nanopore
        label: Nanopore
inputs:
  sequencer:
     type: text
     restrictOnConnections: true
inputs:
  sequencer:
     type: text
     suggestions:
      - iseq
      - miniseq

@jmchilton
Copy link
Member Author

Sleeping on this I think my new plan is to wrap the restrictions parameter in a conditional with four options:

  • Do not specify restrictions (default).
  • Attempt restriction based on connections.
  • Provide list of restricted values.
  • Provide list of suggested values.

For restrict based on select connections - I've rebased #8264 into a single commit with 3 authors 98c9b50 that I can apply to this and rework.

Plan for select merging would be I think:

  • Rework parameter finding to visit_input_values right (would eliminate a lot of the code).
  • Remove the select option - just make it part of the text option.

The corresponding Format 2 syntaxes for four options would be something like:

inputs:
  sequencer:
     type: text
inputs:
  sequencer:
     type: text
     restrictions:
      - iseq
      - miniseq
inputs:
  sequencer:
     type: text
     restrictOnConnections: true
inputs:
  sequencer:
     type: text
     suggestions:
      - iseq
      - miniseq

@jmchilton
Copy link
Member Author

Well I messed up the cherry-pick because I started refactoring while I should have been merging the conflicts and committing first - so it is hard to see the lineage - but the authors of #8264 are on the commit and a good amount of the code is still there. Using visit_input_values really did reduce the amount of code needed for that.

I think otherwise the idea worked out:

Screen Shot 2019-12-10 at 11 05 03 AM

I think a lot could go wrong when generating those values so the option in the GUI is called "Attempt restriction based on connections" and this code fallbacks to a plain text input if option generation or intersection fails.

A couple more ideas:

  • If just one connection is made, just use the options - because intersection breaks the ordering.
  • Allow restriction and suggestion lists to define key value pairs. This could be rich in the Format2 source and something like : key1:A description of key2, key2:A description of key2 in the client form.

@mvdbeek
Copy link
Member

mvdbeek commented Dec 10, 2019

This looks amazing, and yeah, visit_input_values is the right thing.

- Allow parameter inputs to be optional.
- Allow parameter inputs to have default values (if optional).
- Allow data and collection parameter to be optional.
- Unimplemented but new abstractions should allow having users specify extensions manually if they'd like for data and collection parameters.

The previous workflow Module interface would more or less persist Galaxy's "Tool State" directly into the database. The tool state framework persists state in an un-typed fashion (e.g. the optional value for data inputs would be 'True' instead of True). The Format 2 YAML that we would like to publish to Galaxy and export from Galaxy would much more cleanly represent this as actual boolean values.

The problem gets even worse with parameter inputs where we want the default value to be settable only if optional is True. This requires a conditional in the "Tool Form" language - so the persisted value would be nested structures without typing. For instance - instead of ``{default: 5, parameter_type: "integer", "optional": true}`` that we might want to store into the database or read from a workflow definition file, the framework would require something like ``{"parameter_type": "integer", "parameter_defintion": {"optional": {"optional": "True", {"specify_optional": {"specify_default": "True", "default": "5"}}}}}``.

My preference may have been to have plugins defined in the client that just produce the minimal JSON - but I didn't know how to implement that and Sam's preference was that the client use the tool form framework for everything on the side panel anyway though.

As a result, this commit adds abstractions to separate module tool state (for commiunicating with client and tied to tool form rendering) from abstract export state - loaded from the API when not coming from the tool and stored in database.

To summarize workflow module changes...

- module.get_inputs can build complex tool form parameteras (unchanged)
- module.state is always the tool form state (unchanged)
- module.get_export_state is added to get the abstract state for export and for reasoning about internally
- module.recover_state now assumes recovery from abstract step state until from_tool_form=True is supplied
- API entry points tied to the tool form (build_module) just pass through from_tool_form by default.
- The client now sends from_tool_form into the API whenever it serializes the workflow state for re-saving
@jmchilton
Copy link
Member Author

Having these constructs for building and configuring inputs and having the abstractions to separate that representation from whats gets persisted... combined with having expression tooling framework merged... we could finally build my arbitrary expression module with user described inputs and outputs...

That combined this with the conditionals proposal for CWL (common-workflow-language/cwl-v1.2#4) which is so clean, well thought through, and I think would map relatively cleanly on top of all of these new Galaxy constructs...

It isn't on the roadmap... but it is hard not to keep going 😆...

jraysajulga and others added 4 commits December 10, 2019 14:31
Rebased of dozens of commits together by jraysajulga, mvdbeek, and jjohnson.

Co-Authored-By: Jim Johnson <jj@umn.edu>
Co-Authored-By: Marius van den Beek <m.vandenbeek@gmail.com>
Co-Authored-By: John Chilton <jmchilton@gmail.com>
@jmchilton jmchilton changed the title [WIP] Even Richer Workflow Inputs Even Richer Workflow Inputs Dec 11, 2019
@galaxybot galaxybot added this to the 20.01 milestone Dec 11, 2019
collection_type = parameter_def["collection_type"]
optional = parameter_def["optional"]
collection_type_source = dict(name="collection_type", label="Collection type", type="text", value=collection_type)
collection_type_source["options"] = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! There are still some rough edges (as expected), but we can fix these later IMO.
One thing we should do is disconnect optional parameter output nodes from inputs that are required, and we should have some more validation prior to launching the workflow.

lib/galaxy/tool_util/parser/factory.py Show resolved Hide resolved
lib/galaxy/workflow/modules.py Outdated Show resolved Hide resolved
@jmchilton
Copy link
Member Author

One thing we should do is disconnect optional parameter output nodes from inputs that are required

Yeah... I forgot about the editor - the declared formats are also not used. I'll give a shot at these two and see what I come up with.

we should have some more validation prior to launching the workflow

Do you have a specific thing that wasn't caught?

@mvdbeek
Copy link
Member

mvdbeek commented Dec 11, 2019

Yeah, making a dataset optional, then not providing it. Same with an optional integer parameter connected to a required value and not provided.

@mvdbeek
Copy link
Member

mvdbeek commented Dec 11, 2019

But I'm happy to merge without this if this looks to be a bigger effort (I could also look into that).

@jmchilton
Copy link
Member Author

First cut at the workflow editor. Handles optionals in data and collections, incorporates the format information. Handles updates to some degree. There are some corner cases I think - I'm not sure if they need to be addressed pre-merge:

  • Optional parameter types aren't validated at the connection level.
  • Switching format types or going from non-optional to optional on inputs could cause existing connections to become invalid. These aren't destroyed automatically.

@@ -370,6 +377,12 @@ var BaseInputTerminal = Terminal.extend({
`Effective output data type(s) [${cat_outputs}] do not appear to match input type(s) [${this.datatypes}].`
);
},
_producesAcceptableDatatypeAndOptionalness(other) {
if (!this.optional && !this.multiple && other.optional) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that will work if a tool author properly handles the iteration, but in my tests I used the cat tool in https://toolshed.g2.bx.psu.edu/repository?repository_id=2593fd36ae8011aa which does

#echo ' '.join(['"%s"' % $file for $file in $inputs])#

which evaluates to "None". I guess the proper way is #echo ' '.join(['"%s"' % $file for $file in $inputs if $file])# or make sure that an empty multi-data parameter behaves like [].
The tool form doesn't let you do this and insists on passing a dataset ... thought I can see this being useful.
So I guess this is fine for now and we should see if it is possible to fix this style of iteration on the Galaxy side ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool form insist on a dataset even if it is marked as optional? I thought I hacked around this years ago for Dan... ugh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the other way round, this code allows connecting optional datasets to required multi-data parameters. I think there is some merit in not allowing that, as the tool form does.

Copy link
Member Author

@jmchilton jmchilton Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - that is 100% intentional on my part. You should be able to connect two optional data parameters to a required multi-parameter - in case one of them is set, right? The tool form lets you do this - run two tools and pick the output that is produced manually and send it to the next tool 😃. This should be caught at runtime IMO so you can do conditionally things

@mvdbeek mvdbeek merged commit 9c46e85 into galaxyproject:dev Dec 17, 2019
@mvdbeek
Copy link
Member

mvdbeek commented Dec 17, 2019

Awesome, this is a big step!

when_restrict_static_suggestions.value = "staticSuggestions"
when_restrict_static_suggestions.inputs = OrderedDict()

# Repeats don't work - so use common separated list for now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do repeats not work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't - I don't know why exactly. I spent a few hours trying to walk through what was happening with the parameter expansion logic and trying a variety of tweaks and was pretty convinced it wasn't a trivial amount of code away from working.

We could do some cool stuff around allowing users to declare expressions with multiple inputs and outputs if we could get that to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants