Defining an input whose name contains an underscore, such as getting_started, makes the workflow fail with the following error:
MissingInputException in line 363 of /Users/jlhudd/projects/nextstrain/ncov/workflow/snakemake_rules/main_workflow.smk:
Missing input files for rule index_sequences:
results/filtered_getting_started.fasta
Removes deprecated sequence and metadata inputs from the configuration
file and removes Snakemake logic required to support these files. Also,
removes references to this deprecated input format from the example
profiles and the "multiple inputs" tutorial.
Since we no longer support this old input format, we also no longer need
to support empty origin wildcards. We drop support for empty origin
wildcards, remove all references to trimming of origin wildcards
that start with an underscore, and update all rules to place the
underscore in the filename itself rather than in the origin wildcard.
We also now print helpful errors when inputs aren't defined properly,
by checking for configurations with old-style input definitions or
without any inputs defined. These error messages recommend how to
update the workflow configuration to fix the issues.
Fixes #616
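The two configuration checks described in this PR can be sketched in Python. This is a minimal, hypothetical sketch: the function name `check_inputs`, the assumption that each `inputs` entry carries `name`, `sequences`, and `metadata` fields, and the wording of the error messages are illustrative, not the workflow's actual code.

```python
from typing import Any, List, Mapping


def check_inputs(config: Mapping[str, Any]) -> List[str]:
    """Validate input definitions and return the names of all inputs.

    Hypothetical sketch: assumes the config is a plain mapping loaded from
    builds.yaml and that each entry in 'inputs' has a 'name' field.
    """
    # Old-style, top-level sequence and metadata paths are no longer supported.
    old_keys = [key for key in ("sequences", "metadata") if key in config]
    if old_keys:
        raise ValueError(
            "Config keys " + ", ".join(repr(key) for key in old_keys)
            + " are no longer supported. Move these paths into an entry of"
            " the 'inputs' list and give that entry a 'name'."
        )

    # At least one named input is now required, so an origin always exists.
    inputs = config.get("inputs") or []
    if not inputs:
        raise ValueError(
            "No inputs are defined. Add at least one entry with a 'name' to"
            " the 'inputs' list in your builds.yaml."
        )
    return [entry["name"] for entry in inputs]
```

For example, `check_inputs({"inputs": [{"name": "getting_started", "sequences": "s.fasta", "metadata": "m.tsv"}]})` returns `["getting_started"]`, while a config with only top-level `sequences` and `metadata` keys, or with no inputs at all, raises a `ValueError` that explains how to fix it.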
Current Behavior
To define inputs for the workflow, we create a builds.yaml file that contains a list of named inputs.
The name field of each input is used in the Snakemake workflow in the origin wildcard. This wildcard includes a leading underscore (by design), so an input named input1 produces files like results/filtered_input1.fasta, where the wildcard value is _input1.
The constraints on the format of this "origin" wildcard do not allow underscores in the input names. For example, a reasonable input definition named getting_started produces the unintelligible MissingInputException shown above.
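The failure can be sketched in Python. The regular expression below is an assumption for illustration, not the workflow's actual wildcard constraint: it accepts an optional origin made of one leading underscore followed only by letters, digits, and hyphens, so a second underscore inside the input name cannot match.

```python
import re

# ASSUMPTION: a stand-in for the workflow's origin wildcard constraint.
# The group is optional (allowing the old empty origin) and permits one
# leading underscore followed by letters, digits, and hyphens only.
ORIGIN_CONSTRAINT = re.compile(r"(_[a-zA-Z0-9-]+)?")


def origin_wildcard(name: str) -> str:
    """Derive the origin wildcard value from an input's name field."""
    return f"_{name}"


def filtered_path(name: str) -> str:
    """Filename pattern in which the origin is embedded in the filename."""
    return f"results/filtered{origin_wildcard(name)}.fasta"


def origin_matches(origin: str) -> bool:
    """True if the origin value satisfies the (assumed) constraint."""
    return ORIGIN_CONSTRAINT.fullmatch(origin) is not None


if __name__ == "__main__":
    for name in ("input1", "getting_started"):
        origin = origin_wildcard(name)
        print(filtered_path(name), origin_matches(origin))
```

Running it prints `results/filtered_input1.fasta True` and `results/filtered_getting_started.fasta False`. Under this assumed constraint, Snakemake could not bind `_getting_started` to the origin wildcard, so no rule would produce results/filtered_getting_started.fasta, which is consistent with the missing-input error above.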
Expected behavior
Users should be able to define whatever names they like for their input data and have these names be processed by the workflow without any errors.
How to reproduce
Copy and paste the example inputs entry above into my_profiles/getting_started/builds.yaml.
Possible solution
Instead of placing the origin wildcard inside the name of each associated file, use the origin wildcard as a subdirectory in results/ (or perhaps data/, which might make more sense). Additionally, drop support for the deprecated config["sequences"] and config["metadata"] interface and require users to define at least one entry in the config's inputs.
These changes will allow us to know that we always have an "origin" defined (instead of supporting the optional empty origin) and they will also make the wildcard's values more flexible because they can contain any reasonable values that a directory name can have. This approach will also have the benefit of cleanly organizing files into subdirectories by input, allowing users to discover and inspect these files more easily.
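A minimal Python sketch of the proposed layout, assuming a hypothetical helper `input_results_path` that builds per-input paths under results/ and rejects names that cannot serve as a single directory name. The filename filtered.fasta and the validation rules are illustrative assumptions, not part of the proposal.

```python
from pathlib import PurePosixPath


def input_results_path(origin: str, filename: str) -> str:
    """Place an input's files in its own subdirectory: results/<origin>/<filename>.

    Hypothetical helper: the origin is the input's name used verbatim, so
    underscores and other ordinary directory-name characters are allowed.
    """
    # The origin must be a single, non-empty path component.
    if not origin or origin in {".", ".."} or "/" in origin:
        raise ValueError(f"input name {origin!r} is not a valid directory name")
    return str(PurePosixPath("results") / origin / filename)


print(input_results_path("getting_started", "filtered.fasta"))
print(input_results_path("input1", "filtered.fasta"))
```

This prints `results/getting_started/filtered.fasta` and `results/input1/filtered.fasta`. Because the origin occupies a whole path component, it needs neither a leading underscore nor an empty default, which is the flexibility the proposal describes.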
cc: @jameshadfield for comments on this proposed solution. I'm happy to implement this.