Enable specifying output directory on the CLI #469

Closed
julesjacobsen opened this issue Jan 27, 2023 · 2 comments

Comments

@julesjacobsen
Contributor

Background

The current v13.1.0 CLI output options are:

--output <string>          Path to outputOptions file. This should be
                           in JSON or YAML format.
--output-prefix <string>   Path/filename without an extension to be
                           prepended to the output file format
                           options.

The output-options.yml file contains a couple of options:

  outputPrefix: results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
  #out-format options: HTML, JSON, TSV_GENE, TSV_VARIANT, VCF (default: HTML, JSON)
  outputFormats: [HTML, JSON, TSV_GENE, TSV_VARIANT, VCF]

outputPrefix is the fully specified path and filename prefix for the output files. Each of the outputFormats values adds its own format-specific extension to this prefix to create the output files.

So, given the command:

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --preset exome --output pfeiffer-output-options.yml

These files are produced:

results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.genes.tsv
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.html
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.json
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.variants.tsv
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.vcf.gz
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.vcf.gz.tbi
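
The way the prefix and formats combine can be sketched as follows. This is a minimal illustration, not Exomiser's actual code: the class OutputFileNames, its nested enum and the fileNames method are hypothetical names, and the extensions are inferred from the file listing above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Exomiser's ResultsWriterUtils implementation.
final class OutputFileNames {

    // Format names are taken from the output-options file; the extensions
    // are inferred from the files listed above.
    enum OutputFormat {
        HTML("html"),
        JSON("json"),
        TSV_GENE("genes.tsv"),
        TSV_VARIANT("variants.tsv"),
        VCF("vcf.gz");

        final String extension;

        OutputFormat(String extension) {
            this.extension = extension;
        }
    }

    // Appends each format's extension to the prefix, in the order given.
    static List<String> fileNames(String outputPrefix, List<OutputFormat> formats) {
        List<String> names = new ArrayList<>();
        for (OutputFormat format : formats) {
            String name = outputPrefix + "." + format.extension;
            names.add(name);
            if (format == OutputFormat.VCF) {
                // The compressed VCF is accompanied by a .tbi index file.
                names.add(name + ".tbi");
            }
        }
        return names;
    }
}
```

Calling fileNames with the prefix results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG and all five formats yields the same six names as the listing, in the order the formats are supplied.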

This is great if you want to specify a particular output filename to be used. But what if you wanted to have the default output file name (the input VCF filename with '-exomiser' appended) with a non-standard output directory? e.g.

/analysis/analysis-12345/Pfeiffer-exomiser.genes.tsv
/analysis/analysis-12345/Pfeiffer-exomiser.html
/analysis/analysis-12345/Pfeiffer-exomiser.json
/analysis/analysis-12345/Pfeiffer-exomiser.variants.tsv
/analysis/analysis-12345/Pfeiffer-exomiser.vcf.gz
/analysis/analysis-12345/Pfeiffer-exomiser.vcf.gz.tbi

In this case you need to specify the full path and filename in the output-options.yaml file, which is irritating. Allowing users to specify the output directory would be helpful, especially for large batches of analyses.
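
For reference, the default file name described above (the input VCF filename with '-exomiser' appended) could be derived roughly like this. It is a sketch under assumptions: DefaultOutputName and defaultFileName are hypothetical names, and stripping a trailing .vcf.gz or .vcf is assumed from the Pfeiffer example rather than taken from the Exomiser source.

```java
import java.nio.file.Path;

// Hypothetical sketch of the default output file name rule.
final class DefaultOutputName {

    // Assumes vcfPath points at a file, so getFileName() is not null.
    static String defaultFileName(Path vcfPath) {
        String name = vcfPath.getFileName().toString();
        // Assumption: the VCF extension is dropped before the suffix is added.
        if (name.endsWith(".vcf.gz")) {
            name = name.substring(0, name.length() - ".vcf.gz".length());
        } else if (name.endsWith(".vcf")) {
            name = name.substring(0, name.length() - ".vcf".length());
        }
        return name + "-exomiser";
    }
}
```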

User story

As a CLI user, I wish to use the default Exomiser output file name (the input VCF filename with -exomiser appended), but I want to be able to specify a custom output directory directly via the CLI, without having to create an output-options.yaml file for each sample.

Option 1 - new CLI option

--output <string>          Path to outputOptions file. This should be
                           in JSON or YAML format.
--output-prefix <string>   Path/filename without an extension to be
                           prepended to the output file format
                           options.
--output-directory <path>  Path to the desired output directory
                           where exomiser will write the output
                           files. Using this without the
                           output-file-name option will result in
                           a default filename being used which
                           will be output to the specified
                           directory.
--output-file-name <string>
                           Filename prefix to be used for the
                           output files. Can be combined with the
                           output-directory option to specify a
                           custom location and filename. Used
                           alone will result in files with the
                           specified filename being written to the
                           default results directory.

So, given the command:

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-directory ~/exomiser-results

These files are produced:

~/exomiser-results/Pfeiffer-exomiser.html
~/exomiser-results/Pfeiffer-exomiser.json

This seems clean and simple, and it allows a companion --output-file-name option to be added so that neither, either or both of the new options can be used. However, the existing --output-prefix option would need to be mutually exclusive with the new --output-directory and --output-file-name options.

 --output-prefix || (--output-directory & --output-file-name)

So for a VCF input file Pfeiffer.vcf.gz:

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml
 -> results/Pfeiffer-exomiser.html, results/Pfeiffer-exomiser.json
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-directory ~/exomiser-results
 -> ~/exomiser-results/Pfeiffer-exomiser.html, ~/exomiser-results/Pfeiffer-exomiser.json
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-directory ~/exomiser-results --output-file-name Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
 -> ~/exomiser-results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.html, ~/exomiser-results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.json
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-file-name Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
 -> results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.html, results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.json
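
The four combinations above boil down to "use the given value, else fall back to the default". A minimal sketch of that resolution follows, assuming a hypothetical OutputPathResolver class; the results default directory is taken from the examples, and null stands in for an option that was not supplied.

```java
import java.nio.file.Path;

// Hypothetical sketch of how the two proposed options could be combined.
final class OutputPathResolver {

    // Default directory as shown in the examples above.
    static final Path DEFAULT_DIRECTORY = Path.of("results");

    // outputDirectory and outputFileName may be null when the option is absent.
    // defaultFileName is the VCF-derived name, e.g. "Pfeiffer-exomiser".
    static Path resolvePrefix(Path outputDirectory, String outputFileName, String defaultFileName) {
        Path directory = outputDirectory != null ? outputDirectory : DEFAULT_DIRECTORY;
        String fileName = outputFileName != null ? outputFileName : defaultFileName;
        // The format extensions (.html, .json, ...) are appended to this prefix later.
        return directory.resolve(fileName);
    }
}
```

Note that ~ is expanded by the shell before the JVM sees the argument, so the sketch does no home-directory expansion of its own.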

Illegal options:
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results --output-directory ~/exomiser-results
 -> IllegalArgumentException
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results --output-file-name Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
 -> IllegalArgumentException
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results --output-directory ~/exomiser-results --output-file-name Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
 -> IllegalArgumentException
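
The exclusivity rule behind these illegal combinations could be checked along these lines. This is only a sketch: OutputOptionsValidator and its validate method are hypothetical, and null is assumed to mean "option not given".

```java
// Hypothetical sketch of the mutual-exclusion check for the output options.
final class OutputOptionsValidator {

    // Each argument is null when the corresponding CLI option is absent.
    static void validate(String outputPrefix, String outputDirectory, String outputFileName) {
        boolean hasPrefix = outputPrefix != null;
        boolean hasSplitOption = outputDirectory != null || outputFileName != null;
        if (hasPrefix && hasSplitOption) {
            throw new IllegalArgumentException(
                    "--output-prefix cannot be combined with --output-directory or --output-file-name");
        }
    }
}
```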

Implementation-wise, this will be a bit of a pain under the hood, as it involves adding fields to the OutputOptions class and changing ResultsWriterUtils and most likely the ResultsWriter implementations to cater for them.

Option 2 - Use existing CLI --output-prefix option

Given there is already an --output-prefix option, this could be trivially changed so that Exomiser parses the value (a String) as either a file path (current behaviour) or as a directory (new behaviour).

e.g.

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results

produces

~/exomiser-results.html
~/exomiser-results.json

but appending the system file separator to the --output-prefix argument indicates this is to be interpreted as a directory:

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results/

produces

~/exomiser-results/Pfeiffer-exomiser.html
~/exomiser-results/Pfeiffer-exomiser.json

So the value of --output-prefix is now a directory path and the file names are generated from the input VCF file name as before. Implementation is a simple change to the ResultsWriterUtils class to better-specify the behaviour of the way the output-prefix argument is interpreted.
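
A rough sketch of that interpretation rule, assuming a hypothetical OutputPrefixInterpreter class rather than the real ResultsWriterUtils code. Accepting "/" in addition to the system separator is an extra assumption made here so that the same argument works on Windows.

```java
import java.io.File;

// Hypothetical sketch of Option 2: a trailing separator marks a directory.
final class OutputPrefixInterpreter {

    // defaultFileName is the VCF-derived name, e.g. "Pfeiffer-exomiser".
    static String resolvePrefix(String outputPrefix, String defaultFileName) {
        // Assumption: "/" is accepted alongside the platform separator.
        boolean isDirectory = outputPrefix.endsWith(File.separator) || outputPrefix.endsWith("/");
        // Directory: add the default file name. Otherwise keep the current behaviour.
        return isDirectory ? outputPrefix + defaultFileName : outputPrefix;
    }
}
```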

Pros and Cons

Option 1 is more explicit and probably (?) less likely to cause confusion, but it will require API changes and an additional set of CLI options. Option 2 is simpler to implement and requires no API or CLI changes, at the expense of some possible confusion about the meaning of output-prefix, which would do double duty.

@yaseminbridges, have you got any preference?

@yaseminbridges

No preference from me. I feel both options are clear enough in how to use the feature, especially if it is documented as it is here. Whatever you feel is a good fit I am happy with; I would be able to work with both approaches!

julesjacobsen added a commit that referenced this issue Feb 3, 2023
Add changes for issue #469 allowing --output-directory and --output-file-name CLI options
@julesjacobsen
Contributor Author

Went with the split option of outputDirectory and outputFileName
