Added wrapper for verifybamid2. #401

Merged
merged 5 commits into from
Jan 17, 2022

@brcopeland (Contributor) commented Sep 9, 2021

Description

This provides a wrapper for verifybamid2, a tool that estimates intra-species contamination in a BAM file.

QC

For all wrappers added by this PR, I made sure that

  • there is a test case which covers any introduced changes,
  • input: and output: file paths in the resulting rule can be changed arbitrarily,
  • rule names in the test case are in snake_case and somehow tell what the rule is about or match the tool's purpose or name (e.g., map_reads for a step that maps reads),
  • all environment.yaml specifications follow the respective best practices,
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:),
  • all fields of the example rules in the Snakefiles and their entries are explained via comments (input:/output:/params: etc.),
  • stderr and/or stdout are logged correctly (log:), depending on the wrapped tool,
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to (see here; this also means that using any Python tempfile default behavior works),
  • the meta.yaml contains a link to the documentation of the respective tool or command,
  • Snakefiles pass the linting (snakemake --lint),
  • Snakefiles are formatted with snakefmt,
  • Python wrapper scripts are formatted with black.
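The temporary-file item in the checklist above can be met by relying on Python's default `tempfile` behavior. Below is a minimal sketch, assuming a wrapper that needs a scratch directory for intermediate output. `run_with_scratch_dir` and `demo` are hypothetical names used only for illustration and are not part of the verifybamid2 wrapper.

```python
import os
import tempfile


def run_with_scratch_dir(work):
    """Run work(scratch_dir) inside a fresh, uniquely named temporary directory.

    tempfile.TemporaryDirectory() creates the directory under
    tempfile.gettempdir() by default and removes it, together with its
    contents, when the with-block exits, even if work() raises.
    """
    with tempfile.TemporaryDirectory() as scratch_dir:
        return work(scratch_dir)


def demo(scratch_dir):
    # Hypothetical intermediate file written by a wrapped tool.
    path = os.path.join(scratch_dir, "intermediate.txt")
    with open(path, "w") as fh:
        fh.write("placeholder")
    return scratch_dir


used_dir = run_with_scratch_dir(demo)
print(os.path.exists(used_dir))  # False: the directory was cleaned up
```

Because the directory name is generated by `tempfile`, parallel jobs never collide, and nothing is left behind in the working directory.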

# optional - this can be used to specify custom resource files if
# necessary (if using GRCh37 or GRCh38, instead simply specify
# genome_build="38", for example)
svd_prefix="ref.vcf",
Contributor

This should be an optional input file then, not a parameter.

Contributor Author

The svd_prefix passed to verifybamid2 is not itself an existing file, which is why I didn't make it an input file. If you would prefer, I could instead take one of the files that does exist as input, strip off its suffix, and pass the resulting prefix to verifybamid2.

Contributor

Mhm, would it be possible to specify all needed files under that prefix explicitly? E.g. using multiext?
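Snakemake's `multiext(prefix, *extensions)` helper expands one prefix into a list of file names, one per extension. The plain-Python sketch below mimics that expansion to show which paths would be declared as inputs. It does not import Snakemake, `expand_prefix` is a hypothetical stand-in, and the suffix list is taken from this thread.

```python
# Suffixes that, according to this thread, verifybamid2 expects under one SVD prefix.
SVD_SUFFIXES = (".UD", ".mu", ".V", ".bed")


def expand_prefix(prefix, *extensions):
    """Return prefix + ext for each extension, in order (mimics multiext)."""
    return [prefix + ext for ext in extensions]


# In a Snakefile this would correspond roughly to:
#   svd=multiext("ref.vcf", ".UD", ".mu", ".V", ".bed")
paths = expand_prefix("ref.vcf", *SVD_SUFFIXES)
print(paths)  # ['ref.vcf.UD', 'ref.vcf.mu', 'ref.vcf.V', 'ref.vcf.bed']
```

Declaring every file explicitly this way lets Snakemake track all four resource files as dependencies, not just the prefix.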

Contributor Author

I changed params.svd_prefix to input.svd_mu. This file name is the SVD prefix with the suffix .mu appended. verifybamid2 also requires files with the suffixes .bed, .UD, and .V under the same prefix, so the wrapper checks that they exist.

Note that, among the specified inputs, the wrapper only looks for input.svd_mu and not for the other three files. I thought that approach would be simpler to work with.
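The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the merged wrapper's code: `svd_prefix_from_mu` is a hypothetical name. The returned prefix is what would presumably be handed to verifybamid2's `--SVDPrefix` option; confirm that against the verifybamid2 documentation.

```python
import os

# Sibling files that must exist next to the .mu file (per this thread).
REQUIRED_SUFFIXES = (".bed", ".UD", ".V")


def svd_prefix_from_mu(svd_mu):
    """Strip the .mu suffix from input.svd_mu and verify the sibling SVD files exist."""
    if not svd_mu.endswith(".mu"):
        raise ValueError(f"expected a path ending in '.mu', got {svd_mu!r}")
    prefix = svd_mu[: -len(".mu")]
    missing = [
        prefix + suffix
        for suffix in REQUIRED_SUFFIXES
        if not os.path.exists(prefix + suffix)
    ]
    if missing:
        raise FileNotFoundError(
            "missing SVD resource file(s): " + ", ".join(missing)
        )
    return prefix
```

Failing early with a list of missing files gives a clearer error than letting verifybamid2 abort partway through a run.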

@johanneskoester (Contributor)

Thanks a lot, nice work! Just one comment above.

@brcopeland force-pushed the master branch 3 times, most recently from e0215fd to 102751a on November 9, 2021 19:05
@johanneskoester merged commit 1117186 into snakemake:master on Jan 17, 2022

2 participants