Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make the RunMELT task a little more robust #529

Merged
merged 3 commits into from
May 23, 2023
Merged

make the RunMELT task a little more robust #529

merged 3 commits into from
May 23, 2023

Conversation

tedsharpe
Copy link
Contributor

I won't say that this is any less obscure, but at least some of the obscurity is documented.

wdl/MELT.wdl Outdated

# the wait command's return code is the return code of the PrintReads command
# so if PrintReads fails, the script will fail
wait %1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slight concern - could you wait on the specific PID for PrintReads instead? I'm not very familiar with shell job management but Cromwell executes these command sections by basically copy+pasting inside a larger script. If Cromwell (future versions eg Cromwell on Azure) decides to add a background process first, could that cause a problem here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think that's a valid concern. I'll try to figure out a workaround.

@cwhelan
Copy link
Member

cwhelan commented May 12, 2023

Just curious -- I thought that you could stream to standard out from PrintReads so long as you suppress index creation, see for example broadinstitute/gatk#5779. What's the reason that won't work here?

wdl/MELT.wdl Outdated
# event that the previous command fails without opening its output file
# the amount of sleeping isn't critical -- it has to be long enough that PrintReads will have
# opened its output file (if it's ever going to do so)
( sleep 60; cat /dev/null > tmp.sam ) &
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if PrintReads fails immediately, I can see how this will work. But if PrintReads runs for >60 seconds, successfully opens the output stream, but fails afterwards, we are still going to miss the error right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwhelan The problem is that PrintReads infers the output type from the filename, defaulting to BAM if it doesn't recognize the extension. That's why I named the fifo "tmp.sam" -- to tell it to write SAM (and no index).
We probably could write BAM to /dev/stdout, suppress the index, lower the compression to none, and pipe through samtools to let awk do its thing. Meh. That might be the most robust solution. I'll consider it.

@mwalker174 If PrintReads runs for more than 60 seconds without opening its output file, that will be trouble. The "harmless" command will run, samtools will try to view an empty file, and it will error out complaining that it can't read the header. However, PrintReads opens its output in onTraversalStart, so it's hard to imagine that arg parsing and opening inputs could take more than 60 seconds. Not impossible, and we'll get an error rather than a hang, but I agree it's somewhat fragile. I'll try to come up with something more robust. Maybe Chris' suggestion is best.

@tedsharpe
Copy link
Contributor Author

Giving up on the fifo. It was clever and goofy at the same time. Adopting Chris' suggestion. A little extra compression and decompression, but much more straightforward.

@mwalker174
Copy link
Collaborator

Thank you @tedsharpe, I think it's a good compromise. Did you test this on Cromwell?

@tedsharpe
Copy link
Contributor Author

tedsharpe commented May 15, 2023 via email

@tedsharpe
Copy link
Contributor Author

This has now been tested. (Sample size = 1.)

Copy link
Collaborator

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tedsharpe LGTM

@mwalker174 mwalker174 merged commit fc7a047 into main May 23, 2023
@mwalker174 mwalker174 deleted the tws_fixRunMELT branch May 23, 2023 16:44
gatk-sv-bot pushed a commit to Genometric/gatk-sv that referenced this pull request Jun 7, 2023
gatk-sv-bot pushed a commit to Genometric/gatk-sv that referenced this pull request Jun 27, 2023
gatk-sv-bot pushed a commit to Genometric/gatk-sv that referenced this pull request Jun 27, 2023
MattWellie added a commit to populationgenomics/gatk-sv that referenced this pull request Jul 26, 2023
* Revert to CPG commit ca70123

* Absorb Broad upstream changes:

* Add handling for Flag vcf fields to vcf_to_pandas (broadinstitute#506)
* add outlier samples list & count outputs to PlotSVCountsPerSample (broadinstitute#510)
* eliminate cram to bam conversion when possible (broadinstitute#468)
* Add ref panel inputs for MakeCohortVcf subworkflows (broadinstitute#517)
* Extend STR workflow to collect additional locus-level metrics (broadinstitute#516)
* Change ref allele to N if unsupported in gatk formatting script (broadinstitute#511)
* Add sample renaming for SD files in GatherBatchEvidence (broadinstitute#519)
* Remove vcf header contig sorting in CleanVcf5 (broadinstitute#518)
* Add support for building dockers for multiple registries. (broadinstitute#507)
* Remove non-public images from the git-sha-based target determination (broadinstitute#525)
* Fix ReadMe for build docker (broadinstitute#528)
* Fix for tiny shard of IntegrateGQ in single sample pipeline (broadinstitute#524)
* Fix issue in Genotype batch script with low PE_log_pval causing zero PE_count (broadinstitute#527)
* Manually install MASS R package in sv-pipeline-virtual-env (broadinstitute#534)
* Fix non-deterministic errors in GetSampleIdsFromVcfTar (broadinstitute#531)
* Add ApplyManualVariantFilter json template (broadinstitute#536)
* make the RunMELT task a little more robust (broadinstitute#529)
* Improve STR workflow (broadinstitute#539)
* Fix workspace format (Generate Terra workspace tsv files from transposed tables) (broadinstitute#523)
* Update VaPoR workflows (broadinstitute#532)
* Update README to link to SV callers used. (broadinstitute#541)

Co-authored-by: Mark Walker <markw@broadinstitute.org>
Co-authored-by: epiercehoffman <epierceh@broadinstitute.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants