make the RunMELT task a little more robust #529

tedsharpe · 2023-05-09T18:15:41Z

I won't say that this is any less obscure, but at least some of the obscurity is documented.

mwalker174 · 2023-05-12T15:16:16Z

wdl/MELT.wdl

+
+    # the wait command's return code is the return code of the PrintReads command
+    # so if PrintReads fails, the script will fail
+    wait %1


This is a slight concern - could you wait on the specific PID for PrintReads instead? I'm not very familiar with shell job management but Cromwell executes these command sections by basically copy+pasting inside a larger script. If Cromwell (future versions eg Cromwell on Azure) decides to add a background process first, could that cause a problem here?

Yeah. I think that's a valid concern. I'll try to figure out a workaround.

cwhelan · 2023-05-12T15:24:06Z

Just curious -- I thought that you could stream to standard out from PrintReads so long as you suppress index creation, see for example broadinstitute/gatk#5779. What's the reason that won't work here?

mwalker174 · 2023-05-12T15:25:53Z

wdl/MELT.wdl

+    #   event that the previous command fails without opening its output file
+    # the amount of sleeping isn't critical -- it has to be long enough that PrintReads will have
+    #   opened its output file (if it's ever going to do so)
+    ( sleep 60; cat /dev/null > tmp.sam ) &


So if PrintReads fails immediately, I can see how this will work. But if PrintReads runs for >60 seconds, successfully opens the output stream, but fails afterwards, we are still going to miss the error right?

@cwhelan The problem is that PrintReads infers the output type from the filename, defaulting to BAM if it doesn't recognize the extension. That's why I named the fifo "tmp.sam" -- to tell it to write SAM (and no index).
We probably could write BAM to /dev/stdout, suppress the index, lower the compression to none, and pipe through samtools to let awk do its thing. Meh. That might be the most robust solution. I'll consider it.

@mwalker174 If PrintReads runs for more than 60 seconds without opening its output file, that will be trouble. The "harmless" command will run, samtools will try to view an empty file, and it will error out complaining that it can't read the header. However, PrintReads opens its output in onTraversalStart, so it's hard to imagine that arg parsing and opening inputs could take more than 60 seconds. Not impossible, and we'll get an error rather than a hang, but I agree it's somewhat fragile. I'll try to come up with something more robust. Maybe Chris' suggestion is best.

tedsharpe · 2023-05-12T18:04:50Z

Giving up on the fifo. It was clever and goofy at the same time. Adopting Chris' suggestion. A little extra compression and decompression, but much more straightforward.

mwalker174 · 2023-05-15T16:28:33Z

Thank you @tedsharpe, I think it's a good compromise. Did you test this on Cromwell?

tedsharpe · 2023-05-15T17:07:09Z

No. Not tested.I’ll give it a whirl.On May 15, 2023, at 12:28 PM, Mark Walker ***@***.***> wrote: Thank you @tedsharpe, I think it's a good compromise. Did you test this on Cromwell? —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>

tedsharpe · 2023-05-18T16:10:54Z

This has now been tested. (Sample size = 1.)

mwalker174

Thanks @tedsharpe LGTM

* Revert to CPG commit ca70123 * Absorb Broad upstream changes: * Add handling for Flag vcf fields to vcf_to_pandas (broadinstitute#506) * add outlier samples list & count outputs to PlotSVCountsPerSample (broadinstitute#510) * eliminate cram to bam conversion when possible (broadinstitute#468) * Add ref panel inputs for MakeCohortVcf subworkflows (broadinstitute#517) * Extend STR workflow to collect additional locus-level metrics (broadinstitute#516) * Change ref allele to N if unsupported in gatk formatting script (broadinstitute#511) * Add sample renaming for SD files in GatherBatchEvidence (broadinstitute#519) * Remove vcf header contig sorting in CleanVcf5 (broadinstitute#518) * Add support for building dockers for multiple registries. (broadinstitute#507) * Remove non-public images from the git-sha-based target determination (broadinstitute#525) * Fix ReadMe for build docker (broadinstitute#528) * Fix for tiny shard of IntegrateGQ in single sample pipeline (broadinstitute#524) * Fix issue in Genotype batch script with low PE_log_pval causing zero PE_count (broadinstitute#527) * Manually install MASS R package in sv-pipeline-virtual-env (broadinstitute#534) * Fix non-deterministic errors in GetSampleIdsFromVcfTar (broadinstitute#531) * Add ApplyManualVariantFilter json template (broadinstitute#536) * make the RunMELT task a little more robust (broadinstitute#529) * Improve STR workflow (broadinstitute#539) * Fix workspace format (Generate Terra workspace tsv files from transposed tables) (broadinstitute#523) * Update VaPoR workflows (broadinstitute#532) * Update README to link to SV callers used. (broadinstitute#541) Co-authored-by: Mark Walker <markw@broadinstitute.org> Co-authored-by: epiercehoffman <epierceh@broadinstitute.org>

make the RunMELT task a little more robust

412b1a0

tedsharpe assigned mwalker174 and epiercehoffman May 9, 2023

mwalker174 reviewed May 12, 2023

View reviewed changes

give up on the fifo

0e222e4

oops

be22c02

mwalker174 approved these changes May 23, 2023

View reviewed changes

mwalker174 merged commit fc7a047 into main May 23, 2023

mwalker174 deleted the tws_fixRunMELT branch May 23, 2023 16:44

gatk-sv-bot pushed a commit to Genometric/gatk-sv that referenced this pull request Jun 7, 2023

make the RunMELT task a little more robust (broadinstitute#529)

8c7f0df

gatk-sv-bot pushed a commit to Genometric/gatk-sv that referenced this pull request Jun 27, 2023

make the RunMELT task a little more robust (broadinstitute#529)

b564a0c

gatk-sv-bot pushed a commit to Genometric/gatk-sv that referenced this pull request Jun 27, 2023

make the RunMELT task a little more robust (broadinstitute#529)

dfe20c8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make the RunMELT task a little more robust #529

make the RunMELT task a little more robust #529

tedsharpe commented May 9, 2023

mwalker174 May 12, 2023

tedsharpe May 12, 2023

cwhelan commented May 12, 2023

mwalker174 May 12, 2023

tedsharpe May 12, 2023

tedsharpe commented May 12, 2023

mwalker174 commented May 15, 2023

tedsharpe commented May 15, 2023 via email

tedsharpe commented May 18, 2023

mwalker174 left a comment

make the RunMELT task a little more robust #529

make the RunMELT task a little more robust #529

Conversation

tedsharpe commented May 9, 2023

mwalker174 May 12, 2023

Choose a reason for hiding this comment

tedsharpe May 12, 2023

Choose a reason for hiding this comment

cwhelan commented May 12, 2023

mwalker174 May 12, 2023

Choose a reason for hiding this comment

tedsharpe May 12, 2023

Choose a reason for hiding this comment

tedsharpe commented May 12, 2023

mwalker174 commented May 15, 2023

tedsharpe commented May 15, 2023 via email

tedsharpe commented May 18, 2023

mwalker174 left a comment

Choose a reason for hiding this comment