STRetch_wgs_pipeline fails at Stage estimate_size #38

eugit · 2018-10-26T13:08:36Z

I just installed STRetch along with the test data in my home directory and confirmed that STRetch_exome_pipeline.groovy works fine. However, when I try running STRetch_wgs_pipeline.groovy on the provided FASTQ files with the following commands:

cd ~/STRetch/test
../tools/bin/bpipe run ../pipelines/STRetch_wgs_pipeline.groovy *.fastq.gz

the pipeline crashes with the error message copied and pasted below.

As far as I understand, exome FASTQ files should be adequate to test the STRetch_wgs_pipeline. So this looks like a bug.

Thank you very much for looking into this!
E

###############################################

/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Processing 5 samples
/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
return (self.a < x) & (x < self.b)
/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
return (self.a < x) & (x < self.b)
/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:1821: RuntimeWarning: invalid value encountered in less_equal
cond2 = cond0 & (x <= self.a)
Traceback (most recent call last):
File "/home/eugit/STRetch/scripts/estimateSTR.py", line 417, in <module>
main()
File "/home/eugit/STRetch/scripts/estimateSTR.py", line 379, in main
Y_pred = regr.predict(locus_totals['total_assigned_log'].values.reshape(-1, 1))
File "/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 213, in predict
return self._decision_function(X)
File "/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 196, in _decision_function
X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
File "/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/utils/validation.py", line 568, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
ERROR: stage estimate_size failed: Command in stage estimate_size failed with exit status = 1 :

PATH=$PATH:/home/eugit/STRetch/tools/bin; /home/eugit/STRetch/tools/bin/python /home/eugit/STRetch/scripts/estimateSTR.py --locus_counts /home/eugit/STRetch/test/11_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/1_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/49_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/54_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/69_L001_R1.STRdecoy.locus_counts --STR_counts 11_L001_R1.STRdecoy.STR_counts 1_L001_R1.STRdecoy.STR_counts 49_L001_R1.STRdecoy.STR_counts 54_L001_R1.STRdecoy.STR_counts 69_L001_R1.STRdecoy.STR_counts --median_cov 11_L001_R1.STRdecoy.median_cov 1_L001_R1.STRdecoy.median_cov 49_L001_R1.STRdecoy.median_cov 54_L001_R1.STRdecoy.median_cov 69_L001_R1.STRdecoy.median_cov --model /home/eugit/STRetch/scripts/STRcov.model.csv

========================================= Pipeline Failed ==========================================

Command in stage estimate_size failed with exit status = 1 :

PATH=$PATH:/home/eugit/STRetch/tools/bin; /home/eugit/STRetch/tools/bin/python /home/eugit/STRetch/scripts/estimateSTR.py --locus_counts /home/eugit/STRetch/test/11_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/1_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/49_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/54_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/69_L001_R1.STRdecoy.locus_counts --STR_counts 11_L001_R1.STRdecoy.STR_counts 1_L001_R1.STRdecoy.STR_counts 49_L001_R1.STRdecoy.STR_counts 54_L001_R1.STRdecoy.STR_counts 69_L001_R1.STRdecoy.STR_counts --median_cov 11_L001_R1.STRdecoy.median_cov 1_L001_R1.STRdecoy.median_cov 49_L001_R1.STRdecoy.median_cov 54_L001_R1.STRdecoy.median_cov 69_L001_R1.STRdecoy.median_cov --model /home/eugit/STRetch/scripts/STRcov.model.csv

Use 'bpipe errors' to see output from failed commands.


hdashnow · 2018-10-27T02:42:54Z

Hi @eugit,

The test FASTQ files provided are tiny, containing reads for only one STR. The intent is just to test that everything is installed and runs correctly. These files will only work with the specific test command provided for the exome pipeline: running the exome pipeline with the provided BED file restricts the analysis to that specific region, and everything works fine. You've given those tiny FASTQ files to the WGS pipeline. When it calculates median coverage, it gets a value of 0 because there are so few reads over the entire genome. This then triggers null values in the estimate_size stage.
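
To illustrate the mechanism described above, here is a minimal sketch of how a median coverage of 0 can turn into non-finite values. It assumes read counts are divided by median coverage and then log-transformed; the function name `log_normalised_counts` and the `scale` parameter are hypothetical and not taken from estimateSTR.py.

```python
import numpy as np

def log_normalised_counts(counts, median_cov, scale=100.0):
    """Hypothetical normalisation: counts per `scale`x coverage, then log2(x + 1).

    This is an assumed formula for illustration only, not the one used in
    estimateSTR.py.
    """
    counts = np.asarray(counts, dtype=float)
    # Silence NumPy's divide-by-zero / invalid-value warnings for this demo.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log2(counts / median_cov * scale + 1)

# Typical WGS-like coverage: every value is finite.
ok = log_normalised_counts([0, 12, 40], median_cov=30.0)

# Median coverage of 0: 0/0 gives NaN and 12/0 gives inf, so nothing is finite.
bad = log_normalised_counts([0, 12, 40], median_cov=0.0)

print(np.isfinite(ok).all())   # True
print(np.isfinite(bad).any())  # False
```

An array like `bad` is the kind of input that scikit-learn's input validation rejects with "Input contains NaN, infinity or a value too large", which matches the traceback in the report.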

If you'd like to run the WGS pipeline, you'll have to provide WGS data. I haven't provided a test case for this because it would be quite a large download and take a while to run. I can think about a way to do this if you think it's necessary.

I've added a more useful error message to the estimate_size stage that detects low coverage and warns the user.
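
A coverage check of this kind could look roughly like the sketch below. This is not the code from the actual fix; the function names, the `min_cov` threshold, and the message wording are all hypothetical.

```python
def low_coverage_samples(median_cov_by_sample, min_cov=1.0):
    """Return the names of samples whose median coverage is below min_cov.

    `not (cov >= min_cov)` also catches NaN, which fails every comparison.
    """
    return [name for name, cov in median_cov_by_sample.items()
            if not (cov >= min_cov)]

def require_coverage(median_cov_by_sample, min_cov=1.0):
    """Fail early with a readable message instead of a NaN error downstream."""
    low = low_coverage_samples(median_cov_by_sample, min_cov)
    if low:
        raise ValueError(
            "Median coverage is below {} for sample(s): {}. "
            "The input may not be whole-genome data.".format(
                min_cov, ", ".join(low)))

# Hypothetical sample names and coverage values.
coverage = {"sample_a": 0.0, "sample_b": 32.5}
try:
    require_coverage(coverage)
except ValueError as err:
    print(err)
```

Checking coverage before size estimation reports the likely cause (input that is not genome-wide) rather than a generic validation error from the regression step.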

Thanks for reporting this. I'm sure this fix will help others.

Warm regards,
Harriet

hdashnow closed this as completed in ec2fd9b Oct 27, 2018
