STRetch_wgs_pipeline fails at Stage estimate_size #38

Closed
eugit opened this issue Oct 26, 2018 · 1 comment

Comments

@eugit

eugit commented Oct 26, 2018

I just installed STRetch along with the test data in my home directory and confirmed that the STRetch_exome_pipeline.groovy pipeline works fine. However, when I run the STRetch_wgs_pipeline.groovy pipeline on the provided FASTQ files with the following commands:

cd ~/STRetch/test
../tools/bin/bpipe run ../pipelines/STRetch_wgs_pipeline.groovy *.fastq.gz

the pipeline crashes with the error message copied below.

As far as I understand, exome FASTQ files should be adequate to test the STRetch_wgs_pipeline, so this looks like a bug.

Thank you very much for looking into this!
E

###############################################

/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Processing 5 samples
/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:1821: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
Traceback (most recent call last):
  File "/home/eugit/STRetch/scripts/estimateSTR.py", line 417, in <module>
    main()
  File "/home/eugit/STRetch/scripts/estimateSTR.py", line 379, in main
    Y_pred = regr.predict(locus_totals['total_assigned_log'].values.reshape(-1, 1))
  File "/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 213, in predict
    return self._decision_function(X)
  File "/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 196, in _decision_function
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
  File "/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/utils/validation.py", line 568, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/home/eugit/STRetch/tools/miniconda/envs/STR/lib/python3.6/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
ERROR: stage estimate_size failed: Command in stage estimate_size failed with exit status = 1 :

PATH=$PATH:/home/eugit/STRetch/tools/bin; /home/eugit/STRetch/tools/bin/python /home/eugit/STRetch/scripts/estimateSTR.py --locus_counts /home/eugit/STRetch/test/11_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/1_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/49_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/54_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/69_L001_R1.STRdecoy.locus_counts --STR_counts 11_L001_R1.STRdecoy.STR_counts 1_L001_R1.STRdecoy.STR_counts 49_L001_R1.STRdecoy.STR_counts 54_L001_R1.STRdecoy.STR_counts 69_L001_R1.STRdecoy.STR_counts --median_cov 11_L001_R1.STRdecoy.median_cov 1_L001_R1.STRdecoy.median_cov 49_L001_R1.STRdecoy.median_cov 54_L001_R1.STRdecoy.median_cov 69_L001_R1.STRdecoy.median_cov --model /home/eugit/STRetch/scripts/STRcov.model.csv

========================================= Pipeline Failed ==========================================

Command in stage estimate_size failed with exit status = 1 :

PATH=$PATH:/home/eugit/STRetch/tools/bin; /home/eugit/STRetch/tools/bin/python /home/eugit/STRetch/scripts/estimateSTR.py --locus_counts /home/eugit/STRetch/test/11_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/1_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/49_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/54_L001_R1.STRdecoy.locus_counts /home/eugit/STRetch/test/69_L001_R1.STRdecoy.locus_counts --STR_counts 11_L001_R1.STRdecoy.STR_counts 1_L001_R1.STRdecoy.STR_counts 49_L001_R1.STRdecoy.STR_counts 54_L001_R1.STRdecoy.STR_counts 69_L001_R1.STRdecoy.STR_counts --median_cov 11_L001_R1.STRdecoy.median_cov 1_L001_R1.STRdecoy.median_cov 49_L001_R1.STRdecoy.median_cov 54_L001_R1.STRdecoy.median_cov 69_L001_R1.STRdecoy.median_cov --model /home/eugit/STRetch/scripts/STRcov.model.csv

Use 'bpipe errors' to see output from failed commands.

@hdashnow
Collaborator

Hi @eugit,

The test FASTQ files provided are tiny, containing reads for only one STR. They are only meant to test that everything is installed and runs correctly, and they will only work with the specific test command provided for the exome pipeline. Running the exome pipeline with the provided bed file restricts the analysis to that specific region, so everything works fine. You've given those tiny FASTQ files to the WGS pipeline instead. When it calculates median coverage, it gets a value of 0 because there are so few reads over the entire genome. This then produces NaN/infinite values in the estimate_size stage, which is what the final ValueError in your traceback is complaining about.
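To illustrate how a median coverage of 0 can turn into the "NaN, infinity" values in the traceback, here is a minimal sketch. It assumes, hypothetically, that per-locus read counts are normalised by median coverage and then log-transformed; `normalised_log_counts` is an illustrative helper, not a function from estimateSTR.py, and the real computation there may differ.

```python
import numpy as np


def normalised_log_counts(locus_counts, median_cov):
    """Hypothetical normalisation: log2 of read counts per unit of median coverage.

    Illustrative assumption only, not the exact formula used by estimateSTR.py.
    """
    counts = np.asarray(locus_counts, dtype=float)
    # Silence numpy's divide-by-zero / invalid-value warnings so the
    # resulting inf/NaN values can be inspected directly.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log2(counts / median_cov)


# Tiny test data: almost no reads genome-wide, so median coverage is 0.
values = normalised_log_counts([0, 3, 12], median_cov=0.0)
print(values)                     # 0/0 gives NaN, n/0 gives inf
print(np.isfinite(values).all())  # False: a regression model will reject this input
```

With any realistic (non-zero) median coverage the same function returns ordinary finite values, which is why the problem only shows up when the WGS pipeline sees almost no data.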

If you'd like to run the WGS pipeline, you'll have to provide WGS data. I haven't provided a test case for this because it would be quite a large download and would take a while to run. I can think about a way to do this if you think it's necessary?

I've added a more useful error message to the estimate_size stage that detects low coverage and warns the user.
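A check of this kind could look roughly like the sketch below. The function name `check_median_coverage` and the `min_cov` threshold are hypothetical illustrations, not the code actually added to estimateSTR.py; the idea is simply to fail early with a readable message instead of letting NaN/infinite values reach the regression step.

```python
def check_median_coverage(median_covs, min_cov=1.0):
    """Hypothetical guard: reject samples whose median coverage is too low.

    median_covs maps sample name -> median coverage. The min_cov default is an
    illustrative assumption, not a value taken from STRetch.
    Returns the input unchanged if every sample passes; otherwise raises
    ValueError naming the offending samples.
    """
    # "not cov >= min_cov" also catches NaN, since NaN comparisons are False.
    low = {sample: cov for sample, cov in median_covs.items() if not cov >= min_cov}
    if low:
        details = ", ".join(
            f"{sample} (median coverage {cov})" for sample, cov in sorted(low.items())
        )
        raise ValueError(
            "Median coverage is too low to estimate STR sizes for: " + details
            + ". Check that the input is whole-genome data; the small test FASTQ"
            " files only work with the exome pipeline and its bed file."
        )
    return median_covs


# Example: the tiny test samples from this issue would be rejected up front.
try:
    check_median_coverage({"1_L001_R1": 0.0, "11_L001_R1": 0.0})
except ValueError as err:
    print(f"ERROR: {err}")
```

Raising before the model is applied keeps the failure message about the actual cause (low coverage) rather than about scikit-learn's input validation.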

Thanks for reporting this. I'm sure this fix will help others.

Warm regards,
Harriet
