smaller downloadDBs #66

Closed
linsalrob opened this issue Aug 6, 2020 · 13 comments

Comments

@linsalrob
Collaborator

linsalrob commented Aug 6, 2020

I premade the databases for conda download.

Please check your diamond version with diamond --version, then read the diamond documentation to find out which database version to download. You can also check the version of a database you already have installed with diamond dbinfo.

| Cluster size | diamond version 1 databases | diamond version 2 databases | diamond version 3 databases |
| --- | --- | --- | --- |
| 90 | 90 v1 | 90 v2 | 90 v3 |
| 95 | 95 v1 | 95 v2 | 95 v3 |
| 98 | 98 v1 | 98 v2 | 98 v3 |
| 100 | 100 v1 | 100 v2 | 100 v3 |

After downloading, you need to copy these to lib/python3.8/site-packages/superfocus_app/db/static/diamond, under the same installation prefix as the superfocus executable:

e.g. for 90_clusters:

mkdir -p  $(which superfocus | sed -e 's#bin/superfocus$#lib/python3.8/site-packages/superfocus_app/db/static/diamond#') &&
unzip -d  $(which superfocus | sed -e 's#bin/superfocus$#lib/python3.8/site-packages/superfocus_app/db/static/diamond#') 90_clusters.db.dmnd.zip

These are smaller downloads than the raw files (db.zip is 3.3 GB).
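
As a rough sketch of the same idea, the target directory can be derived from the path of the superfocus executable with the Python version left as a parameter, since the lib/python3.8 part of the path depends on the interpreter the environment uses. The function name superfocus_db_dir and its default of 3.8 are assumptions made for illustration; they are not part of SUPER-FOCUS.

```shell
# Hypothetical helper (not part of SUPER-FOCUS): build the diamond database
# directory from the path of the superfocus executable.
#   $1: path to the executable, e.g. the output of `which superfocus`
#   $2: Python major.minor version of the environment (assumed default: 3.8)
superfocus_db_dir() {
    bin_path=$1
    py_version=${2:-3.8}

    # Refuse paths that do not end in bin/superfocus, because the prefix
    # could not be derived reliably from them.
    case "$bin_path" in
        */bin/superfocus) ;;
        *) echo "unexpected superfocus path: $bin_path" >&2; return 1 ;;
    esac

    # Strip the trailing /bin/superfocus to get the installation prefix.
    prefix=${bin_path%/bin/superfocus}
    echo "$prefix/lib/python$py_version/site-packages/superfocus_app/db/static/diamond"
}
```

For example, db_dir=$(superfocus_db_dir "$(which superfocus)" 3.7) followed by mkdir -p "$db_dir" && unzip -d "$db_dir" 90_clusters.db.dmnd.zip would unpack the 90_clusters database into a Python 3.7 environment.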

@StefPN

StefPN commented Aug 21, 2020

It is great that you did this, as I am having trouble getting the DBs to work. I downloaded https://edwards.sdsu.edu/SUPERFOCUS/downloads/conda/diamond_v1/98_clusters.db.dmnd.zip; however, I get the following error:

Error: Database was built with a different version of Diamond and is incompatible.
diamond v0.9.24.125 | by Benjamin Buchfink <buchfink@gmail.com>

Did I misunderstand your instructions? Which file should work with diamond v0.9.24.125? FYI, I also tried https://edwards.sdsu.edu/SUPERFOCUS/downloads/conda/diamond_v3/98_clusters.db.dmnd.zip, with the same error.
I would be very glad if you could help me get the DB running. It is so frustrating. I even tried installing SF with pip, with conda, and by cloning the git repository, but I always get an error when it tries to find the DB during the run. BTW, whichever way I installed SF, it turned out as version 0.0.0. Not sure whether this is an issue...

@metageni
Owner

Hi @StefPN, it looks to me like the DIAMOND version you used is different from the one @linsalrob used. You can always download the raw FASTA and format it yourself, as described in the tool's README file.

Let me know how it goes.

@StefPN

StefPN commented Aug 21, 2020 via email

@metageni
Owner

@StefPN Can you please try this?

#65

@StefPN

StefPN commented Aug 21, 2020 via email

@metageni
Owner

@StefPN I'm at work right now. I will get back to you with more details this weekend, ok?

@StefPN

StefPN commented Aug 21, 2020 via email

@linsalrob
Collaborator Author

According to the diamond help page, v0.9.19 to v0.9.24 produce and accept format version 2, which of course is the only version I did not include. I will attempt to install that version and include its databases here.
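
The range quoted above could be captured in a small helper that maps a diamond version string to a database format version. This is only a sketch: the function name db_format_for_diamond is made up for illustration, and it encodes nothing beyond the v0.9.19 to v0.9.24 range mentioned in this comment, so every other release is reported as unknown and should be looked up in the diamond documentation.

```shell
# Hypothetical helper: print the database format version for a diamond
# release. Only the range stated in this thread (v0.9.19 to v0.9.24 use
# format version 2) is encoded; anything else prints "unknown".
db_format_for_diamond() {
    version=${1#v}              # drop a leading "v", as in v0.9.24.125

    # Split the version on dots into the positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $version
    IFS=$old_ifs
    major=${1:-}
    minor=${2:-}
    patch=${3:-}

    # A missing or non-numeric patch level cannot be compared.
    case "$patch" in
        ''|*[!0-9]*) echo unknown; return 0 ;;
    esac

    if [ "$major" = 0 ] && [ "$minor" = 9 ] \
        && [ "$patch" -ge 19 ] && [ "$patch" -le 24 ]; then
        echo 2
    else
        echo unknown
    fi
}
```

Called as db_format_for_diamond v0.9.24.125, this prints 2, which matches the v2 column of the table in the first comment.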

@linsalrob
Collaborator Author

@StefPN I have updated the comment here with revised locations for all the databases. Can you use the database that matches your diamond version and let us know whether it solves your problem?

@StefPN

StefPN commented Aug 24, 2020

Dear Rob,
Thank you very much! I have now downloaded the 98 v2 database. When I start running SF, I no longer get the DB error I got before. Unfortunately, however, I get the following error:

superfocus -q SUPERFOCUS/ -dir SUPERFOCUS/output/ -db DB_98 -a blast
[2020-08-24 08:25:01,838 - INFO] SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data
[2020-08-24 08:25:01,841 - INFO] 1.1) Working on: NC.01_R1.fastq
[2020-08-24 08:25:01,841 - INFO]    Aligning sequences in NC.01_R1.fastq to 98 using blast
BLAST query error: CFastaReader: Near line 1, there's a line that doesn't look like plausible data, but it's not marked as defline or comment.
[2020-08-24 08:25:01,938 - INFO]    Parsing Alignments
Traceback (most recent call last):
  File "/home/stefanie.prast/.local/bin/superfocus", line 11, in <module>
    load_entry_point('superfocus==0.0.0', 'console_scripts', 'superfocus')()
  File "/home/stefanie.prast/.local/lib/python3.7/site-packages/superfocus-0.0.0-py3.7.egg/superfocus_app/superfocus.py", line 342, in main
    del_alignments)
ValueError: not enough values to unpack (expected 2, got 0)
(base) [stefanie.prast@ctmr-nas ep]$ head SUPERFOCUS/NC.01_R1.fastq 
@M01548:130:000000000-BBN6D:1:2106:18580:4509 1:N:0:TCCGGAGA+CCTATCCT
TAAAACCGGGAAATGGACCGATGCCCGTTCTTATCTTACAAACATGGGCATTGATAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGGAGAAGCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATGAAAAAAAAAAAGATGAGAGGCAAAAAACACAAAACATTAAAATAGAAGTGAGACATGTATAGAGAGAAGAGAGAAGAAAAGTATGAGCGGAGTAGAGACGTCAGGTGACGTAAGCTGTAGTACGATAGTTAAATTGAGTTCTAACAAGTAGAGAGAGTACTGTGA
+
B@BCCG7@@CGGFDGGCGGCG@BGE<F:<F@FF96F<@FGFDFGCGGGFGGCFFAA@C<6,CB::CDACFF@6FC7CDFCFAGEFFECCFGEEB:7=CF,4:=:CC8F<AF+8E+5,C5,CFCF<EEFCF@:++,77BF,,@ECFC***,,,,,,***,,6***,,,*,4,*,,++,,52+++++++2+5*++35959+3+/*+*+***/+*+0**++++3++**)*)**+*+***/)*)**+1+)10*)+*0*0*2)*0*(10*)*./)*-*)/*).*-))*()1))))(,(.:).-4))
@M01548:130:000000000-BBN6D:1:2106:16678:4867 1:N:0:TCCGGAGA+CCTATCCT
GATATTTTTCTCTTCAGTGATGCTGCACTGGAAAGTAACGCCGGGGAATGCTTACAACCAGCCAAGGGGATCTCGAGAATGATTCTGCCTAGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGGAGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAACACACGCTACCCAACCACCCTTCTCTACGCTCTCTTTTACTTATACTGCCCTAGCCTCACACCCCCCATCTTACCTCCATTCACCTTCCTTCACCGCCCTCTCCACCACCTACATTCTACACCTCCTCA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGG,?FFGFGGGGGGFGDGGGGGG9FFGGGGGGGGG**5*>,,**4*6:,:<***6*1***5>+@+5*3*/*++5;++++++2+=+1+2*0**+*+2+*+*/)/)*)28C*+*08C5*977C*:)*)*)***)2(),),)(0)-()(((0(.).))))-)((((243.
@M01548:130:000000000-BBN6D:1:2106:28259:8664 1:N:0:TCCGGAGA+CCTATCCT
CTGGGCACCACCGGTGCAGTGAACCATACCGTGAACCTCTGGGCGGAGCTCATCGAGAATCTTCTTGATGACAGGAGCGTAGGTACGGGTAGGAGAGAGTACCAGTTCTCCGGCATTGATAGGTGAACCCTCTACCTCATCAGTCAACTTATACTTACCGCTGTAAACCAACTCCTCTGGCACGGCGTGGTCGTAGCTCTCAGGATAGTTCTCTGCGAGATACTTGGCGAATACATCGTGGCGGGCAGAAGTCAAACCGTTGCTGCCCATACCGCCGTTGTACTTCTTCTCGTAAGTAGCC

I do not see anything wrong with my fastq file. What could be the problem?
Thank you very much for your help!
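
One way to rule out a malformed input file is a quick structural check of the FASTQ records. The sketch below is not part of SUPER-FOCUS and does not claim to explain the error above; the function name fastq_looks_valid is hypothetical, and it assumes the common four-line record layout (header starting with @, sequence, separator starting with +, quality string of the same length as the sequence).

```shell
# Hypothetical sanity check: succeed only if the file consists of complete
# four-line FASTQ records. Multi-line (wrapped) FASTQ is not supported.
fastq_looks_valid() {
    awk '
        # Line 1 of each record: header must start with "@".
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1; exit }
        # Line 2: remember the sequence length.
        NR % 4 == 2 { seq_len = length($0) }
        # Line 3: separator must start with "+".
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1; exit }
        # Line 4: quality string must be as long as the sequence.
        NR % 4 == 0 && length($0) != seq_len { bad = 1; exit }
        # Fail on any violation, on an empty file, or on a truncated record.
        END { if (bad || NR == 0 || NR % 4 != 0) exit 1 }
    ' "$1"
}
```

For example, fastq_looks_valid SUPERFOCUS/NC.01_R1.fastq && echo ok prints ok only when every record passes these checks.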

@linsalrob
Collaborator Author

linsalrob commented Aug 24, 2020 via email

@StefPN

StefPN commented Aug 24, 2020

Of course! Sorry, I was not paying enough attention when copying the code this morning. It seems to be running now. Thank you!

@linsalrob
Collaborator Author

linsalrob commented Aug 24, 2020 via email
