rsync problem with NCBI #38

Closed
katewd opened this issue Aug 9, 2018 · 11 comments


katewd commented Aug 9, 2018

Similar to the Kraken v1 issue (DerrickWood/kraken#114), downloading and updating the RefSeq databases is not working in Kraken2. I get the following error:

kate@.../software/kraken2-master$ ./kraken2-build --standard --threads 50 --db Aug2018_RefSeq
Downloading nucleotide est accession to taxon map...rsync: failed to connect to ftp.ncbi.nlm.nih.gov (165.112.9.229): Connection refused (111)
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::7): Network is unreachable (101)
rsync error: error in socket IO (code 10) at clientserver.c(128) [Receiver=3.1.1]

There is a Python workaround that uses wget instead of rsync (https://github.com/sejmodha/MiscScripts/blob/master/UpdateKrakenDatabases.py), but it has some issues too. Any chance you could fix this? Thanks.

katewd (Author) commented Aug 23, 2018

OK, so after much angst and effort to solve the problem along with my sysadmin team (we tried opening ports, setting rsync proxies, etc.), I gave up on getting it to run properly at work and decided to try on my computer at home. Amazingly, it ran first try, no problems! Still no clue where the issue is, but I can just copy the updated database onto a portable hard drive and take it to work. There's always a workaround! Thanks for a great tool.


tseemann commented Sep 6, 2018

$ kraken2-build --db template --download-taxonomy
Downloading nucleotide est accession to taxon map... done.
Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide gss accession to taxon map...rsync: failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.10): Connection refused (111)

This isn't a bug in kraken2 in my case; NCBI is just a lot of hops from Australia, and it also seems to get overloaded. When I restart it, it works, then dies on the next part:

Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide gss accession to taxon map... done.
Downloading nucleotide wgs accession to taxon map...rsync: failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.10): Connection refused (111)
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::10): Network is unreachable (101)
rsync error: error in socket IO (code 10) at clientserver.c(125) [Receiver=3.1.2]

Still quicker than FTP!
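
The restart-until-it-works approach described above can be automated with a small retry wrapper. This is only a sketch, not part of kraken2: the function name `retry`, the attempt count, and the delay are illustrative assumptions. The output above suggests a restart gets past the maps that already completed, but that is an observation from this thread rather than a documented guarantee.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds or MAX_ATTEMPTS is reached.
# Note: plain POSIX sh has no "local", so these variables are global.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed: $*" >&2
        # Only wait if another attempt is coming.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example invocation (commented out so the snippet has no side effects):
# retry 5 60 kraken2-build --db template --download-taxonomy
```

A fixed delay keeps the sketch short; a longer or growing delay may be kinder to a server that is refusing connections because it is overloaded.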

katewd (Author) commented Sep 6, 2018

I couldn't get rsync to work through the firewalls (or whatever the problem was) at work, but the rsync steps ran fine from home. Unfortunately, my laptop wasn't big enough to build the database, so I had to copy the downloaded folders across and then run the build step (./kraken2-build --build) on my work desktop. It takes a few more steps than usual, but it does work. Maybe you could try that.
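
The split workflow above (download on one machine, build on another) might look roughly like the sketch below. The flags are the ones listed in the `kraken2-build --help` output in this thread; the function names, the `KRAKEN2_BUILD` variable, and the single-library download are assumptions made for illustration, not the exact commands that were run.

```shell
#!/bin/sh
# Hypothetical wrapper names; KRAKEN2_BUILD can point at ./kraken2-build instead.
KRAKEN2_BUILD=${KRAKEN2_BUILD:-kraken2-build}

# Step 1: run where rsync to NCBI works (e.g. at home).
# Fetches the taxonomy plus one library type such as "bacteria".
download_step() {
    db=$1
    library=$2
    "$KRAKEN2_BUILD" --download-taxonomy --db "$db" &&
    "$KRAKEN2_BUILD" --download-library "$library" --db "$db"
}

# Step 2: run on the larger machine after copying the whole DB directory over.
build_step() {
    db=$1
    threads=${2:-1}
    "$KRAKEN2_BUILD" --build --db "$db" --threads "$threads"
}

# Example (commented out; the copy destination is a placeholder):
# download_step Aug2018_RefSeq bacteria
# cp -r Aug2018_RefSeq /path/to/portable/drive/
# build_step Aug2018_RefSeq 50
```

The whole database directory needs to travel between machines, since `--build` expects both the taxonomy and at least one library file to be present.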

DerrickWood (Owner) commented

I've recently added a --use-ftp option to kraken2-build so that people who can't get rsync to work due to firewall-type issues can force the downloads to use FTP instead of rsync. I can't say how long the DB info will take to download in Australia, though. I've also added a --skip-maps flag that is used in the standard install to skip downloading of the big taxonomy maps, because they aren't needed in the standard installation. That should help with download times, if nothing else.
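
One way to use the new option is to try the default rsync download first and fall back to `--use-ftp` only when it fails. This is a sketch under assumptions: the function name and the `KRAKEN2_BUILD` variable are hypothetical, and it requires a kraken2-build recent enough to accept `--use-ftp`. Extra flags such as `--skip-maps` are passed through unchanged.

```shell
#!/bin/sh
# Hypothetical helper; KRAKEN2_BUILD may be overridden with a full path.
KRAKEN2_BUILD=${KRAKEN2_BUILD:-kraken2-build}

# download_taxonomy_with_fallback DB [EXTRA_FLAGS...]
# Tries the default (rsync) transfer, then retries once with --use-ftp.
download_taxonomy_with_fallback() {
    db=$1
    shift
    if "$KRAKEN2_BUILD" --download-taxonomy --db "$db" "$@"; then
        return 0
    fi
    echo "default (rsync) download failed; retrying with --use-ftp" >&2
    "$KRAKEN2_BUILD" --download-taxonomy --use-ftp --db "$db" "$@"
}

# Example (commented out so the snippet has no side effects):
# download_taxonomy_with_fallback Aug2018_RefSeq --skip-maps
```

If rsync is known to be blocked, passing `--use-ftp` directly avoids waiting for the first attempt to fail.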

katewd (Author) commented Sep 11, 2018

Thank you!

tseemann commented

Thanks @DerrickWood !


jpetteng commented Mar 29, 2019

I don't see --use-ftp as an option in kraken2 version 2.0.6 or 2.0.7. Any ideas on what's going on? Thanks.

$ kraken2 --version
Kraken version 2.0.6-beta
Copyright 2013-2018, Derrick Wood (dwood@cs.jhu.edu)

$ kraken2-build --help
Usage: kraken2-build [task option] [options]

Task options (exactly one must be selected):
  --download-taxonomy        Download NCBI taxonomic information
  --download-library TYPE    Download partial library
                             (TYPE = one of "archaea", "bacteria", "plasmid",
                             "viral", "human", "fungi", "plant", "protozoa",
                             "nr", "nt", "env_nr", "env_nt", "UniVec",
                             "UniVec_Core")
  --special TYPE             Download and build a special database
                             (TYPE = one of "greengenes", "silva", "rdp")
  --add-to-library FILE      Add FILE to library
  --build                    Create DB from library
                             (requires taxonomy d/l'ed and at least one file
                             in library)
  --clean                    Remove unneeded files from a built database
  --standard                 Download and build default database
  --help                     Print this message
  --version                  Print version information

Options:
  --db NAME                  Kraken 2 DB/library name (mandatory except for
                             --help/--version)
  --threads #                Number of threads (def: 1)
  --kmer-len NUM             K-mer length in bp/aa (build task only;
                             def: 35 nt, 15 aa)
  --minimizer-len NUM        Minimizer length in bp/aa (build task only;
                             def: 31 nt, 15 aa)
  --minimizer-spaces NUM     Number of characters in minimizer that are
                             ignored in comparisons (build task only;
                             def: 6 nt, 0 aa)
  --protein                  Build a protein database for translated search
  --no-masking               Used with --standard/--download-library/
                             --add-to-library to avoid masking low-complexity
                             sequences prior to building; masking requires
                             dustmasker or segmasker to be installed in PATH,
                             which some users might not have.


xingpel commented Jun 4, 2019

Same here:

Kraken version 2.0.7-beta
Copyright 2013-2018, Derrick Wood (dwood@cs.jhu.edu)

kraken2-build says "Unknown option: use-ftp"


jpetteng commented Jun 4, 2019

Hello Xingpel.

Try pulling the code straight from GitHub rather than from https://ccb.jhu.edu/software/kraken2. You should then be able to use the --use-ftp option.

-jamie

jenniferlu717 (Collaborator) commented

Yes, this is my fault. Apologies. I have not updated the software download on the website. Please use the GitHub version.
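
To tell whether an installed copy is new enough, one rough check is to look for `--use-ftp` in the help text, which is how the older 2.0.6 output earlier in this thread showed it was missing. The helper name `supports_use_ftp` is hypothetical, and the check assumes the option is listed in `--help` whenever it is supported. The clone URL assumes the repository behind this issue tracker, and the install directory is a placeholder.

```shell
#!/bin/sh
# supports_use_ftp BUILD_CMD
# Hypothetical check: succeeds if BUILD_CMD's --help output mentions --use-ftp.
supports_use_ftp() {
    help_text=$("$1" --help 2>&1) || true
    case $help_text in
        *--use-ftp*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible steps for getting the GitHub version (commented out; paths are placeholders):
# git clone https://github.com/DerrickWood/kraken2.git
# cd kraken2 && ./install_kraken2.sh "$HOME/kraken2-install"
# supports_use_ftp "$HOME/kraken2-install/kraken2-build" && echo "--use-ftp is available"
```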


xingpel commented Jun 14, 2019

Thank you all! It is resolved.
