rsync: failed to connect to ftp.ncbi.nlm.nih.gov #114

Closed
narsapuramvijaykumar opened this issue Mar 8, 2018 · 23 comments

@narsapuramvijaykumar

When I tried to download the bacterial genomes for a custom build with the command below, I got the following error.
Command
kraken-build --download-library bacteria --db $DBNAME
ERROR
Step 1/3: performing rsync dry run...
Rsync dry run complete, removing any non-existent files from manifest.
Step 2/3: Performing rsync file transfer of requested files
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.12): Connection timed out (110)
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::12): Network is unreachable (101)
rsync error: error in socket IO (code 10) at clientserver.c(122) [Receiver=3.0.9]
rsync_from_ncbi.pl: rsync error, exited with code 10

I would appreciate some help, or suggestions for an alternative download option if one is available.

Thanks in advance.

Regards,
Vijay N

@your-highness

your-highness commented Mar 9, 2018

Dear @narsapuramvijaykumar ,

Since Kraken v1, rsync is used to retrieve the genome FASTA files. For me the problem was our company's proxy server. Try setting the environment variable RSYNC_PROXY to your proxy as "<PROXY_IP>:<PORT>" (rsync expects a host:port pair). However, this did not resolve the issue for me because rsync uses port 873 and our proxy did not allow connections on this port.
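
To tell whether a blocked port 873 is the cause, one option is to test the TCP connection directly. The following is only a sketch: `port_open` is a hypothetical helper name (not part of Kraken), and it assumes that `bash` and the coreutils `timeout` command are installed.

```shell
# Hypothetical helper (not part of Kraken): succeed if a TCP connection
# to HOST:PORT can be opened within 5 seconds, fail otherwise.
port_open() {
  # /dev/tcp is a bash feature, so the probe runs in an explicit bash.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example call (the rsync daemon port is 873):
#   if port_open ftp.ncbi.nlm.nih.gov 873; then
#     echo "port 873 reachable"
#   else
#     echo "port 873 blocked or host unreachable"
#   fi
```

If port 873 fails from your machine while ordinary FTP or HTTPS downloads succeed, a firewall or proxy restriction on the rsync port is a likely explanation.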

I ended up downloading the genome FASTA files with wget and then running the following (untested):

THREADS=4

#Download taxonomy still uses wget
kraken-build --download-taxonomy --db kraken_db

#Download sequences
#Download *_genomic.fna.gz from ftp://ftp.ncbi.nlm.nih.gov/genomes/ and unpack them

gunzip *_genomic.fna.gz
FASTAS=$(find . -name '*.fna')

#Add sequences
for FASTA in ${FASTAS}
do
  kraken-build \
    --add-to-library ${FASTA} \
    --db kraken_db
done

#Build the database
kraken-build --build --threads ${THREADS} --db kraken_db
kraken-build --clean --threads ${THREADS} --db kraken_db
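
The download step in the script above is left as a comment. One possible way to get the list of files is to derive it from an NCBI assembly_summary.txt file. This is a sketch under assumptions that the thread does not confirm: `genome_urls` is a hypothetical name, the path is read from column 20 of the tab-separated file, and each genome is assumed to sit at `<path>/<last path component>_genomic.fna.gz`.

```shell
# Hypothetical helper: print one download URL per assembly listed in an
# assembly_summary.txt file (assumed layout: tab-separated, path in column 20).
genome_urls() {
  awk -F '\t' '
    /^#/ { next }                      # skip header/comment lines
    $20 == "" || $20 == "na" { next }  # skip entries without a usable path
    {
      n = split($20, parts, "/")       # last component names the assembly directory
      print $20 "/" parts[n] "_genomic.fna.gz"
    }
  ' "$1"
}

# Example:
#   genome_urls assembly_summary.txt > urls.txt
#   wget -i urls.txt
```

Entries whose path is "na" are skipped rather than turned into broken URLs.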

Centrifuge first checks whether rsync is available and, if not, falls back to wget, which is a much better approach.

Best

@narsapuramvijaykumar
Author

I will try what you suggested.
Thanks a lot @your-highness

@wolfgangrumpf

I am getting an rsync error with the bacterial and viral libraries. Archaea, human, and plasmid all downloaded fine, but with bacteria and viral I get:

rsync: failed to connect to ftp.ncbi.nlm.nih.gov: Network is unreachable (101)
rsync error: error in socket IO (code 10) at clientserver.c(124) [receiver=3.0.6]
rsync_from_ncbi.pl: rsync error, exited with code 10

Obviously it's not a port issue since the other libraries do work. I'm using Kraken version 1, since I'm working with metawrap. Any suggestions?

@cmajones

@your-highness @DerrickWood I am getting a similar issue.

When running:
kraken2-build --download-library viral --db viral_refseq --use-ftp

I get the error:
Step 1/2: Performing ftp file transfer of requested files
rsync_from_ncbi.pl: FTP connection error: Network is unreachable

Any suggestions? Thanks!

@jenniferlu717
Collaborator

Hi all, sorry for the late reply.
@narsapuramvijaykumar @wolfgangrumpf @your-highness The newest version of kraken2 (the latest code on GitHub, not the latest release, so pull from GitHub to get it) lets you use the --use-ftp option INSTEAD of rsync. Hopefully that will work.

@cmajones try wget with a single genome from the NCBI server first to see if that works?

@cmajones

Hi @jenniferlu717, wget works without a problem against the NCBI FTP server on our cluster.

I am still getting the same error as above with the latest GitHub version of kraken2 and the --use-ftp option.

@jenniferlu717
Collaborator

What happens when you download without the --use-ftp option?

@cmajones

@jenniferlu717 see below:

rsync: failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.10): Connection refused (111)
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::12): Network is unreachable (101)
rsync error: error in socket IO (code 10) at clientserver.c(127) [Receiver=3.1.3]
Error downloading assembly summary file for viral, exiting.

@cmajones

@jenniferlu717 @DerrickWood any updates on this?

Seems like several users are having a similar issue and it has not been solved.

Thanks!

@jenniferlu717
Collaborator

@cmajones definitely working on this. will update soon. Sorry for the delay.

@jenniferlu717
Collaborator

@cmajones I made what is probably a really ugly fix that adds a --use-wget option to kraken-build. Please try the newest kraken version and let me know if it breaks anything.

@cmajones

@jenniferlu717 Thanks for looking into it! I ran:

kraken-build --download-library viral --db viral_refseq --use-wget

and got a resulting folder that contains:

assembly_summary.txt library.fna manifest.txt prelim_map.txt wget_manifest.txt

When I tried to run kraken2 on it, I got this error message:

kraken2: database ("/home/casey/kraken-test/viral_refseq/") does not contain necessary file taxo.k2d

@jenniferlu717
Collaborator

After running the --download-library commands, you need to build the database using
kraken2-build --build --db . --threads 20
which will generate the taxo.k2d, opts.k2d and hash.k2d files.
Then you can run kraken2
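
To confirm that the build step has finished before running kraken2, you can check for the three files named above. This is a minimal sketch; `k2_db_ready` is a hypothetical helper name, not a Kraken command.

```shell
# Hypothetical helper: succeed only if DB_DIR contains the three files
# that the build step is said to generate.
k2_db_ready() {
  for f in taxo.k2d opts.k2d hash.k2d; do
    if [ ! -f "$1/$f" ]; then
      echo "missing: $1/$f" >&2
      return 1
    fi
  done
  return 0
}

# Example:
#   k2_db_ready viral_refseq && echo "database is built"
```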

@joshua-theisen

I am trying to build the standard database using kraken (I can't use kraken2 because of my downstream needs). I used this code:
kraken-build --standard --threads 16 --use-wget --db standard.3

and got this error:
rsync_from_ncbi.pl: unexpected FTP path (new server?) for na

This code resulted in a directory with this structure:

ls -lhR standard.3
standard.3:
total 0
drwxrwxr-x 3 user user 4.0K Mar  5 07:29 library
drwxrwxr-x 2 user user 4.0K Mar  5 07:29 taxonomy

standard.3/library:
total 0
drwxrwxr-x 2 user user 4.0K Mar  5 07:29 archaea

standard.3/library/archaea:
total 512K
-rw-rw-r-- 1 user user 321K Mar  5 07:29 assembly_summary.txt

standard.3/taxonomy:
total 30G
-rw-rw-r-- 1 user user    0 Mar  5 07:27 accmap.dlflag
-rw-r--r-- 1 user user  18M Mar  5 07:26 citations.dmp
-rw-r--r-- 1 user user 3.9M Mar  5 07:25 delnodes.dmp
-rw-r--r-- 1 user user  452 Mar  5 07:20 division.dmp
-rw-r--r-- 1 user user  16K Mar  5 07:26 gc.prt
-rw-r--r-- 1 user user 4.9K Mar  5 07:20 gencode.dmp
-rw-r--r-- 1 user user 1.1M Mar  5 07:25 merged.dmp
-rw-r--r-- 1 user user 183M Mar  5 07:26 names.dmp
-rw-r--r-- 1 user user 146M Mar  5 07:25 nodes.dmp
-rw-rw-r-- 1 user user 9.1G Mar  5 07:26 nucl_gb.accession2taxid
-rw-rw-r-- 1 user user  20G Mar  5 07:27 nucl_wgs.accession2taxid
-rw-rw---- 1 user user 2.7K Sep 11 15:34 readme.txt
-rw-rw-r-- 1 user user    0 Mar  5 07:27 taxdump.dlflag
-rw-rw-r-- 1 user user  50M Mar  5 07:27 taxdump.tar.gz
-rw-rw-r-- 1 user user    0 Mar  5 07:29 taxdump.untarflag

Am I doing something wrong with the --use-wget switch?
Thanks

@jenniferlu717
Collaborator

@7670367 please open a new issue in the future.

I have taken a look and seen that some of the sequences described in the assembly_summary file have "na" as their ftp path, which is causing an issue for the script. We will make a new version of the download script to fix this issue.
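
To see how many entries are affected in a given assembly_summary.txt, a quick count like the one below may help. It is a sketch only: `count_na_paths` is a hypothetical name, and the path is assumed to be in column 20 of the tab-separated file.

```shell
# Hypothetical helper: count the rows whose path column (assumed to be
# column 20) is the literal string "na".
count_na_paths() {
  awk -F '\t' '!/^#/ && $20 == "na" { n++ } END { print n + 0 }' "$1"
}

# Example:
#   count_na_paths standard.3/library/archaea/assembly_summary.txt
```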

@joshua-theisen

My apologies. Thanks.

@hanabarak

Hi, I'm facing the same problem with kraken-build command. Do you have any idea when it will be fixed?
Thank you.

@SepOrion

SepOrion commented Apr 8, 2020

You can solve this problem by editing rsync_from_ncbi.pl (miniconda3/envs/kraken2/libexec/rsync_from_ncbi.pl) and adding the following check at the location shown in the picture:

if ($full_path =~ /^na/) {
  next;
}

[image: location of the change in rsync_from_ncbi.pl]

@s4251484

s4251484 commented May 9, 2020

@SepOrion's fix above works perfectly!

@jcuhpc

jcuhpc commented Mar 18, 2022

Unfortunately, none of the alleged fixes have worked for me - using kraken2 2.1.2, stuck on bacteria.

@PHingamp

PHingamp commented Jun 2, 2022

Same here, kraken2 2.1.2 fails to build the standard viral library:

kraken2-build --use-ftp --download-library viral --db kraken2_db/refseq_viral_genomic
rsync_from_ncbi.pl: unexpected FTP path (new server?) for https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/839/185/GCF_000839185.1_ViralProj14174

I will have to custom build, but I must admit it is a shame kraken2-build doesn't take in compressed fasta files the same way kraken2 does :) Even 'just' RefSeq is cumbersome to uncompress...
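
For the custom build, one way to avoid uncompressing everything at once is to decompress each file into a temporary directory, add it, and delete it again. This is a sketch, not a tested recipe: `add_gz_to_library` and the `KRAKEN2_BUILD` override variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: add gzip-compressed FASTA files to a Kraken 2
# library one at a time, keeping only one uncompressed copy on disk.
# Usage: add_gz_to_library DB_DIR file1.fna.gz [file2.fna.gz ...]
add_gz_to_library() {
  db=$1
  shift
  workdir=$(mktemp -d) || return 1
  status=0
  for gz in "$@"; do
    fasta="$workdir/$(basename "$gz" .gz)"
    # Decompress to the work directory, then hand the plain FASTA to kraken2-build.
    if ! gunzip -c "$gz" > "$fasta" ||
       ! "${KRAKEN2_BUILD:-kraken2-build}" --add-to-library "$fasta" --db "$db"; then
      status=1
      break
    fi
    rm -f "$fasta"
  done
  rm -rf "$workdir"
  return "$status"
}

# Example:
#   add_gz_to_library kraken2_db/refseq_viral_genomic downloads/*.fna.gz
```

The original .gz files are left untouched, and the loop stops at the first file that fails to decompress or to be added.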

@denise0593

I have the same problem with kraken2 2.1.2 when building the standard library:

./kraken2-build --standard --db standard_db
rsync_from_ncbi.pl: unexpected FTP path (new server?) for https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/762/265/GCF_000762265.1_ASM76226v1

@narsapuramvijaykumar
Author

I'm also facing the same problem with kraken2 2.1.2 when building the bacterial library:
kraken2-build --download-library bacteria --db /home/ec2-user/databases/kraken
rsync_from_ncbi.pl: unexpected FTP path (new server?) for https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/900/128/725/GCF_900128725.1_BCifornacula_v1.0
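
The last three reports all show a path that starts with https://, while the error message speaks of an FTP path. One plausible reading, which this thread does not confirm, is that the script expects ftp:// paths. To check what a downloaded assembly_summary.txt actually contains, a tally like the following could be used. `path_schemes` is a hypothetical name, and the path is again assumed to be in column 20.

```shell
# Hypothetical helper: tally the URL schemes found in the path column
# (assumed to be column 20) of an assembly_summary.txt file.
path_schemes() {
  awk -F '\t' '
    !/^#/ {
      s = $20
      sub(/:.*/, "", s)   # keep only the part before the first colon
      count[s]++
    }
    END { for (s in count) print count[s], s }
  ' "$1" | sort -k2
}

# Example:
#   path_schemes viral_refseq/assembly_summary.txt
```

Rows whose path is "na" show up under their own "na" entry, so the same tally also reveals the earlier problem.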
