HowTo: Access SRA Data
At the risk of starting this page off on a negative note, please do not download data using generic tools such as ftp, wget, etc. Doing so can create incomplete images and complicate problem diagnosis.
The supported means of downloading SRA data is to use the tool prefetch, included in the SRA Toolkit. Data may also be downloaded on demand (see our Wiki page) over HTTPS. The decision of which method to use depends upon your circumstances and in some cases the amount of data you will actually use from an SRA file.
feature | prefetch | on-demand | wget | ascp |
---|---|---|---|---|
supports Aspera | yes | no | no | yes |
supports HTTPS | yes | yes | yes | no |
partial download | no | yes | no | no |
VDB name resolution | yes | yes | no | no |
VDB cache | yes | yes | no | no |
dbGaP authorization | yes | yes | no | no |
Kart files | yes | no | no | no |
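
One way to read the table is as a small decision rule, sketched here as a shell function. The name choose_method and its two yes/no arguments are hypothetical and not part of the SRA Toolkit. The rule encodes only the "partial download" and "Kart files" rows and otherwise falls back to prefetch, the supported tool.

```shell
#!/bin/sh
# Hypothetical helper (not an SRA Toolkit command): suggest a download
# method based on two rows of the feature table.
#   $1 = "yes" if only part of the run will be read, otherwise "no"
#   $2 = "yes" if the input is a Kart file, otherwise "no"
choose_method() {
    cm_partial=$1
    cm_kart=$2
    if [ "$cm_kart" = "yes" ]; then
        # Per the table, only prefetch handles Kart files.
        echo "prefetch"
    elif [ "$cm_partial" = "yes" ]; then
        # Per the table, only on-demand access supports partial download.
        echo "on-demand"
    else
        # Default to the supported tool for full downloads.
        echo "prefetch"
    fi
}
```

For example, `choose_method yes no` prints on-demand, while any request involving a Kart file prints prefetch.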
As an example of prefetch usage:
$ prefetch SRR1482462
Maximum file size download limit is 20,971,520KB
2015-02-19T13:20:06 prefetch.2.4.4: 1) Downloading 'SRR1482462'...
2015-02-19T13:20:06 prefetch.2.4.4: Downloading via fasp...
2015-02-19T13:20:32 prefetch.2.4.4: fasp download succeed
2015-02-19T13:20:32 prefetch.2.4.4: 1) 'SRR1482462' was downloaded successfully
2015-02-19T13:20:35 prefetch.2.4.4: 'SRR1482462' has 22 dependencies
2015-02-19T13:20:36 prefetch.2.4.4: 2) Downloading 'ncbi-acc:NC_000067.5?vdb-ctx=refseq'...
2015-02-19T13:20:36 prefetch.2.4.4: Downloading via fasp...
2015-02-19T13:20:41 prefetch.2.4.4: fasp download succeed
2015-02-19T13:20:41 prefetch.2.4.4: 2) 'ncbi-acc:NC_000067.5?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:41 prefetch.2.4.4: 3) Downloading 'ncbi-acc:NC_000068.6?vdb-ctx=refseq'...
2015-02-19T13:20:41 prefetch.2.4.4: Downloading via fasp...
2015-02-19T13:20:46 prefetch.2.4.4: fasp download succeed
2015-02-19T13:20:46 prefetch.2.4.4: 3) 'ncbi-acc:NC_000068.6?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:46 prefetch.2.4.4: 4) Downloading 'ncbi-acc:NC_000069.5?vdb-ctx=refseq'...
2015-02-19T13:20:46 prefetch.2.4.4: Downloading via fasp...
2015-02-19T13:20:51 prefetch.2.4.4: fasp download succeed
2015-02-19T13:20:51 prefetch.2.4.4: 4) 'ncbi-acc:NC_000069.5?vdb-ctx=refseq' was downloaded successfully
...
As can be seen from the output above, prefetch performs several steps:
- check the size of the file being downloaded
  If the file is very large, prefetch must be given a higher download limit, e.g.:
  $ prefetch --max-size 100000000 SRR1482462
- download the requested file
  The file is downloaded using Aspera if available on your system, or HTTPS otherwise.
- put the file into its proper place
  The file is downloaded into your designated cache area. This permits VDB name resolution to work as designed.
- recursively download missing external reference sequences
  Most SRA files require additional sequence files in order to reconstruct original reads. prefetch ensures that you not only download the main file but all of its dependencies.
- access dbGaP encrypted data
  prefetch will make use of download and decryption keys that have been added to SRA Toolkit configuration to obtain authorization for the download in addition to performing all of the steps above. (N.B. In order to access dbGaP data, you will need to change directory or "cd" to the dbGaP project's workspace.)
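
The size limit from the first step can be computed rather than typed by hand. The sketch below assumes that --max-size is expressed in KB, as the "Maximum file size download limit is 20,971,520KB" line in the example output suggests (that value is 20 GiB); other releases of the toolkit may interpret the value differently. The helper names gib_to_kb and prefetch_cmd are illustrative only, and prefetch_cmd prints the command instead of running it.

```shell
#!/bin/sh
# Convert a whole number of GiB to KB (1 GiB = 1024 * 1024 KB).
# Assumption: --max-size takes KB, inferred from the log line above.
gib_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical helper: print (do not execute) a prefetch command line
# with a size limit given in GiB.
#   $1 = accession, $2 = limit in GiB
prefetch_cmd() {
    pc_acc=$1
    pc_limit_gib=$2
    echo "prefetch --max-size $(gib_to_kb "$pc_limit_gib") $pc_acc"
}
```

Printing the command first makes it easy to inspect the limit before starting a large transfer; the printed line can then be run as-is.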
prefetch will also operate on existing, previously downloaded files to recursively download any missing external reference sequences.
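
To check afterwards that a run and its dependencies arrived, the "was downloaded successfully" lines seen in the example output can be counted. This is a minimal sketch: count_downloaded is a hypothetical helper, and it assumes the log wording matches the 2.4.4 output shown earlier, which may differ in other releases.

```shell
#!/bin/sh
# Hypothetical helper: read prefetch log text on stdin and print the
# number of lines that report a completed download.
# Note: grep -c prints 0 and returns a non-zero status when nothing matches.
count_downloaded() {
    grep -c "was downloaded successfully"
}
```

A possible use is `prefetch SRR1482462 2>&1 | count_downloaded`, where `2>&1` captures the log whether it is written to standard output or standard error.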