
Submitting read data to the European Nucleotide Archive

Thomas edited this page Aug 29, 2024 · 5 revisions

The European Nucleotide Archive (ENA) is a comprehensive database of genomic sequence data. Details of the data submission process can be found in the ENA documentation. Here we provide a streamlined tutorial that should be easier to follow!

  1. Navigate to the ENA submission page and click "Submit to ENA using Webin Portal". Enter the login details for the lab's ENA account (they should be in the lab's Slack channel). After logging in, you will see the page shown below.
[Screenshot: Webin Portal page after login]

  2. Register a study.

Click on "Register Study" and fill out the form. Much of this information can be amended at a later date. This is convenient if you do not know when you want your data to be made publicly available; you can just set a date two years in the future and then amend the release date when necessary. Note that on the release date, your study and any associated data will be made public. Once public, your study cannot be made private again.

[Screenshot: "Register Study" form]

  3. Register a sample.

The sample corresponds to the sequenced biomaterial. Click on "Register Sample" and download the checklist that most closely relates to the type of sample you are uploading. This will probably be the "default sample". The checklist will have mandatory fields that must be completed, such as the taxonomy ID of the organism from which the sample comes.

[Screenshot: sample checklist selection]

If you want to include additional information, toggle the optional fields (such as cell type or tissue type). When you have selected the options you want, download the spreadsheet. Open the spreadsheet in Excel and fill it out according to the instructions under Steps 2 and 3 of the ENA documentation.

  4. Use a command-line FTP client to transfer your read data to the ENA.

First, open a terminal and navigate to the folder where your read data is stored (if the data is on the HPC, log in to the HPC first). Next, generate an MD5 hash for each file that you plan to upload. This can be done with a single command:

[Screenshot: the MD5 command]

Make a note of each file's hash, as you will need it in a later step! With the hashes generated, you can now upload the data to the ENA. Type `lftp webin2.ebi.ac.uk -u ENA_USERNAME`, replacing ENA_USERNAME with the lab account username. Press Enter, and you will be prompted for the account password. If you don't have lftp, you can install it using Brew (Mac) or Conda (Windows and Mac).

With the connection established, transfer your read data using the command `mput ./FILE_NAME`. File transfers can take a while. When they have finished, use the `ls` command to confirm that the files are present on the server.
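The interactive session described above can also be scripted. The sketch below passes the same `mput` and `ls` commands to a single `lftp` call through its `-e` option, then closes the session with `bye`. `upload_reads` is a hypothetical helper name, and it assumes file names without spaces. Because no password follows the username, lftp should still prompt for it.

```shell
# Install lftp first if needed (pick one; the conda-forge channel is an assumption):
#   brew install lftp
#   conda install -c conda-forge lftp

# upload_reads USERNAME FILE...: upload the given files to the ENA in one session.
# Hypothetical helper; file names must not contain spaces.
upload_reads() {
    user="$1"
    shift
    # mput uploads every listed file, ls lists the remote directory so the
    # transfer can be checked, and bye closes the connection.
    lftp -u "$user" -e "mput $*; ls; bye" webin2.ebi.ac.uk
}
```

A call might look like `upload_reads ENA_USERNAME sample1_R1.fastq.gz sample1_R2.fastq.gz`, where the file names are placeholders for your own read files.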

  5. Submit a run.

This is the final stage and is similar to how we registered a sample. Start by clicking on "Submit Reads" and then select the file format of your read data (e.g. BAM or FASTQ).

[Screenshot: file format selection]

Next, download the template and open it in Excel. For details of each field, refer to the "Please select fields" page. You will need to include details of the sequencing, along with the name and MD5 hash of each file you uploaded.

[Screenshot: "Please select fields" page]
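Copying hashes into the spreadsheet by hand is error-prone, so it may help to reshape them first. The sketch below converts `md5sum`-style output (hash first, then file name) into tab-separated rows of file name and hash that can be pasted into Excel. `md5_to_tsv` is a hypothetical helper, and the column order is an assumption; match it to the columns in the template you downloaded.

```shell
# md5_to_tsv FILE: read "<md5>  <file>" lines and print "<file><TAB><md5>".
# Hypothetical helper; assumes the hash is the first space-separated field.
md5_to_tsv() {
    awk '{
        hash = $1
        sub(/^[^ ]+ +/, "")      # strip the hash and the spaces after it
        printf "%s\t%s\n", $0, hash
    }' "$1"
}
```

For example, `md5_to_tsv checksums.md5` prints one row per file, where `checksums.md5` is a placeholder for wherever you saved the hashes.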

Again, follow any guidance on this page of the ENA documentation. With the spreadsheet completed and submitted, you have finished the upload process! Much of the information you have supplied through the previous steps can be edited by clicking on "Studies Report", "Sample Report" or "Runs Report", so do not worry if you have made any mistakes or typos. Also, remember to change the release date of the study if necessary.
