Adapters are short sequences that are attached to cDNA templates during preparation of next-generation sequencing (NGS) libraries. Depending on how the NGS library is prepared and sequenced, the raw NGS data may be contaminated with the adapter sequences. See Didion et al. 2017 for more details.
Adapter trimming is a critical component of NGS data preprocessing. To trim adapters appropriately, it is necessary to know the sequences of the adapters that were used. However, adapter sequences are poorly documented and often are not included in the metadata of public database submissions (SRA, ENA, and DDBJ).
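To illustrate why the adapter sequence has to be known, here is a minimal sketch of 3' adapter trimming. It is a naive, exact-match illustration only, not how Atropos or any other trimmer works (real tools tolerate mismatches and use quality scores). The function name `trim_3prime_adapter` and the `min_overlap` parameter are made up for this example; the adapter shown is the common prefix of Illumina TruSeq adapters.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching only."""
    # Case 1: the full adapter occurs inside the read; drop it and
    # everything after it.
    index = read.find(adapter)
    if index != -1:
        return read[:index]
    # Case 2: the read ends partway through the adapter, so only a
    # prefix of the adapter is present at the 3' end of the read.
    longest = min(len(read), len(adapter) - 1)
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[:-length]
    # No adapter found: return the read unchanged.
    return read


ADAPTER = "AGATCGGAAGAGC"  # common prefix of Illumina TruSeq adapters

print(trim_3prime_adapter("ACGTTGCA" + ADAPTER + "TTTT", ADAPTER))  # ACGTTGCA
print(trim_3prime_adapter("ACGTTGCA" + "AGATCG", ADAPTER))          # ACGTTGCA
```

Without the adapter sequence, neither case can be detected, which is the gap AdapterBase is meant to fill.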
AdapterBase is designed to make life easier for scientists who want to re-analyze data from the SRA. The goal is to be able to enter a run accession number (e.g., SRR123456) into either the web interface or the command-line API and get back the sequences of the adapters that were used to create the library. This information will also eventually be exposed via Python bindings, so that adapter trimming programs like Atropos can access it directly.
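Before looking up a run, a client would probably want to check that the input looks like a run accession. The following is a minimal sketch, assuming the usual INSDC convention that run accessions start with SRR (SRA), ERR (ENA), or DRR (DDBJ) followed by at least six digits. The helper names are hypothetical and are not part of AdapterBase.

```python
import re

# Run accessions: SRR (SRA), ERR (ENA) or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d{6,}")


def is_run_accession(text):
    """Return True if the text looks like an SRA/ENA/DDBJ run accession."""
    return RUN_ACCESSION.fullmatch(text.strip().upper()) is not None


def normalize_run_accession(text):
    """Return the accession in canonical upper-case form, or raise ValueError."""
    accession = text.strip().upper()
    if RUN_ACCESSION.fullmatch(accession) is None:
        raise ValueError(f"not a run accession: {text!r}")
    return accession


print(normalize_run_accession(" srr123456 "))  # SRR123456
print(is_run_accession("SRX123456"))           # False (experiment, not run)
```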
Because a database is only as useful as the quality of the data in it, we also provide the ability for the groups doing the sequencing to create entries for their data in AdapterBase when they deposit it in the SRA/ENA/DDBJ. We have begun to prepopulate the database with annotations of existing data, generated by automatic detection of adapters using Atropos. Similarly, lists of kits and adapter sequences have been extracted from Illumina's documentation, and users can add data for other kits as available.
AdapterBase is implemented in SQLite3 and Django with the primary API implemented in REST. Access to the database is via web (URL TBD), command line, and/or Python bindings.
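As a rough idea of what access through the REST API could look like from Python, here is a sketch using only the standard library. The route `/api/runs/<accession>/` and the JSON response are assumptions made for illustration; the actual routes are defined by the Django application and may differ. The base URL assumes a local copy of AdapterBase listening on port 8000.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumes a locally running copy
RUN_ENDPOINT = "/api/runs/"         # ASSUMPTION: hypothetical route


def build_run_url(accession, base_url=BASE_URL):
    """Build the (hypothetical) URL for looking up one run accession."""
    return f"{base_url.rstrip('/')}{RUN_ENDPOINT}{quote(accession)}/"


def fetch_run(accession, base_url=BASE_URL):
    """GET the run record and decode it, assuming the API returns JSON."""
    with urlopen(build_run_url(accession, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_run_url("SRR123456"))
# http://localhost:8000/api/runs/SRR123456/
```

Calling `fetch_run("SRR123456")` requires a running server, so it is not executed here.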
Currently, AdapterBase can be accessed from the Hackathon AWS instance by mapping port 80 back to the local host. A permanent, public-facing home will be determined later. To use the AWS instance, please see these instructions.
If you want to spin up a local copy of AdapterBase:
- Make sure python3 is installed
- Clone or download this git repository
- From the oadb directory, enter
./buildoadb.sh -v venv
- From the oadb directory, enter
./runoadb.sh -v venv
- Open a new browser tab and navigate to
http://localhost:8000
These scripts are provided for convenience; you can examine them to see what they do.
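If you would rather check from a script that the local server has come up, something like the following sketch may help. It polls the URL used above until any HTTP response arrives. The function name `wait_for_server` and its defaults are made up for this example and are not part of the repository.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def wait_for_server(url="http://localhost:8000", timeout=30.0, interval=1.0):
    """Poll url until the server answers; return False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urlopen(url, timeout=interval):
                return True
        except HTTPError:
            # The server answered with an error status, so it is running.
            return True
        except OSError:
            # Connection refused or timed out: not up yet, try again.
            time.sleep(interval)
    return False
```

For example, `wait_for_server()` returns `True` once the application is reachable at http://localhost:8000, and `False` if nothing answers within 30 seconds.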
You can build a docker image using the following incantations:
sudo docker pull ubuntu:16.04
sudo docker build -t oadb:latest .
You can start the application in the background using the following incantation:
sudo docker run -d -p 8000:8000 --name oadb oadb:latest
As before, open a new browser tab and navigate to http://localhost:8000
- Complete implementation of website/API
- Pre-populate Run database from SRA using Atropos
- Find a home for web implementation and build Docker image
- User group implementation and security features
- Continue building out run database with manual curation of SRA datasets
- Integrate the AdapterBase API into Atropos
- Develop a script to scan a set of SRA accessions for adapters and match identified adapter names against the LIBRARY_CONSTRUCTION_PROTOCOL block in the SRA metadata.
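The matching step in the last item could be sketched roughly as follows. This is only an illustration of the idea, not the planned script: it assumes the SRA experiment metadata is available as XML containing `LIBRARY_CONSTRUCTION_PROTOCOL` elements, and it uses a plain case-insensitive substring match. The function names and the example XML (including the placeholder accession) are made up.

```python
import xml.etree.ElementTree as ET


def protocol_texts(experiment_xml):
    """Return the text of every LIBRARY_CONSTRUCTION_PROTOCOL element."""
    root = ET.fromstring(experiment_xml)
    return [
        (element.text or "").strip()
        for element in root.iter("LIBRARY_CONSTRUCTION_PROTOCOL")
    ]


def names_mentioned(names, experiment_xml):
    """Return the adapter/kit names that appear in the protocol text."""
    text = " ".join(protocol_texts(experiment_xml)).lower()
    return [name for name in names if name.lower() in text]


# Made-up metadata record for illustration only.
EXAMPLE_XML = """
<EXPERIMENT_SET>
  <EXPERIMENT accession="SRX000000">
    <DESIGN>
      <LIBRARY_DESCRIPTOR>
        <LIBRARY_CONSTRUCTION_PROTOCOL>
          Libraries were prepared with the TruSeq Stranded mRNA kit.
        </LIBRARY_CONSTRUCTION_PROTOCOL>
      </LIBRARY_DESCRIPTOR>
    </DESIGN>
  </EXPERIMENT>
</EXPERIMENT_SET>
"""

print(names_mentioned(["TruSeq", "Nextera"], EXAMPLE_XML))  # ['TruSeq']
```

In a real script, the list of names would come from the adapters detected in the reads, and a mismatch with the protocol text would flag the run for manual curation.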
A draft manuscript describing AdapterBase may be found here.
AdapterBase was initially developed as part of an NCBI-sponsored hackathon at the National Library of Medicine, August 14-16, 2017.
- John P Didion (project lead), NHGRI/NIH, john.didion@nih.gov
- Dan Davis, Systems/Applications Architect, OCCS/AB, NLM, NIH, daniel.davis@nih.gov
- Scott Lewis, Pulmonary Critical Care Medicine, Washington University in St. Louis, slewis3827@gmail.com
- Chaim A Schramm, Vaccine Research Center, NIAID, NIH, chaim.schramm@nih.gov
- Vamsi Vungutur, OCCS/AB, NLM, NIH, vamsi.vungutur@nih.gov