AdapterBase

Background

Adapters are short sequences that are attached to cDNA templates during preparation of next-generation sequencing (NGS) libraries. Depending on how the library is prepared and sequenced, the raw NGS data may be contaminated with adapter sequences. See Didion et al. 2017 for more details.

Adapter trimming is a critical component of NGS data preprocessing. To trim adapters appropriately, it is necessary to know the sequences of the adapters that were used. However, adapter sequences are poorly documented and often are not included in the metadata of public database submissions (SRA, ENA, and DDBJ).
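To make the problem concrete, here is a minimal sketch of 3' adapter trimming. It assumes exact matching only; the function name `trim_3prime_adapter` and the `min_overlap` parameter are illustrative and are not part of AdapterBase or Atropos. Real trimmers also tolerate mismatches and indels, which this sketch does not.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    Illustrative sketch: real trimmers also allow mismatches and indels.
    """
    # Case 1: the full adapter occurs somewhere inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends partway through the adapter, so only a prefix
    # of the adapter is present at the 3' end. Try the longest prefix first.
    longest = min(len(adapter) - 1, len(read))
    shortest = max(min_overlap, 1)
    for k in range(longest, shortest - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    # No adapter found: return the read unchanged.
    return read
```

Either case only works if the adapter sequence is known in advance, which is the gap AdapterBase is meant to fill.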

Target Users

AdapterBase is designed to make life easier for scientists who want to re-analyze data from the SRA. The goal is to enter a run accession number (e.g., SRR123456) into either the web interface or the command-line API and get back the sequences of the adapters that were used to create the library. This information will eventually also be exposed via Python bindings, so that adapter trimming programs like Atropos can access it directly.
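Before a lookup, it can help to check that the input really is a run accession. The sketch below assumes the usual run accession format shared by the three archives (an SRR, ERR, or DRR prefix followed by six or more digits). The helper name `parse_run_accession` is hypothetical and is not part of AdapterBase.

```python
import re
from typing import Tuple

# Run accessions start with SRR (SRA), ERR (ENA), or DRR (DDBJ),
# followed by six or more digits.
RUN_ACCESSION = re.compile(r"[SED]RR[0-9]{6,}")
ARCHIVES = {"S": "SRA", "E": "ENA", "D": "DDBJ"}


def parse_run_accession(text: str) -> Tuple[str, str]:
    """Return (normalized accession, issuing archive) or raise ValueError."""
    accession = text.strip().upper()
    if not RUN_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a run accession: {text!r}")
    return accession, ARCHIVES[accession[0]]
```

Experiment (SRX), sample (SRS), and study (SRP) accessions are rejected on purpose, since adapter information is recorded per run.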

Because a database is only as useful as the data in it, groups doing the sequencing can also create AdapterBase entries for their data at the same time as they deposit it in the SRA/ENA/DDBJ. We have begun to prepopulate the database with annotations of existing data, generated by automatic detection of adapters using Atropos. Similarly, lists of kits and adapter sequences have been extracted from Illumina's documentation, and users can add data for other kits as they become available.

System Design

AdapterBase is built on SQLite3 and Django, with a REST API as its primary interface. The database can be accessed via the web (URL TBD), the command line, and/or Python bindings.
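As a rough illustration of REST access from Python, the sketch below uses only the standard library. Both the base URL and the `api/runs/<accession>/` route are assumptions made for this example; the actual routes are defined in the project's Django URL configuration and may differ, as may the shape of the JSON response.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# ASSUMPTION: the base URL and the "api/runs/<accession>/" route are
# hypothetical placeholders, not documented AdapterBase endpoints.
BASE_URL = "http://localhost:8000/"


def run_url(accession: str, base_url: str = BASE_URL) -> str:
    """Build the (hypothetical) REST URL for one run accession."""
    return urljoin(base_url, f"api/runs/{quote(accession)}/")


def fetch_run(accession: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """GET the run record and decode the JSON response body."""
    with urlopen(run_url(accession, base_url), timeout=timeout) as response:
        return json.load(response)
```

With a server running, `fetch_run("SRR123456")` would return whatever JSON document the server sends for that run.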

Usage

Currently, AdapterBase can be accessed from the Hackathon AWS instance by mapping port 80 back to the local host. A permanent, publicly facing home will be determined later. To use the AWS instance, please see these instructions.

Local installation

If you want to spin up a local copy of AdapterBase:

Make sure python3 is installed
Clone or download this git repository
From the oadb directory, enter ./buildoadb.sh -v venv
From the oadb directory, enter ./runoadb.sh -v venv
Open a new browser tab and navigate to http://localhost:8000

These scripts are provided for convenience; you can examine them to see what they do.
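If the page does not load, a quick way to tell whether the server is listening at all is to try a TCP connection to the port. This is a minimal sketch; the helper name `port_is_open` is illustrative, and port 8000 is taken from the URL above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8000,
                 timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    print("listening" if port_is_open() else "not reachable")
```

A successful connection only shows that some process is listening on the port, not that it is AdapterBase.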

Docker installation

You can build a docker image using the following incantations:

sudo docker pull ubuntu:16.04
sudo docker build -t oadb:latest .

You can start the application in the background using the following incantation:

sudo docker run -d -p 8000:8000 --name oadb oadb:latest

As before, open a new browser tab and navigate to http://localhost:8000


Web interface vignettes

Get adapter sequences used by a run from the accession number

Deposit adapter information for a run

Adding new kits and/or adapter sequences

Using the command-line API

Using Python bindings

Remaining Goals

Complete implementation of website/API
Pre-populate Run database from SRA using Atropos
Find a home for web implementation and build Docker image
User group implementation and security features

Stretch goals/post-hackathon

Continue building out run database with manual curation of SRA datasets
Integrate the AdapterBase API into Atropos
Develop a script to scan a set of SRA accessions for adapters and match identified adapter names against the LIBRARY_CONSTRUCTION_PROTOCOL block in the SRA metadata.
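The matching half of that last goal could look something like the sketch below. It assumes experiment metadata in the SRA XML format, where each EXPERIMENT element has an `accession` attribute and a nested LIBRARY_CONSTRUCTION_PROTOCOL element. The function name `protocol_mentions` is hypothetical, and plain case-insensitive substring matching is a simplification that a real script would likely need to refine.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List


def protocol_mentions(experiment_xml: str,
                      names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each experiment accession to the adapter or kit names that
    appear in its LIBRARY_CONSTRUCTION_PROTOCOL text."""
    root = ET.fromstring(experiment_xml)
    names = list(names)
    hits: Dict[str, List[str]] = {}
    # iter() also yields the root itself, so this works whether the root
    # is a single EXPERIMENT or a set that wraps several of them.
    for experiment in root.iter("EXPERIMENT"):
        protocol = experiment.findtext(
            ".//LIBRARY_CONSTRUCTION_PROTOCOL", default="")
        text = protocol.lower()
        accession = experiment.get("accession", "unknown")
        # Simplification: case-insensitive substring match on each name.
        hits[accession] = [name for name in names if name.lower() in text]
    return hits
```

Experiments with no protocol text get an empty list, which makes missing documentation easy to count.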

Manuscript

A draft manuscript describing AdapterBase may be found here.

Project Team

AdapterBase was initially developed as part of an NCBI-sponsored hackathon at the National Library of Medicine, August 14-16, 2017.

John P Didion (project lead), NHGRI/NIH, john.didion@nih.gov
Dan Davis, Systems/Applications Architect, OCCS/AB, NLM, NIH, daniel.davis@nih.gov
Scott Lewis, Pulmonary Critical Care Medicine, Washington University in St. Louis, slewis3827@gmail.com
Chaim A Schramm, Vaccine Research Center, NIAID, NIH, chaim.schramm@nih.gov
Vamsi Vungutur, OCCS/AB, NLM, NIH, vamsi.vungutur@nih.gov

Other

https://support.illumina.com/bulletins/2016/12/what-sequences-do-i-use-for-adapter-trimming.html
