serratus
The SARS-CoV-2 pandemic will infect millions and has already crippled the global economy.
While there is an intense research effort to sequence SARS-CoV-2 isolates and follow the evolution of the virus in real time, our understanding of where it originated is limited by the sparse characterization of other members of the Coronaviridae family (genomes are available for only 53 of 436 CoV species).
We are re-analyzing all RNA-sequencing data in the NCBI Sequence Read Archive (SRA) to discover new members of Coronaviridae. Our initial focus is mammalian RNA-sequencing libraries, followed by avian/vertebrate, then metagenomic, and finally all 1.12M entries (5.72 petabytes).
- Develop a workflow to rapidly detect CoV sequences in SRA RNA-seq libraries
- Calculate the sensitivity and specificity of detecting defined (known) and undefined (novel) CoV species in a library
- Optimize the system architecture to maximize performance and allow massive scaling on AWS with Nextflow
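To make the sensitivity/specificity goal concrete, here is a minimal Python sketch that scores a detector against a set of libraries whose CoV status is already known. It assumes library IDs are plain strings and that a ground-truth set exists; the function names and IDs are hypothetical, not part of serratus. Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP).

```python
from typing import Set, Tuple


def confusion_counts(
    all_libs: Set[str], truly_cov: Set[str], called_cov: Set[str]
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for a detector's calls over a library set.

    Hypothetical helper: all_libs is every library tested, truly_cov the
    libraries known to contain CoV, called_cov the libraries the detector flagged.
    """
    if not (truly_cov <= all_libs and called_cov <= all_libs):
        raise ValueError("truth and call sets must be subsets of all_libs")
    tp = len(truly_cov & called_cov)           # CoV present and detected
    fp = len(called_cov - truly_cov)           # flagged but CoV absent
    fn = len(truly_cov - called_cov)           # CoV present but missed
    tn = len(all_libs - truly_cov - called_cov)  # correctly left unflagged
    return tp, fp, tn, fn


def sensitivity_specificity(
    all_libs: Set[str], truly_cov: Set[str], called_cov: Set[str]
) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP); NaN if undefined."""
    tp, fp, tn, fn = confusion_counts(all_libs, truly_cov, called_cov)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


if __name__ == "__main__":
    libs = {"lib_A", "lib_B", "lib_C", "lib_D", "lib_E"}  # hypothetical IDs
    truth = {"lib_A", "lib_B"}
    calls = {"lib_A", "lib_C"}
    print(sensitivity_specificity(libs, truth, calls))
```

For the toy sets above, one true positive, one false positive, one false negative and two true negatives give a sensitivity of 0.5 and a specificity of 2/3. Scoring defined and undefined species separately would simply mean calling the function with a different truth set for each group.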
We're currently building a framework for highly cost-efficient skimming and alignment of SRA data. Since February, SRA has been mirrored to AWS S3, so we can access all of the data at almost no cost using AWS services.
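One way to picture "skimming" is a cheap pre-filter that keeps only reads sharing several k-mers with known CoV sequences, so that only those reads go on to full alignment. The Python sketch below shows that idea under stated assumptions: it is a swapped-in, alignment-free k-mer screen for illustration, not the serratus pipeline, and the reference sequence, `k`, and `min_hits` threshold are hypothetical.

```python
from typing import Iterable, Iterator, List, Set

# Complement table for DNA; characters outside ACGTN pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (none if len(seq) < k)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i : i + k]


def build_index(refs: Iterable[str], k: int) -> Set[str]:
    """Collect k-mers from both strands of each reference, so reads from
    either strand can match."""
    index: Set[str] = set()
    for ref in refs:
        index.update(kmers(ref, k))
        index.update(kmers(revcomp(ref), k))
    return index


def is_candidate(read: str, index: Set[str], k: int, min_hits: int) -> bool:
    """True once the read shares at least min_hits k-mers with the index."""
    hits = 0
    for km in kmers(read, k):
        if km in index:
            hits += 1
            if hits >= min_hits:
                return True
    return False


def skim(reads: Iterable[str], index: Set[str], k: int, min_hits: int) -> List[str]:
    """Keep only reads that pass the k-mer screen (candidates for alignment)."""
    return [r for r in reads if is_candidate(r, index, k, min_hits)]


if __name__ == "__main__":
    ref = "ACGTACGGTTCAGGCTAAGCTT"  # hypothetical toy "CoV reference"
    idx = build_index([ref], k=8)
    fwd = ref[2:18]
    reads = [fwd, revcomp(fwd), "T" * 16]
    print(skim(reads, idx, k=8, min_hits=3))
```

An in-memory Python set is only practical for a toy reference; screening petabytes would need a compact index and a compiled tool, and the threshold trades sensitivity for speed in the way the previous goals list aims to measure.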
Join the #serratus channel on the biohackathon Slack, or reach out by email.