Harshil Patel edited this page Mar 27, 2020 · 45 revisions

Communication:

(@edawson says: I can host Zoom meetings and am happy to communicate by Telegram or Slack)

Projects:

Proposal: Cloud-based bioinformatics analysis (WDL + GCP) + accelerated pangenomic workflows (@edawson)

The pangenomics channel is working on generating assembly-based pangenomes of SARS-CoV-2 genomes. Since we already have a reference genome (including a GFF file of ORF annotations), I thought it might be useful to build analysis pipeline(s) that can operate in parallel with or downstream of the assembly pangenome.

Nextstrain already does things like convert the RNA/cDNA sequences to amino acids. I was thinking we could use either their tooling or our own to produce automatically generated reports of variable sites on the genome / proteome. We could also provide these annotations as GFA paths to incorporate into the pangenome, facilitate read alignment to the reference genome / pangenome, or filter reads against viral or host references using Kraken / rkmh.
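As a rough illustration of the two ideas above (translating coding sequence to amino acids and reporting variable sites), here is a minimal Python sketch. It is not Nextstrain's tooling; the function names `translate` and `variable_sites` are hypothetical, and it assumes the input sequences are already aligned to the same coordinates and have equal length.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence (DNA or RNA) into amino acids.

    Codons containing ambiguous bases or gaps become 'X'; a trailing
    partial codon is dropped.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def variable_sites(aligned):
    """Return (1-based position, sorted residues) for each variable column.

    `aligned` maps a sequence name to an aligned sequence (hypothetical
    input format). Gaps ('-') and 'N' are ignored when comparing residues.
    """
    sequences = [s.upper() for s in aligned.values()]
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("aligned sequences must have equal length")
    sites = []
    for pos, column in enumerate(zip(*sequences), start=1):
        residues = set(column) - {"-", "N"}
        if len(residues) > 1:
            sites.append((pos, sorted(residues)))
    return sites
```

The same `variable_sites` function works on aligned protein sequences, so a report could list variable sites at both the genome and proteome level.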

I'm most comfortable in WDL (which runs on Broad's Terra, on DNAnexus via dxWDL, and on Google's Pipelines API), but in practice we could use any of the workflow languages. I think this would be a good project for folks wanting to work in shell, WDL, Python, Docker, and certainly R as well.

Scope-wise, it's probably best to start with a single workflow that annotates variable sites, then try to build one that aligns reads and reports whether a new strain has novel variation at these (or other) sites. Filtering workflows could be a component of this workflow.
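The second step described above (checking whether a new strain has novel variation) could be sketched as follows. This is only an illustration under stated assumptions: the sample is assumed to be a consensus sequence already aligned to the reference coordinates, `known_sites` is a hypothetical set of 1-based positions produced by an earlier annotation step, and `novel_variation` is a made-up name.

```python
# Characters that are treated as "no call" rather than as a variant.
IGNORED = {"-", "N"}


def novel_variation(reference, sample, known_sites):
    """Compare an aligned sample against the reference.

    Returns a list of (position, ref_base, sample_base, is_novel) tuples,
    where position is 1-based and is_novel is True when the position is
    not in `known_sites`.
    """
    if len(reference) != len(sample):
        raise ValueError("sample must be aligned to the reference")
    calls = []
    pairs = zip(reference.upper(), sample.upper())
    for pos, (ref, alt) in enumerate(pairs, start=1):
        if ref in IGNORED or alt in IGNORED or ref == alt:
            continue
        calls.append((pos, ref, alt, pos not in known_sites))
    return calls


if __name__ == "__main__":
    # Toy sequences for illustration only, not real SARS-CoV-2 data.
    for call in novel_variation("ATGTTTGTT", "ATGTCTGTA", {5}):
        print(call)
```

A real workflow would more likely start from read alignments and variant calls than from a consensus string, but the reporting logic would look similar.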

Workflows:

connor-lab/ncov2019-artic-nf

https://github.com/connor-lab/ncov2019-artic-nf

A Nextflow pipeline that automates the ARTIC network nCoV-2019 novel coronavirus bioinformatics protocol. Supports barcoded and non-barcoded Nanopore data. Uses Nextflow DSL2.

(todo: citation?)

galaxyproject/SARS-CoV-2

https://github.com/galaxyproject/SARS-CoV-2

Initial analysis of COVID-19 data using Galaxy, BioConda and public research infrastructure (XSEDE, de.NBI-cloud, ARDC cloud). Supports Illumina and Nanopore data.

No more business as usual: agile and effective responses to emerging pathogen threats require open data and open analytics

usegalaxy.org, usegalaxy.eu, usegalaxy.org.au, usegalaxy.be and hyphy.org development teams, Anton Nekrutenko, Sergei L Kosakovsky Pond.

bioRxiv 2020.02.21.959973; doi: 10.1101/2020.02.21.959973

nf-core/covid19

https://github.com/nf-core/covid19

Currently discussing whether to adapt the ARTIC network workflow, the Galaxy workflows, or parts of both into a new workflow. Aspires to become part of nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.

Project communication for nf-core/covid19 is currently focused on Slack (you can join with this invite).

INSaFLU/INSaFLU

https://github.com/INSaFLU/INSaFLU

INSaFLU (“INSide the FLU”) is a free, web-based, influenza-oriented bioinformatics platform for effective and timely influenza laboratory surveillance based on whole-genome sequencing. The author states that this online platform can also be used for COVID-19.

INSaFLU: an automated open web-based bioinformatics suite “from-reads” for influenza whole-genome-sequencing-based surveillance

Borges V, Pinheiro M et al.

Genome Medicine (2018) 10:46; doi: 10.1186/s13073-018-0555-0

BU-ISCIII

Workflows for analyzing Illumina data using both amplicon and metagenomic approaches. They cover viral genome reconstruction, low-frequency variant detection, and annotation of both SNPs and INDELs. Written in Nextflow. Two different approaches are provided: de novo assembly and mapping.

https://github.com/BU-ISCIII/SARS_Cov2_consensus-nf

https://github.com/BU-ISCIII/SARS_Cov2_assembly-nf

Resources:

Participants:

  • Eric Dawson
  • Michael Heuer
  • Rutger Vos (maybe, if using nextstrain)
  • Stian Soiland-Reyes
  • Tazro Ohta
  • René Xavier (PhD candidate in Applied Genomics & Bioinformatics, limited skill set but eager to be of assistance!)
  • Sara Monzón
  • Harshil Patel