Harshil Patel edited this page Mar 30, 2020 · 45 revisions

Communication:

Project communication for nf-core-based workflows is currently focused on Slack (you can join with this invite).

(@edawson says: I can host Zoom meetings and am happy to communicate by Telegram or Slack)

Projects:

Proposal: Cloud-based bioinformatics analysis (WDL + GCP) + accelerated pangenomic workflows (@edawson)

The pangenomics channel is working on generating assembly-based pangenomes of SARS-CoV-2 genomes. Since we already have a reference genome (including a GFF file of ORF annotations), I thought it might be useful to build analysis pipeline(s) that can operate in parallel with, or downstream of, the assembly pangenome.

Nextstrain already does things like convert the RNA/cDNA sequences to amino acids. I was thinking we could use either their tooling or our own to produce automatically generated reports of variable sites on the genome and proteome. We could also provide these annotations as GFA paths to incorporate into the pangenome, facilitate read alignment to the reference genome or pangenome, or filter reads against viral or host references using Kraken or rkmh.
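As a rough illustration of the cDNA-to-amino-acid step mentioned above, here is a minimal Python sketch. It is not Nextstrain's tooling; the function names `translate` and `amino_acid_changes` are hypothetical, and it assumes the input is already an in-frame coding sequence of equal length in both strains (no indels).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate an in-frame cDNA or RNA sequence; unknown codons become 'X'."""
    seq = cdna.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def amino_acid_changes(ref_cdna: str, alt_cdna: str) -> list:
    """List substitutions such as 'D2G' between two equal-length coding sequences."""
    ref_aa, alt_aa = translate(ref_cdna), translate(alt_cdna)
    return [
        f"{r}{i}{a}"
        for i, (r, a) in enumerate(zip(ref_aa, alt_aa), start=1)
        if r != a
    ]
```

For example, `amino_acid_changes("ATGGATTAA", "ATGGGTTAA")` compares two toy sequences (not real SARS-CoV-2 data) and reports the single substitution at codon 2.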

I'm most comfortable in WDL (which runs in Broad's Terra, on DNAnexus via dxWDL, and with Google's Pipelines API), but in practice we could use any of the workflow languages. I think this would be a good project for folks wanting to work in shell, WDL, Python, Docker, and certainly R as well.

Scope-wise, it's probably best to start with a single workflow that annotates variable sites, then try to build one that aligns reads and reports whether a new strain has novel variation at these (or other) sites. Filtering workflows could be a component of this workflow.
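The first step, annotating variable sites against the reference ORFs, could be sketched in Python roughly as follows. This is only an outline under stated assumptions: the sample sequences are already aligned to the reference and have the same length, the GFF3 file uses `CDS` features with `Name` or `ID` attributes, and the helper names (`parse_gff_features`, `variable_sites`, `annotate_sites`) are hypothetical. A real workflow would more likely rely on an established aligner and variant caller.

```python
def parse_gff_features(gff_text, feature_type="CDS"):
    """Return (start, end, name) tuples (1-based, inclusive) from GFF3 text."""
    features = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name") or attrs.get("ID") or "unnamed"
        features.append((int(cols[3]), int(cols[4]), name))
    return features


def variable_sites(reference, aligned_samples):
    """Map 1-based position -> {sample: base} where a sample differs from the reference.

    Assumes each sample is pre-aligned to the reference; N and gap characters
    are skipped rather than reported as variation.
    """
    sites = {}
    for sample, seq in aligned_samples.items():
        if len(seq) != len(reference):
            raise ValueError(f"{sample} is not aligned to the reference length")
        pairs = zip(reference.upper(), seq.upper())
        for pos, (ref_base, base) in enumerate(pairs, start=1):
            if base != ref_base and base not in "N-":
                sites.setdefault(pos, {})[sample] = base
    return sites


def annotate_sites(sites, features):
    """Attach the names of overlapping features to each variable position."""
    return {
        pos: [name for start, end, name in features if start <= pos <= end]
        for pos in sites
    }
```

Checking whether a new strain carries novel variation then amounts to running `variable_sites` on the new sample and comparing its positions against those already reported for known strains.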

Workflows:

connor-lab/ncov2019-artic-nf

https://github.com/connor-lab/ncov2019-artic-nf

A Nextflow pipeline that automates the ARTIC network nCoV-2019 novel coronavirus bioinformatics protocol. Supports barcoded and non-barcoded Nanopore data. Uses Nextflow DSL2.

galaxyproject/SARS-CoV-2

https://github.com/galaxyproject/SARS-CoV-2

Initial analysis of COVID-19 data using Galaxy, BioConda and public research infrastructure (XSEDE, de.NBI-cloud, ARDC cloud). Supports Illumina and Nanopore data.

No more business as usual: agile and effective responses to emerging pathogen threats require open data and open analytics

usegalaxy.org, usegalaxy.eu, usegalaxy.org.au, usegalaxy.be and hyphy.org development teams, Anton Nekrutenko, Sergei L Kosakovsky Pond.

bioRxiv 2020.02.21.959973; doi: 10.1101/2020.02.21.959973

INSaFLU/INSaFLU

https://github.com/INSaFLU/INSaFLU

INSaFLU (“INSide the FLU”) is a free, web-based, influenza-oriented bioinformatics platform for effective and timely laboratory surveillance of influenza based on whole-genome sequencing. The authors state that this online platform can also be used for COVID-19.

INSaFLU: an automated open web-based bioinformatics suite “from-reads” for influenza whole-genome-sequencing-based surveillance

Borges V, Pinheiro M et al.

Genome Medicine (2018) 10:46; doi: 10.1186/s13073-018-0555-0

nf-core/viralrecon

https://github.com/nf-core/viralrecon

The following pipelines from BU-ISCIII will be ported to nf-core over the coming days:
https://github.com/BU-ISCIII/SARS_Cov2_consensus-nf
https://github.com/BU-ISCIII/SARS_Cov2_assembly-nf

A workflow for analyzing Illumina data generated with either amplicon or metagenomic approaches. It reconstructs viral genomes, calls low-frequency variants, and annotates both SNPs and indels. It is written in Nextflow and offers two approaches: de novo assembly and read mapping.

nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow.

We are also hoping to bridge these workflows into graph assembly/pangenome workflows, to support the work of other biohackathon working groups.


Resources:

Participants:
