forked from nf-core/sarek
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request nf-core#580 from szilvajuhos/master
EACR25 abstract
- Loading branch information
Showing
1 changed file
with
59 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# 25th Biennial Congress Of The European Association For Cancer Research 2018 | ||
|
||
## Somatic and germline calls from tumour/normal whole genome data: bioinformatics workflow for reproducible research | ||
|
||
|
||
Szilveszter Juhos 1, | ||
Maxime Garcia 2, | ||
Teresita Díaz de Ståhl 3, | ||
Johanna Sandgren 3, | ||
Markus Mayrhofer 4, | ||
Max Käller 5, | ||
Björn Nystedt 6, | ||
Monica Nistér 3 | ||
|
||
1. Karolinska Institutet, Department of Oncology Pathology, Stockholm, Sweden. | ||
2. Karolinska Institutet- Science for Life Laboratory, Department of Oncology-Pathology, Stockholm, Sweden. | ||
3. Karolinska Institutet, Department of Oncology-Pathology, Stockholm, Sweden. | ||
4. Science for Life Laboratory, Uppsala University, Uppsala, Sweden. | ||
5. Science for Life Laboratory, Royal Institute of Technology- School of Biotechnology- Division of Gene Technology, Stockholm, Sweden. | ||
6. Science for Life Laboratory, Department of Cell and Molecular Biology- National Bioinformatics Infrastructure Sweden- Uppsala University, Uppsala, Sweden. | ||
|
||
### Introduction | ||
|
||
Whole-genome sequencing of cancer tumours is more a research tool nowadays, but going to be used in clinical settings in | ||
the near future to facilitate precision medicine. While large institutions have built up in-house bioinformatics | ||
solutions for their own data analysis, robust and portable workflows combining multiple software have been lacking, | ||
making it difficult for individual research groups to utilise the potential of this research field. Here we present | ||
Sarek, a robust, easy-to-install workflow for identification of both somatic and germline mutations from paired | ||
tumour/normal/relapse samples. | ||
|
||
### Material and Methods | ||
|
||
Sarek is open source and implemented in Nextflow; a domain specific programming language to enable portability and | ||
reproducibility. With the help of docker containers the versions of the underlying software can be maintained. | ||
Furthermore, with Singularity it is possible to run the workflow on protected clusters with no internet connection. | ||
|
||
The workflow starts from raw FASTQ files, and follows the GATK best practices to prepare the recalibrated files with | ||
joint realignment around indels for both the tumour and the normal data. Reads are alignment to the GRCh38 human | ||
reference in an ALT-aware settings using BWA, however, it is possible to assign other references. HaplotypeCaller and | ||
Strelka2 germline calls are collected for both the tumour and the normal sample, and Manta provides germline structural | ||
variants. The somatic variations are calculated by running MuTect2, Strelka and FreeBayes (and MuTect1 optionally). | ||
Somatic structural variants are delivered by Manta, and ASCAT estimates ploidy, tumour heterogeneity and CNVs. The | ||
resulting variant call files are annotated by SnpEff and Ensembl-VEP. The annotated calls are further filtered and | ||
prioritised by our custom methods. During running the workflow quality control metrics are also calculated and | ||
aggregated by MultiQC. | ||
|
||
### Results and Discussions | ||
|
||
Sarek was validated on a real dataset with known golden set of somatic mutations. In a real settings, whole-genome | ||
sequencing (WGS, 45-60x coverage) of patient-matched tumor and blood derived-DNA is being performed on a set of 80 | ||
pediatric brain tumor samples of the Swedish Childhood Tumor Biobank. The workflow helps to produce, filter, prioritise | ||
and characterise both germline and somatic variations. | ||
|
||
### Conclusion | ||
|
||
Sarek is a portable bioinformatics pipeline for WGS normal/tumour matched samples, aiding precision medicine by improved | ||
subtyping and to gain novel functional insights in a reproducible framework. | ||
|
||
|