Skip to content

Latest commit

 

History

History

H37Rv_PRG

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Mycobacterium tuberculosis population reference graph construction

Pipeline overview

rulegraph

Introduction

The code in this section of the repository is for construction of the "basic" H37Rv PRG.

We refer to this as "basic" because we will be using H37Rv as the skeleton, and it is known that it misses some genes found in other lineages.

The outline for the pipeline will be:

  • Split H37Rv into chunks based on the annotation.
  • Construct merged VCFs for each lineage for all VCFs in the CRyPTIC dataset (17/03/2020)
  • For a range of allele frequencies (AFs), apply variants from each VCF to their respective H37Rv chunks. By apply, we mean, make a copy of the reference sequence for that chunk, apply the variant to that copy and add it to the chunk to make a multi-sequence FASTA file.
  • Run make_prg on all chunks.
  • Combine resulting PRGs into a single PRG file for use with pandora.