
aweich/VIS

Vector Insertion Site – Detection Pipeline

Workflow for the detection and annotation of vector insertion sites (VIS) from long-read sequencing data
Documentation · Publication (not available yet)

Introduction

Welcome to the Vector Insertion Site (VIS) Detection Pipeline documentation. This Snakemake-based workflow detects and annotates insertion sites in long-read DNA sequencing data. It supports custom functions and follows the Snakemake styling guide.

The diagram below outlines the detection workflow, with key analysis steps on the left and a directed acyclic graph (DAG) of workflow components on the right. Created in BioRender. Weich, A. (2025)

For a detailed explanation, see our paper (not available yet). In brief, a (partially) known vector sequence is fragmented into k-mers, which are then searched for in the long-read sequencing data. Matching reads are modified and mapped to a reference genome. The exact insertion site is determined by a CIGAR-based reverse calculation and can then be functionally annotated with genomic resources (e.g., genes, transcription factors). Throughout the workflow, multiple quality control steps (e.g., base quality, mapping quality) ensure the integrity of input data and results.
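As an illustration of the general idea behind a CIGAR-based reverse calculation, here is a minimal sketch under our own assumptions. It is not the pipeline's actual code, and the function names `parse_cigar` and `query_to_reference` are hypothetical. The sketch walks a SAM CIGAR string from the alignment start and converts a 0-based position in the mapped read (for example, an offset at which vector sequence was detected, assumed here to be known already) into a 0-based reference coordinate. Which CIGAR operations consume read or reference bases follows the SAM specification.

```python
from __future__ import annotations

import re

# Per the SAM specification: which CIGAR operations consume read (query)
# bases and which consume reference bases. H and P consume neither.
_CONSUMES_QUERY = set("MIS=X")
_CONSUMES_REF = set("MDN=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '5S10M2D8M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in _CIGAR_RE.findall(cigar)]
    # Reject strings containing anything the regex did not account for
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def query_to_reference(ref_start: int, cigar: str, query_pos: int) -> int | None:
    """Map a 0-based read position to a 0-based reference coordinate.

    Returns None if the position lies in soft-clipped or inserted bases,
    which have no reference coordinate of their own.
    """
    q = 0          # current position in the read
    r = ref_start  # current position on the reference
    for length, op in parse_cigar(cigar):
        in_query = op in _CONSUMES_QUERY
        in_ref = op in _CONSUMES_REF
        if in_query and q <= query_pos < q + length:
            # The requested read base falls inside this operation
            return r + (query_pos - q) if in_ref else None
        if in_query:
            q += length
        if in_ref:
            r += length
    raise ValueError("query_pos lies beyond the aligned read")


if __name__ == "__main__":
    # Read offset 16 lies in the second match block, after a 2 bp deletion
    print(query_to_reference(100, "5S10M2D8M", 16))  # 113
    # Offsets inside soft-clipped bases have no reference coordinate
    print(query_to_reference(100, "5S10M2D8M", 2))   # None
```

Walking the CIGAR string rather than assuming a gap-free alignment matters for long reads, where insertions and deletions relative to the reference are frequent and would otherwise shift the reported coordinate.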

Illustrated Core Functionality

[Figure: Workflow overview]


Key Features and Applications

  • Systematic Detection of Inserted Vector Sequences: The pipeline fragments the target sequence into bins, enabling precise identification of consecutive and isolated vector fragments, including their orientation in long DNA reads.

  • Accurate Localization of Insertion Sites: By mapping the detected sequences to a reference genome, the pipeline provides exact genomic coordinates of the insertion sites, ensuring high accuracy.

  • Comprehensive Annotation Capabilities: Leveraging customizable genome annotations, the pipeline determines the proximity of insertion sites to transcription factors, genes, or other biologically relevant elements, allowing users to interpret genomic alterations caused by the VIS.

  • Customizable and Extendable Framework: Designed with flexibility in mind, the pipeline allows users to add custom analyses, annotations, or visualizations, catering to specific research needs and enhancing functionality.

  • Applicability to Biomedical Research: Particularly useful for CAR T cell therapy, the pipeline can, for instance, identify clonal insertions that could lead to therapeutic complications. It can also be used to analyze lentiviral and retroviral insertion patterns, detect guided insertions, and support other gene delivery studies.
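The bin-based search behind the first feature can be illustrated with a minimal sketch. It assumes exact string matching of non-overlapping vector bins and their reverse complements. The pipeline's actual matching strategy, parameters, and function names are not shown here and may differ, and every name below is hypothetical. Exact matching is a simplification: long reads contain sequencing errors, so a practical implementation would likely need a more error-tolerant search.

```python
from __future__ import annotations

from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fragment_vector(vector: str, bin_size: int) -> list[str]:
    """Cut the vector into consecutive, non-overlapping bins.
    A trailing remainder shorter than bin_size is dropped in this sketch."""
    return [vector[i:i + bin_size]
            for i in range(0, len(vector) - bin_size + 1, bin_size)]


@dataclass(frozen=True)
class Hit:
    read_pos: int   # 0-based start of the match in the read
    bin_index: int  # which vector bin matched
    strand: str     # '+' forward bin, '-' reverse complement of the bin


def find_vector_bins(read: str, vector: str, bin_size: int) -> list[Hit]:
    """Report exact matches of every vector bin, in both orientations."""
    index: dict[str, list[tuple[int, str]]] = {}
    for i, b in enumerate(fragment_vector(vector, bin_size)):
        index.setdefault(b, []).append((i, "+"))
        index.setdefault(reverse_complement(b), []).append((i, "-"))
    hits = []
    # Slide a bin-sized window along the read and look each window up
    for pos in range(len(read) - bin_size + 1):
        for bin_index, strand in index.get(read[pos:pos + bin_size], []):
            hits.append(Hit(pos, bin_index, strand))
    return hits


def consecutive_runs(hits: list[Hit], bin_size: int) -> list[list[Hit]]:
    """Group hits into runs of adjacent bins. On '+' the bin index rises by
    one per bin_size step along the read, on '-' it falls by one.
    Single-hit runs correspond to isolated vector fragments."""
    runs: list[list[Hit]] = []
    for hit in sorted(hits, key=lambda h: (h.strand, h.read_pos)):
        if runs:
            prev = runs[-1][-1]
            step = 1 if hit.strand == "+" else -1
            if (hit.strand == prev.strand
                    and hit.read_pos == prev.read_pos + bin_size
                    and hit.bin_index == prev.bin_index + step):
                runs[-1].append(hit)
                continue
        runs.append([hit])
    return runs


if __name__ == "__main__":
    vector = "AACACACCACAACCAC"  # toy vector made of four 4 bp bins
    read = "TTTTAACACACCTTTT"    # bins 0 and 1 embedded in forward orientation
    for run in consecutive_runs(find_vector_bins(read, vector, 4), 4):
        print(run)
```

Indexing both orientations of each bin lets a single pass over the read report the strand of every hit. A falling bin index along the read on the '-' strand then marks a reverse-oriented fragment, and gaps between runs separate isolated fragments from consecutive ones.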

General Usage

Everything from installation to customization of this pipeline is covered in the online documentation. If you are new to Snakemake, check out the introduction to Snakemake first.

It is recommended to familiarize yourself with the workflow and its outputs before running it with your own data. A detailed practical example of the workflow and its output files with simulated data can be found in the tutorial.

Quick Start

Follow these steps to get the pipeline up and running:

Clone the Repository:

git clone https://github.com/aweich/VIS
cd VIS

Create and Activate the Environment:

mamba env create --name VIS_minimal -f workflow/envs/VIS_minimal_env.yml
conda activate VIS_minimal

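Run the Pipeline (the -n flag performs a dry run that only lists the planned jobs; omit it and specify --cores to actually execute the workflow):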

snakemake --use-conda -n

For more detailed instructions, refer to the online documentation and the corresponding publication (not available yet).

Citation and Contribution

If you use this pipeline or the search strategy it implements, please cite us and leave a star.

We encourage contributions to enhance and refine this codebase, whether through providing feedback, improving functionality, or sharing domain-specific expertise. If you have suggestions, encounter issues, or require assistance, please feel free to reach out for support or collaboration.

License

This project is available under the Apache License 2.0.