Vector Insertion Site – Detection Pipeline

Workflow for the detection and annotation of vector insertion sites (VIS) from long-read sequencing data
Documentation · Publication (not available yet)

Introduction

Welcome to the Vector Insertion Site (VIS) Detection Pipeline documentation. This Snakemake-based workflow detects and annotates insertion sites in long-read DNA sequencing data. It supports custom functions and follows the Snakemake styling guide.

The diagram below outlines the detection workflow, with key analysis steps on the left and a directed acyclic graph (DAG) of workflow components on the right. Created in BioRender. Weich, A. (2025)

For a detailed explanation, see our paper (not available yet). In brief, a (partially) known vector sequence is fragmented into k-mers, which are then searched for in the long-read sequencing data. Matching reads are modified and mapped to a reference genome. The exact insertion site is determined using a CIGAR-based reverse calculation, and each detected site can be functionally annotated with genomic resources (e.g., genes, transcription factors). Throughout the workflow, multiple quality control steps (e.g., base quality, mapping quality) ensure integrity of input data and results.
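The fragment-and-search idea can be illustrated with a minimal sketch. It is not the pipeline's implementation: it assumes non-overlapping fragments and exact string matching, whereas the workflow may rely on an alignment tool that tolerates sequencing errors. The function names (fragment_into_kmers, find_vector_hits) are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Translation table used to complement DNA bases (assumption: plain ACGT input).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fragment_into_kmers(vector: str, k: int) -> List[str]:
    """Split the vector into consecutive, non-overlapping k-mers.

    A trailing remainder shorter than k is dropped.
    """
    return [vector[i:i + k] for i in range(0, len(vector) - k + 1, k)]


def find_vector_hits(read: str, kmers: List[str]) -> List[Tuple[int, int, str]]:
    """Return (kmer_index, read_position, strand) for every exact match.

    Each k-mer is searched on the forward strand ("+") and as its
    reverse complement ("-"). Hits are sorted by position in the read.
    """
    hits = []
    for idx, kmer in enumerate(kmers):
        for strand, query in (("+", kmer), ("-", reverse_complement(kmer))):
            pos = read.find(query)
            while pos != -1:
                hits.append((idx, pos, strand))
                pos = read.find(query, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Toy example: a 15 bp "vector" and a read carrying its first two fragments.
vector = "GATTACAGGCTTCAA"
kmers = fragment_into_kmers(vector, k=5)
read = "CCCCC" + "GATTACAGGC" + "TTTTT"
hits = find_vector_hits(read, kmers)
```

In this toy case, consecutive fragment indices at adjacent read positions on the same strand indicate a contiguous piece of vector, while the strand label records its orientation within the read.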

Illustrated Core Functionality

Key Features and Applications

Systematic Detection of Inserted Vector Sequences: The pipeline fragments the target sequence into bins, enabling precise identification of consecutive and isolated vector fragments, including their orientation in long DNA reads.
Accurate Localization of Insertion Sites: By mapping the detected sequences to a reference genome, the pipeline provides exact genomic coordinates of the insertion sites, ensuring high accuracy.
Comprehensive Annotation Capabilities: Leveraging customizable genome annotations, the pipeline determines the proximity of insertion sites to transcription factors, genes, or other biologically relevant elements, allowing for interpretations about genomic alterations caused by the VIS.
Customizable and Extendable Framework: Designed with flexibility in mind, the pipeline allows users to add custom analyses, annotations, or visualizations, catering to specific research needs and enhancing functionality.
Applicability to Biomedical Research: Particularly useful for CAR T cell therapy, the pipeline can, for instance, identify clonal insertions that could lead to therapeutic complications. It can also be used to analyze lentiviral and retroviral insertion patterns, detect guided insertions, and support other gene delivery studies.
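To make the localization feature above more concrete, the following sketch shows one common way to translate a position within a mapped read into a reference coordinate by walking the CIGAR string. The source does not spell out its CIGAR-based reverse calculation, so this is an assumption about the general technique rather than the pipeline's code. The function name read_to_reference is hypothetical, and coordinates are assumed to be 0-based.

```python
import re
from typing import Optional

# Standard SAM CIGAR operations and which sequence each one consumes.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
_CONSUMES_QUERY = set("MIS=X")
_CONSUMES_REF = set("MDN=X")


def read_to_reference(ref_start: int, cigar: str, read_pos: int) -> Optional[int]:
    """Map a 0-based read position to a 0-based reference position.

    ref_start is the reference position of the first aligned base.
    Returns None if the read position lies in an insertion, in a
    soft-clipped region, or beyond the end of the alignment.
    """
    ref_pos = ref_start
    query_pos = 0
    for length_str, op in _CIGAR_RE.findall(cigar):
        length = int(length_str)
        in_query = op in _CONSUMES_QUERY
        in_ref = op in _CONSUMES_REF
        if in_query and query_pos <= read_pos < query_pos + length:
            if in_ref:
                # Aligned base: offset within this operation carries over.
                return ref_pos + (read_pos - query_pos)
            # Inserted or soft-clipped base: no reference coordinate.
            return None
        if in_query:
            query_pos += length
        if in_ref:
            ref_pos += length
    return None


# Toy alignment: 5 soft-clipped bases, then matches, a deletion, and an insertion.
example_cigar = "5S10M2D5M3I4M"
site = read_to_reference(ref_start=100, cigar=example_cigar, read_pos=15)
```

Here read position 15 is the first base after the 2 bp deletion, so it maps to reference position 112. Positions inside the soft clip or the insertion have no reference coordinate and yield None.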

General Usage

Everything from installation to customization of this pipeline is covered in the online documentation. If you are new to Snakemake, check out the introduction to Snakemake first.

It is recommended to familiarize yourself with the workflow and its outputs before running it with your own data. A detailed practical example of the workflow and its output files with simulated data can be found in the tutorial.

Quick Start

Follow these steps to get the pipeline up and running:

Clone the Repository:

git clone https://github.com/aweich/VIS
cd VIS

Create and Activate the Environment:

mamba env create --name VIS_minimal -f workflow/envs/VIS_minimal_env.yml
conda activate VIS_minimal

Run the Pipeline (the -n flag performs a dry run that lists the planned jobs without executing them):

snakemake --use-conda -n

For more detailed instructions, refer to the online documentation and the corresponding publication (not available yet).

Citation and Contribution

If you are using this pipeline or the search strategy implemented in this pipeline, please cite us and leave a star.

We encourage contributions to enhance and refine this codebase, whether through providing feedback, improving functionality, or sharing domain-specific expertise. If you have suggestions, encounter issues, or require assistance, please feel free to reach out for support or collaboration.

License

This project is available under the Apache License 2.0.
