Skip to content

Latest commit

 

History

History
149 lines (98 loc) · 3.74 KB

README.md

File metadata and controls

149 lines (98 loc) · 3.74 KB

TagGD: Barcode Demultiplexing Utilities for Spatial Transcriptomics Data

License: MIT Python 3.10 Python 3.11 Python 3.12 PyPI version Build Status

TagGD is a Python-based barcode demultiplexer for Spatial Transcriptomics data. It provides a generalized, optimized, and up-to-date version of the original C++ demultiplexer "findIndexes," available here.

For the original peer-reviewed reference to the program, see PLOS ONE.

Overview

The primary goal of TagGD is to extract cDNA barcodes from input files (FASTQ, FASTA, SAM, or BAM) and match them against a list of reference barcodes using a k-mer-based approach. Matched reads are output with barcode and spatial information added to each record.

TagGD is versatile and can be used to demultiplex any type of index if a reference file is provided. Users can even create fake spatial coordinates (X, Y) for general-purpose demultiplexing tasks.

Key Features

  • Supports FASTQ, FASTA, SAM, and BAM formats.
  • Handles multiple indexes per read.
  • K-mer-based matching for efficient and accurate demultiplexing.
  • Outputs matched, unmatched, and ambiguous reads with annotated barcodes.
  • Multiple options and distance metrice.
  • Fast and memmory efficient.

Requirements

  • python 3.10 or higher
  • cython
  • pysam
  • numpy
  • dnaio
  • pytest (testing)

Installation

From Source

If you are using a virtual environment like Anaconda:

git clone https://github.com/your-repo/taggd.git
cd taggd
python setup.py build
python setup.py install

or using pip

git clone https://github.com/your-repo/taggd.git
cd taggd
pip install .

Using pip

Install directly from PyPI:

pip install taggd

Building the Project

If you are contributing, testing or making changes to the code, you may need to build or rebuild the Cython extensions:

python setup.py build_ext --inplace

Testing the Project

pytest

Usage

Basic Command

To see all available options, run:

taggd_demultiplex -h

Input Reference File Format

The reference file should contain barcodes and optional spatial coordinates, formatted as follows:

BARCODE X Y

Example:

ACGTACGT 0 0
TGCATGCA 1 1

Example Commands

Example

taggd_demultiplex   --k 6   --max-edit-distance 3   --overhang 2   --subprocesses 4   --seed randomseed   <barcodes.tsv>   <input_file>   <output_prefix>

Output

TagGD generates the following output files:

  • <output_prefix>_matched.*: Reads that matched reference barcodes.
  • <output_prefix>_unmatched.*: Reads that did not match any reference barcodes.
  • <output_prefix>_ambiguous.*: Reads that matched multiple barcodes.
  • <output_prefix>_results.tsv: Summary statistics of the run.

Options

Run taggd_demultiplex -h to view all available options and their descriptions.


Contact

For questions, bug reports, or contributions, please contact: