SARUMAN: Using GPU programming for short read mapping

jakobilab/saruman

SARUMAN

Note: this repository serves archival purposes

SARUMAN was developed in 2009 with CUDA version 4. The source code is now on GitHub. However, although I was able to compile it with recent CUDA versions (>=10), I cannot guarantee that the software a) works or b) still produces correct results.

Introduction

Using GPU programming for short read mapping

Since the introduction of next-generation sequencing technologies like Solexa, 454, and SOLiD, the amount of generated data has risen with each new technology upgrade. Because the application scenarios, especially for the short read techniques, include the re-sequencing of known genomes and the sequencing of closely related strains, new software tools are needed for the fast mapping of sequencing reads against a reference genome.

Currently, there are several tools available, but most of them are limited in either speed or accuracy. Limitations in accuracy lead to undetected mappings, which can become important in post-processing steps like SNP calling. Because of those limitations, our goal was to develop an exact and complete mapping algorithm with a running time comparable to that of available heuristic implementations.

The result is SARUMAN (Semiglobal Alignment of Short Reads using CUDA and Needleman-Wunsch). SARUMAN uses a q-gram index based filter algorithm followed by a modified Needleman-Wunsch alignment. To speed up the normally time-consuming alignment step, all alignments are processed on an NVIDIA graphics card to exploit the massively parallel architecture of modern graphics processing units (GPUs). Based on this technique, and depending on the input read length, SARUMAN is able to process hundreds of thousands of alignments in just a few seconds. As a result of this alignment strategy, SARUMAN not only detects mismatches but also detects and correctly handles all insertions and deletions. The mapping algorithm is exact and complete: it identifies all possible matching positions for a given error threshold and always returns the optimal local alignment.
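The filter-then-verify idea described above can be sketched on the CPU in a few lines of Python. This is only an illustration under stated assumptions, not SARUMAN's actual filter or its CUDA kernel: the function names (`build_qgram_index`, `candidate_starts`, `semiglobal_distance`, `map_read`) are hypothetical, the filter uses a plain pigeonhole argument over non-overlapping q-grams, and the verification uses unit edit costs with free end gaps on the reference side.

```python
from collections import defaultdict


def build_qgram_index(reference, q):
    """Map every q-gram of the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def candidate_starts(read, index, q, max_errors):
    """Filter step (assumed pigeonhole filter, not SARUMAN's exact scheme).

    The read is cut into max_errors + 1 non-overlapping q-grams. At most
    max_errors of them can contain an error, so every valid mapping has at
    least one q-gram that matches the reference exactly.
    """
    if len(read) < (max_errors + 1) * q:
        raise ValueError("read too short for this q and error threshold")
    starts = set()
    for seg in range(max_errors + 1):
        offset = seg * q
        for pos in index.get(read[offset:offset + q], ()):
            starts.add(pos - offset)  # implied start of the read
    return starts


def semiglobal_distance(read, window):
    """Minimum edit distance of the whole read against any substring of window."""
    prev = [0] * (len(window) + 1)  # skipping a window prefix is free
    for i, r in enumerate(read, start=1):
        curr = [i]  # a read prefix aligned against nothing costs i
        for j, w in enumerate(window, start=1):
            curr.append(min(prev[j - 1] + (r != w),  # match or mismatch
                            prev[j] + 1,             # extra base in the read
                            curr[j - 1] + 1))        # base missing from the read
        prev = curr
    return min(prev)  # skipping a window suffix is free


def map_read(read, reference, q, max_errors):
    """Return {candidate start: edit distance} for all verified candidates."""
    index = build_qgram_index(reference, q)
    hits = {}
    for start in sorted(candidate_starts(read, index, q, max_errors)):
        # Widen the window by max_errors on both sides to allow for indels.
        lo = max(0, start - max_errors)
        hi = min(len(reference), start + len(read) + max_errors)
        dist = semiglobal_distance(read, reference[lo:hi])
        if dist <= max_errors:
            hits[start] = dist
    return hits


reference = "ACGTACGTTTGACCAGTAGGCATCGATCGGATTACA"
read = "GACCATTAGGCA"  # reference[10:22] with one substituted base
print(map_read(read, reference, q=4, max_errors=1))  # prints {10: 1}
```

In this sketch the verification of each candidate is independent of all others, which is the property that makes the alignment step a natural fit for running many alignments in parallel on a GPU.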

Requirements

Before downloading and testing SARUMAN, please make sure that your system meets the following requirements:

  • OS: 64-bit Linux system (SARUMAN was tested on Ubuntu and Gentoo)
  • Hardware: 4-8 GB RAM, dual-core CPU, GPU with at least 512 MB of memory for reasonable performance
  • CUDA-compatible graphics card
  • CUDA-capable driver for your card
  • CUDA runtime environment for proper functioning of the CUDA module
  • Two additional libraries, both available as installation packages for almost all Linux distributions:
    argtable2
    uthash.h
  • A working BioPerl installation for converting the SARUMAN output into SAM format
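
As a rough aid, the checklist above can be approximated with a short Python script. This is a sketch under assumptions, not part of SARUMAN: the function name `check_requirements` is hypothetical, `/usr/include/uthash.h` is only a commonly used header location, and library lookup through `ctypes.util.find_library` may miss libraries installed in non-standard paths. A `False` result therefore means "not found by this script", not necessarily "not installed".

```python
import ctypes.util
import os
import platform
import shutil
import subprocess


def check_requirements():
    """Return a dict mapping each requirement to True if it was found."""
    perl = shutil.which("perl")
    bioperl = False
    if perl:
        # Try to load a BioPerl module; exit code 0 means it is importable.
        result = subprocess.run([perl, "-MBio::SeqIO", "-e", "1"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        bioperl = result.returncode == 0
    return {
        "64-bit Linux": (platform.system() == "Linux"
                         and platform.machine() in ("x86_64", "amd64")),
        # nvidia-smi ships with the NVIDIA driver.
        "NVIDIA driver": shutil.which("nvidia-smi") is not None,
        "CUDA runtime": ctypes.util.find_library("cudart") is not None,
        "argtable2": ctypes.util.find_library("argtable2") is not None,
        # Assumed default location; uthash is a header-only library.
        "uthash.h": os.path.exists("/usr/include/uthash.h"),
        "BioPerl": bioperl,
    }


for name, found in check_requirements().items():
    print(f"{name}: {'found' if found else 'not found'}")
```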

Downloads

The download package (compiled in 2011 with CUDA 4.1) contains the SARUMAN Linux binaries tested on Ubuntu and Gentoo Linux, a Perl script for converting the original SARUMAN output into SAM format, and short documentation with installation instructions and command-line options.

FTP server with sample data: ftp://ftp.cebitec.uni-bielefeld.de/pub/software/saruman/

Publication

If you use SARUMAN, please cite the following publication:

Exact and complete short read alignment to microbial genomes using GPU programming
Jochen Blom, Tobias Jakobi, Daniel Doppmeier, Sebastian Jaenicke, Jorn Kalinowski, Jens Stoye, and Alexander Goesmann
Bioinformatics, published 30 March 2011, doi:10.1093/bioinformatics/btr151

Licence

Commercial users: please contact tobias@jako.bi.
