A small collection of tools for working with sequence files in a Python-based environment.
seq_gen
: Generate DNA sequences - either pure sequences, or fasta/fastq formatsseq_sub
: Subsample reads filestools
(misc.)- k-mers: tools for generating k-mers from a string, as well as counting non-unique k-mers
- Visualisation: plot histograms for read length and PHRED-score
- The Big 3 (+1)
pandas
numpy
matplotlib
seaborn
Biopython
- tools for loading, manipulating and exporting read datatqdm
- used for pretty progress bars (currently only in rare cases)
For convenience, a conda
environment file has been included.
$ conda env create -f environment.yml