Pynteny is a Python tool to search for synteny blocks in (prokaryotic) sequence data using HMMER with profile HMMs of the ORFs of interest. By leveraging genomic context information, Pynteny can reduce the uncertainty in the functional annotation of unlabelled sequence data that arises from paralogs. Pynteny can be accessed (i) through the command line or (ii) as a Python module.
Get more info in the documentation pages!
Check out the Pynteny paper in the Journal of Open Source Software!
Install with conda:
- Pynteny requires Python 3.10. The easiest way to handle dependencies is by creating a dedicated conda environment:
conda create -n pynteny -c bioconda -c conda-forge python=3.10 pynteny
conda activate pynteny
- Check that the installation worked:
(pynteny) pynteny --help
Pynteny is designed to run on Linux machines. However, it can be installed within the Windows Subsystem for Linux via conda.
Pynteny doesn't currently support the ARM64 architecture of Apple silicon processors (e.g. MacBook M1 and M2). If that is your case, you can install Pynteny using the workaround below (based on this post):
CONDA_SUBDIR=osx-64 conda create -n pynteny_x86 python=3.10
conda activate pynteny_x86
conda config --env --set subdir osx-64
conda install -c bioconda pynteny
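If you are not sure whether the workaround applies to you, the short sketch below checks the current platform with the standard library platform module. It assumes the case to detect is a native Apple silicon interpreter: there platform.machine() reports "arm64", but an interpreter running under Rosetta 2 reports "x86_64" instead. The helper name needs_x86_workaround is hypothetical and not part of Pynteny.

```python
import platform


def needs_x86_workaround(system: str, machine: str) -> bool:
    """Return True for macOS running natively on ARM64 (Apple silicon).

    Hypothetical helper for this README, not part of Pynteny.
    """
    # platform.system() reports "Darwin" on macOS; platform.machine()
    # reports "arm64" for a native Apple silicon interpreter.
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    if needs_x86_workaround(platform.system(), platform.machine()):
        print("Apple silicon detected: use the osx-64 workaround above.")
    else:
        print("No workaround needed for this platform.")
```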
Consider the following toy example of a syntenic block:
Here, we are interested in four genes that colocate according to the pattern above: genes A-C occupy consecutive positions on the positive strand, followed by three (untargeted) genes and then gene D, which is located on the negative strand.
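To make the notation concrete, the sketch below splits the structure string used in the search example, ">gene_A 0 >gene_B 0 >gene_C 3 <gene_D", into ordered targets and gaps. Following the description above, ">" marks a gene on the positive strand, "<" one on the negative strand, and each number gives the intervening (untargeted) genes between consecutive targets. This is an illustrative parser written for this README, not Pynteny's own implementation; how Pynteny itself applies the numbers during a search (for example, whether they act as an upper bound) is described in the documentation. The names Target and parse_synteny_structure are hypothetical.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """One targeted gene in a synteny structure (hypothetical helper type)."""

    name: str    # gene or HMM label, e.g. "gene_A"
    strand: str  # "pos" for a leading ">", "neg" for a leading "<"


def parse_synteny_structure(struc: str) -> tuple[list[Target], list[int]]:
    """Split a structure string into ordered targets and the gaps between them.

    Illustrative sketch only; this is not Pynteny's own parser.
    """
    tokens = struc.split()
    # Expected layout: target gap target gap ... target (odd token count).
    if len(tokens) % 2 == 0:
        raise ValueError("structure must start and end with a target")
    targets: list[Target] = []
    gaps: list[int] = []
    for i, token in enumerate(tokens):
        if i % 2 == 0:
            # Even positions are strand-prefixed gene labels.
            if token[0] not in "<>" or len(token) < 2:
                raise ValueError(f"bad target token: {token!r}")
            strand = "pos" if token[0] == ">" else "neg"
            targets.append(Target(token[1:], strand))
        else:
            # Odd positions are gene counts between consecutive targets.
            gap = int(token)
            if gap < 0:
                raise ValueError(f"gap must be non-negative: {token!r}")
            gaps.append(gap)
    return targets, gaps


targets, gaps = parse_synteny_structure(">gene_A 0 >gene_B 0 >gene_C 3 <gene_D")
print([(t.name, t.strand) for t in targets])
# [('gene_A', 'pos'), ('gene_B', 'pos'), ('gene_C', 'pos'), ('gene_D', 'neg')]
print(gaps)  # [0, 0, 3]
```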
Pynteny can be run either as a command line tool or as a Python module. To run Pynteny from the command line, execute:
conda activate pynteny
pynteny <subcommand> <options>
There are a number of available subcommands, which can be explored in the documentation pages.
For instance, to first download the PGAP database, which contains a collection of profile HMMs and their metadata:
pynteny download --outdir data/hmms --unpack
Next, to build a labelled peptide database from DNA assembly data:
pynteny build \
--data assembly.fa \
--outfile labelled_peptides.faa
Finally, to search the peptide database for the syntenic structure displayed above, ">gene_A 0 >gene_B 0 >gene_C 3 <gene_D", using the downloaded PGAP database:
pynteny search \
--synteny_struc ">gene_A 0 >gene_B 0 >gene_C 3 <gene_D" \
--data labelled_peptides.faa \
--outdir results/ \
--gene_ids
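Since the commands above are ordinary CLI calls, one way to script the whole download, build and search sequence from Python is through the standard library subprocess module, as sketched below. This deliberately wraps the command line tool with exactly the flags shown above rather than Pynteny's Python module interface, which is not reproduced here; run_pipeline and STEPS are hypothetical names, and the file paths are the same placeholders used in the commands.

```python
import shutil
import subprocess

# The three steps shown above, with the same flags and placeholder paths.
STEPS = [
    ["pynteny", "download", "--outdir", "data/hmms", "--unpack"],
    ["pynteny", "build", "--data", "assembly.fa", "--outfile", "labelled_peptides.faa"],
    [
        "pynteny", "search",
        "--synteny_struc", ">gene_A 0 >gene_B 0 >gene_C 3 <gene_D",
        "--data", "labelled_peptides.faa",
        "--outdir", "results/",
        "--gene_ids",
    ],
]


def run_pipeline(steps: list[list[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for cmd in steps:
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]!r} not found on PATH; activate the pynteny env first")
        # A list argument means no shell is involved, so the '>' and '<'
        # in the structure string are passed literally, not as redirections.
        subprocess.run(cmd, check=True)
```

Call run_pipeline(STEPS) from within the activated pynteny environment. Passing each command as a list avoids a shell entirely, so the synteny structure string needs no extra quoting; on the command line, the quotes are what stop the shell from treating > and < as redirections.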
Here are some Jupyter notebooks with examples showing how Pynteny works:
You can find more notebooks in the examples directory. Find more info in the documentation.
Pynteny would not work without these awesome projects:
Thanks!
Contributions are always welcome! If you don't know where to start, you may find an interesting issue to work on here. Please read our contribution guidelines first.
If you use this software, please cite it as below:
Semidán Robaina Estévez. (2023). Pynteny: synteny-aware hmm searches made easy (Version 1.0.0). Zenodo. https://zenodo.org/record/7696204