== SeqComplex ==

This is a collection of methods to compute the composition and complexity of one or more DNA sequences read from a FASTA file.

SeqComplex.pm is a Perl module containing an implementation of each complexity
measure. Several tools that use this module are also provided:
(1) compSeq.pl: computes the methods in windowed mode.
(2) profileComplexSeq.pl: computes the methods using the whole sequence.
(3) gatherStats.pl: example script that runs all methods in windowed mode and saves the raw data
for later processing.
(4) displayStats.pl: example script that reads the raw data from gatherStats.pl and
displays it as either a table or a Google Charts HTML file.
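
The difference between windowed mode and whole-sequence mode can be illustrated with a short sketch. This is written in Python for brevity and is not taken from SeqComplex.pm or its scripts; the names `windows`, `gc_content`, and `profile`, and the window size and step values, are hypothetical and chosen only for illustration.

```python
def windows(seq, size, step):
    """Yield (start, subsequence) for every full window of the given size."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def gc_content(seq):
    """Fraction of G and C among the A/C/G/T bases (assumed definition)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def profile(seq, size, step):
    """Windowed mode: one value per window instead of one per sequence."""
    return [(start, gc_content(win)) for start, win in windows(seq, size, step)]


seq = "GGCCAATT"
print(gc_content(seq))     # whole-sequence mode: a single value
print(profile(seq, 4, 4))  # windowed mode: one value per window
```

Whole-sequence mode reduces each sequence to one number per method, while windowed mode keeps positional information, which is what a later plotting or tabulation step would need.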

Computed methods
*gc: C+G content
*gcs: C+G skew
*cpg: CpG skew
*cwf: Complexity by Wootton & Federhen
*ce: Entropy
*cz: Complexity as compression ratio (using Gzip)
*cmN: Complexity as Markov model size of N
*ctN: Trifonov's complexity with order N
*clN: Linguistic complexity with order N
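
As a rough illustration of what some of these measures compute, the sketch below uses common textbook definitions: G-C skew as (G - C) / (G + C), Shannon entropy over single bases, compression ratio as compressed length over original length, and the Wootton & Federhen complexity as (1/L) * log4(L! / prod(n_i!)). It is written in Python and is not the code in SeqComplex.pm; the function names are hypothetical, and the module may use different formulas, word sizes, or normalization.

```python
import gzip
import math
from collections import Counter


def gc_skew(seq):
    """(G - C) / (G + C); 0.0 when the sequence has no G or C."""
    counts = Counter(seq.upper())
    g, c = counts["G"], counts["C"]
    return (g - c) / (g + c) if g + c else 0.0


def entropy(seq):
    """Shannon entropy in bits over single-base frequencies."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def compression_ratio(seq):
    """Gzip-compressed length divided by original length (assumed definition)."""
    data = seq.encode("ascii")
    return len(gzip.compress(data)) / len(data) if data else 0.0


def wootton_federhen(seq, alphabet_size=4):
    """(1/L) * log_N(L! / prod(n_i!)), computed with log-gamma for stability."""
    counts = Counter(seq.upper())
    length = sum(counts.values())
    if length == 0:
        return 0.0
    log_perms = math.lgamma(length + 1) - sum(
        math.lgamma(n + 1) for n in counts.values()
    )
    return log_perms / math.log(alphabet_size) / length


print(gc_skew("GGGC"), entropy("ACGT"), wootton_federhen("ACGT"))
print(compression_ratio("A" * 1000))
```

Under these definitions a homopolymer scores at the bottom of every complexity measure, and a sequence with all four bases in equal proportion reaches the maximum single-base entropy of 2 bits.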

Additional methods
*ats: A+T skew
*ket: Keto skew
*pur: Purine skew
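
The three additional skews can be sketched with one helper that compares two groups of bases. The sign conventions here are assumptions made for illustration (A versus T, keto bases G/T versus amino bases A/C, purines A/G versus pyrimidines C/T) and may not match SeqComplex.pm; the function names are hypothetical, and the example is in Python rather than Perl.

```python
from collections import Counter


def skew(seq, plus, minus):
    """(plus - minus) / (plus + minus) for two groups of bases."""
    counts = Counter(seq.upper())
    p = sum(counts[base] for base in plus)
    m = sum(counts[base] for base in minus)
    return (p - m) / (p + m) if p + m else 0.0


def at_skew(seq):
    """A versus T."""
    return skew(seq, "A", "T")


def keto_skew(seq):
    """Keto bases (G, T) versus amino bases (A, C)."""
    return skew(seq, "GT", "AC")


def purine_skew(seq):
    """Purines (A, G) versus pyrimidines (C, T)."""
    return skew(seq, "AG", "CT")


print(at_skew("AAAT"), keto_skew("GTGT"), purine_skew("AGCT"))
```

Each value lies between -1 and 1, with 0 meaning the two groups are equally represented.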

Citation
* Caballero J, Smit AFA, Hood L, Glusman G, Realistic artificial DNA sequences as negative controls for computational genomics, Nucleic Acids Research, 2014. https://doi.org/10.1093/nar/gku356

All code is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.