caballero/SeqComplex

== SeqComplex ==

This is a collection of methods to compute the composition and complexity of one or more DNA sequences from a FASTA file.

SeqComplex.pm is a Perl module containing an implementation of each complexity
measure. Several tools that use this module are also provided:
(1) compSeq.pl: computes the methods in windowed mode.
(2) profileComplexSeq.pl: computes the methods over the whole sequence.
(3) gatherStats.pl: example script that runs all methods in windowed mode and saves the raw data
                    for later processing.
(4) displayStats.pl: example script that reads the raw data from gatherStats.pl and
                     displays it as either a table or a Google Charts HTML file.
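
The difference between windowed mode and whole-sequence mode can be illustrated with a minimal Python sketch. It is not taken from SeqComplex.pm or the scripts above: the function names, the window size, and the step are assumptions chosen for illustration, and C+G content stands in for any of the methods.

```python
from typing import Iterator, Tuple


def windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for every full window of `size` bases."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


seq = "ATGCGCGCATATATAT"

# Whole-sequence mode: one value per sequence.
whole = gc_content(seq)

# Windowed mode: one value per window, here 8 bases wide with a step of 4.
per_window = [(start, gc_content(w)) for start, w in windows(seq, size=8, step=4)]
```

In this toy sequence the whole-sequence value hides the fact that the C+G-rich bases are concentrated at the start, which is the kind of local variation a windowed profile exposes.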
                     
Computed methods
  *gc: C+G content     
  *gcs: C+G skew
  *cpg: CpG skew
  *cwf: Complexity by Wootton & Federhen
  *ce: Entropy
  *cz: Complexity as compression ratio (using Gzip)
  *cmN: Complexity as Markov model size of N
  *ctN: Trifonov's complexity with order N
  *clN: Linguistic complexity with order N
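
For orientation, a few of these measures can be sketched in Python using common textbook definitions. These are assumptions, not the formulas used in SeqComplex.pm, which may normalize or handle ambiguous bases differently: skew is taken as (G-C)/(G+C), entropy as Shannon entropy in bits over the observed symbols, compression ratio as compressed size over raw size, and linguistic complexity as observed distinct k-mers over the maximum possible for a single k.

```python
import gzip
import math
from collections import Counter


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); assumed definition, 0.0 when there is no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def shannon_entropy(seq: str) -> float:
    """Shannon entropy of the symbol frequencies, in bits per base."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq.upper())
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def gzip_ratio(seq: str) -> float:
    """Compressed size divided by raw size; lower means more repetitive."""
    raw = seq.upper().encode("ascii")
    return len(gzip.compress(raw)) / len(raw) if raw else 0.0


def linguistic_complexity(seq: str, k: int) -> float:
    """Distinct k-mers observed divided by the maximum possible number."""
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return 0.0
    observed = len({seq[i:i + k] for i in range(n_windows)})
    return observed / min(4 ** k, n_windows)
```

Note that the gzip ratio includes a fixed header and trailer, so it is only meaningful for sequences or windows that are reasonably long.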

Additional methods
  *ats: A+T skew
  *ket: Keto skew
  *pur: Purine skew
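
The three additional skews can all be written as one generic function, sketched below in Python. The definitions are assumptions based on the usual base classes (keto = G, T versus amino = A, C; purine = A, G versus pyrimidine = C, T) and the usual (x - y) / (x + y) form; the sign convention and normalization in SeqComplex.pm may differ.

```python
def skew(seq: str, plus: str, minus: str) -> float:
    """(P - M) / (P + M), where P and M count the bases in `plus` and `minus`."""
    seq = seq.upper()
    p = sum(seq.count(base) for base in plus)
    m = sum(seq.count(base) for base in minus)
    return (p - m) / (p + m) if p + m else 0.0


def at_skew(seq: str) -> float:
    """A versus T (assumed definition)."""
    return skew(seq, "A", "T")


def keto_skew(seq: str) -> float:
    """Keto bases (G, T) versus amino bases (A, C) (assumed definition)."""
    return skew(seq, "GT", "AC")


def purine_skew(seq: str) -> float:
    """Purines (A, G) versus pyrimidines (C, T) (assumed definition)."""
    return skew(seq, "AG", "CT")
```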


Citation
  * Caballero J, Smit AFA, Hood L, Glusman G, Realistic artificial DNA sequences as negative controls for computational genomics, Nucleic Acids Research, 2014. https://doi.org/10.1093/nar/gku356

Copyright (C) 2009-2015 by Juan Caballero [jcaballero@systemsbiology.org]

All code is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.
