This Python script analyzes and optimizes DNA sequences for primer design, consensus sequence generation, and genome annotation visualization. It reads and processes genome records, selects primers based on melting temperature (Tm), extends primers to reach a desired Tm, processes primers in batches, and exports the results.
- Primer Optimization: Select and optimize primers based on specific criteria such as melting temperature (Tm) and Tm differences between forward and reverse primers.
- Consensus Sequence Generation: Generate a consensus sequence from multiple DNA sequences to identify common nucleotides or patterns.
- Genome Annotation Visualization: Visualize annotations on genomic sequences, providing insights into the distribution and characteristics of genes or features.
- Batch Primer Processing: Facilitate the processing of primers in batches, applying different optimization criteria or steps for each batch.
- Exporting Results: Export optimized primers, their degeneracy scores, and genome annotations to text and Excel files for further analysis.
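To make the Tm-based optimization concrete, here is a minimal sketch of extending a primer until it reaches a target Tm. It is not the script's actual code: the function names `wallace_tm` and `extend_primer`, the default lengths, and the use of the Wallace rule (Tm = 2·(A+T) + 4·(G+C)) are all assumptions chosen for illustration. The Wallace rule is a rough estimate intended for short oligos; Biopython's `Bio.SeqUtils.MeltingTemp` module provides more accurate models.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm in deg C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def extend_primer(template: str, start: int, min_len: int = 18,
                  max_len: int = 30, target_tm: float = 60.0) -> str:
    """Grow a primer from `start`, one base at a time, until its
    estimated Tm reaches `target_tm` or `max_len` is hit.

    All parameter names and defaults are hypothetical.
    """
    end = min(start + min_len, len(template))
    limit = min(start + max_len, len(template))
    while end < limit and wallace_tm(template[start:end]) < target_tm:
        end += 1
    return template[start:end]


if __name__ == "__main__":
    template = "ATGC" * 8
    forward = extend_primer(template, 0)
    print(forward, wallace_tm(forward))
```

The Tm difference between a forward and a reverse primer can then be checked with `abs(wallace_tm(forward) - wallace_tm(reverse))` against whatever threshold you choose.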
Before running this script, ensure you have Python 3.x installed along with the following packages:
- Biopython
- gffutils
- matplotlib
- numpy
- pandas
You can install the required packages using pip, for example `pip install biopython gffutils matplotlib numpy pandas`.
- Prepare your DNA sequences in FASTA format and, if available, a genome annotation file.
- Update the script's parameters to point to your input files and specify desired output locations.
- Run the script from the command line.
Initializes the GenomeMarker object with sequences and optionally a genome annotation file.
Generates a consensus sequence if multiple genome records are provided; otherwise, returns the single sequence.
Calculates a consensus sequence by analyzing the nucleotide composition at each position across all provided sequences.
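The per-position calculation can be sketched as a majority vote over each alignment column. This is an illustrative version under stated assumptions, not the script's implementation: the function name `consensus`, the requirement that sequences are pre-aligned to equal length, and the use of `N` for ties are all choices made here.

```python
from collections import Counter


def consensus(sequences, ambiguous="N"):
    """Return the majority base at each position of equal-length sequences.

    Ties between the two most common bases yield `ambiguous`
    (an assumption of this sketch).
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    result = []
    for column in zip(*sequences):
        counts = Counter(base.upper() for base in column)
        ranked = counts.most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(ambiguous)  # no clear majority in this column
        else:
            result.append(ranked[0][0])
    return "".join(result)


if __name__ == "__main__":
    print(consensus(["ATGC", "ATGA", "ATGC"]))
```

With a single input sequence the function returns that sequence unchanged (upper-cased), which matches the single-record behavior described above.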
Saves the consensus sequence and, if applicable, the original sequences to a FASTA file.
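For readers unfamiliar with the output format, the following sketch writes a consensus plus optional original sequences as FASTA using only the standard library. The names `format_fasta` and `save_fasta`, the record name `consensus`, and the 60-character line width are assumptions for illustration; the script itself may well rely on Biopython's `SeqIO.write` instead.

```python
def format_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def save_fasta(path, consensus_seq, originals=None):
    """Write the consensus first, then any original (name, sequence) pairs."""
    records = [("consensus", consensus_seq)]
    records.extend(originals or [])
    with open(path, "w") as handle:
        handle.write(format_fasta(records))


if __name__ == "__main__":
    print(format_fasta([("consensus", "ATGCATGC"), ("seq1", "ATGCATGA")]))
```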
An example usage scenario might involve generating a consensus sequence from several aligned DNA sequences and then visualizing the distribution of specific primer sequences along the consensus.
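Before primer positions can be plotted along a consensus, they have to be located. The sketch below finds exact matches of a primer on both strands. It is a simplified stand-in rather than the script's method: the helper names are hypothetical, only unambiguous A/C/G/T bases are handled, and mismatches or degenerate bases are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(template, primer):
    """Return 0-based start positions of every exact match, overlaps included."""
    template, primer = template.upper(), primer.upper()
    positions = []
    start = template.find(primer)
    while start != -1:
        positions.append(start)
        start = template.find(primer, start + 1)
    return positions


def primer_sites(consensus_seq, primer):
    """Locate a primer on the forward strand and, via its reverse
    complement, on the reverse strand of the consensus."""
    return {
        "forward": find_sites(consensus_seq, primer),
        "reverse": find_sites(consensus_seq, reverse_complement(primer)),
    }


if __name__ == "__main__":
    print(primer_sites("AAGGTTCCAAGG", "AAGG"))
```

The returned positions could then be passed to matplotlib, for example as x-coordinates of markers drawn along the consensus length.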
Contributions to improve the script or add new features are welcome. Please fork the repository and submit pull requests with your proposed changes. For reporting bugs or requesting features, please open an issue through the GitHub issue tracker.
For further questions or collaborations, feel free to contact me at centenoenrique1963@gmail.com.
This script is provided under the MIT License. See the LICENSE file for more details.