A simple automated bisulfite analysis Python script that determines the methylation status of CpG sites (predicted by MethPrimer), given sequences of control samples (.fasta files) and sequences of perturbed samples (also .fasta files). Read below for very brief background information, usage instructions and an example output.
In brief, it automates the analysis of the extent of DNA methylation in cells after exposure to a perturbation (e.g. cigarette smoke).
It can be run from the terminal on macOS.
An example visualisation of the data made with PowerPoint is provided as example_visualisation_with_powerpoint.png
Epigenetics covers reversible modifications that change the amount of transcription and, hence, translation of various proteins in organisms, without altering the DNA sequence itself. Gene expression at the transcription stage can be modified in the following ways:
- DNA Methylation
- Histone Modification
- Non-coding RNA
This script focuses on the analysis of the extent of DNA methylation in promoter sequences. Methylation of the promoter sequence prevents transcription factors from binding to the DNA and represses transcriptional activity.
https://www.cdc.gov/genomics/disease/epigenetics.htm
Treatment of DNA with bisulfite converts cytosine residues to uracil but leaves methylated cytosines unaffected. After direct sequencing, unmethylated cytosines are displayed in the sense strand as thymine residues, while methylated cytosines are still displayed as cytosine. Methylation occurs mainly at CpG sites (a cytosine followed by a guanine), and regions with a high density of CpG sites are called CpG islands. This script compares the original sequence found in MethPrimer with control sequences and perturbed sequences.
In my experiment, RNA was extracted from control and perturbed samples, converted into cDNA, treated with bisulfite and then ligated into a TOPO vector (ThermoFisher), which was subsequently transformed into E. coli. These were grown on an agar plate containing X-gal, and successfully transformed bacterial colonies were observed as white colonies instead of blue colonies. These successfully transformed colonies were picked with a pipette and sequenced.
https://en.wikipedia.org/wiki/Bisulfite_sequencing
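As a rough illustration of the conversion described above, the following sketch simulates bisulfite treatment in silico. It is not taken from bisulfite_analysis.py: the function name bisulfite_convert and the parameter methylated_positions are hypothetical, and the sketch assumes an idealised read with complete conversion and no sequencing errors.

```python
def bisulfite_convert(sequence, methylated_positions=frozenset()):
    """Simulate how a sense strand reads after bisulfite treatment and PCR.

    Every cytosine is read as thymine unless its 0-based index is listed in
    methylated_positions (a hypothetical parameter used only for illustration).
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")   # unmethylated C -> U, sequenced as T
        else:
            converted.append(base)  # methylated C and all other bases are unchanged
    return "".join(converted)


original = "ACGTCCGA"

# No methylation at all: every C is read as T.
print(bisulfite_convert(original))

# Assume the cytosine at index 5 (the C of the second CG) is methylated.
print(bisulfite_convert(original, {5}))
```

Comparing the two printed reads with the original shows why a C that survives at a CpG site can be taken as evidence of methylation.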
MethPrimer is a program for designing bisulfite-conversion-based Methylation PCR Primers. Currently, it can design primers for two types of bisulfite PCR: 1) Methylation-Specific PCR (MSP) and 2) Bisulfite-Sequencing PCR (BSP) or Bisulfite-Restriction PCR. MethPrimer can also predict CpG islands in DNA sequences.
https://www.urogene.org/methprimer/
Relevant functionalities of MethPrimer are designing primers for bisulfite-sequencing PCR and prediction of CpG islands.
Ensure you have the following libraries installed:
$ pip install biopython
https://biopython.org/wiki/Download
$ pip install pandas
https://pypi.org/project/pandas/
$ pip install plotly-express
https://pypi.org/project/plotly-express/
Assuming you've already conducted bisulfite PCR and have the sequences of your control and perturbation samples, perform the following steps:
- Identify the sequence you're analysing with MethPrimer and index (0-indexed) all the CG segments between your forward and reverse primers (indicated by CG in the original strand on top)
- Identify false CpG sites (CG sequences that MethPrimer does not recognise as a CpG site; predicted CpG sites are indicated by a "++" between the top and bottom strand), note their index and modify bisulfite_analysis.py as stated in the "user config" section
- MethPrimer displays 2 strands (the strand on top is the original sequence and the strand at the bottom is the bisulfite-treated sequence). Visually identify a short sequence that is unaffected by bisulfite treatment, located after your forward primer and just before the first CpG site. Find another short sequence after your last CpG site and before your reverse primer. Enter these in the .py file as the variables "before_first_cpg" and "after_last_cpg" respectively. This allows the program to zoom in on the area of DNA with CpG sites.
- Change the total number of CpG sites (according to MethPrimer) in bisulfite_analysis.py
- Change the number of control and perturbed samples you have in bisulfite_analysis.py
- Place all your .fasta files of your control sequences into the control_samples folder
- Place all your .fasta files of your perturbed sequences into the perturbed_samples folder
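The preparation steps can be sketched as follows. This is a minimal illustration of how the two flanking sequences and the false CpG indices might be used, not the actual code in bisulfite_analysis.py. Only before_first_cpg and after_last_cpg are names from the script; region_between, call_methylation, false_cpg_indices and the toy sequences are hypothetical. The sketch also assumes the read contains no insertions or deletions relative to the original sequence.

```python
def region_between(sequence, before_first_cpg, after_last_cpg):
    """Return the part of sequence lying between the two flanking sequences."""
    start = sequence.index(before_first_cpg) + len(before_first_cpg)
    end = sequence.index(after_last_cpg, start)
    return sequence[start:end]


def call_methylation(original, read, before_first_cpg, after_last_cpg,
                     false_cpg_indices=()):
    """Return one character per true CpG site: "O" methylated, "_" unmethylated."""
    reference = region_between(original.upper(), before_first_cpg, after_last_cpg)
    observed = region_between(read.upper(), before_first_cpg, after_last_cpg)
    if len(reference) != len(observed):
        # Assumption: no indels. A real read with indels would need aligning first.
        raise ValueError("read and original regions differ in length")

    # 0-indexed positions of every CG in the original region.
    cg_positions = [i for i in range(len(reference) - 1)
                    if reference[i:i + 2] == "CG"]

    status = []
    for cpg_index, position in enumerate(cg_positions):
        if cpg_index in false_cpg_indices:
            continue  # skip CG segments that are not predicted CpG sites
        # A surviving C means the site was protected from conversion.
        status.append("O" if observed[position] == "C" else "_")
    return "".join(status)


# Toy data: the flanks contain no C, so bisulfite treatment leaves them unchanged.
before_first_cpg = "GATTAG"
after_last_cpg = "TGGATA"
original = before_first_cpg + "ACGTACGTTCGA" + after_last_cpg
read = before_first_cpg + "ATGTACGTTTGA" + after_last_cpg  # only the 2nd CG kept its C

print(call_methylation(original, read, before_first_cpg, after_last_cpg))
print(call_methylation(original, read, before_first_cpg, after_last_cpg,
                       false_cpg_indices={2}))
```

Choosing flanking sequences without any cytosine is what lets the same strings be found in both the original and the bisulfite-treated read.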
- Prepare your sequences and modify variables in the bisulfite_analysis.py file as shown above
- Navigate to the folder where you placed bisulfite_analysis.py, control_samples and perturbed_samples with
$ cd path/to/bisulfite_analysis/
- Run the program with
$ python bisulfite_analysis.py
- For a graph of the statistics, uncomment the last block of code in the PRINT_OUTPUT() function in the bisulfite_analysis.py file
Columns are
- Sample name (from file name)
- Methylation status of each CpG site: methylated sites are indicated by an "O", unmethylated sites by a "_"
- Number of sites methylated in the sequence
Control Group Methylation:
A07 : __________O ; 1
A08 : __________O ; 1
B07 : _________OO ; 2
C07 : __________O ; 1
D07 : O_________O ; 2
E07 : __________O ; 1
F07 : ___O______O ; 2
G07 : __________O ; 1
H07 : _O________O ; 2
Perturbed Group Methylation:
A12 : ______OO__O ; 3
B12 : ______OOO_O ; 4
C12 : ______O_O_O ; 3
D12 : ______OO__O ; 3
E12 : _______OO_O ; 3
Statistics:
Control Methylation Percentage: 13.131313131313133 %
Perturbed Methylation Percentage: 29.09090909090909 %
An optional bar graph of the statistics can be produced by uncommenting the relevant code in the .py file.
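The percentages in the example output can be reproduced from the status strings alone. The sketch below assumes the statistic is the number of methylated sites divided by the total number of sites across all samples in a group, which matches the figures shown (13 of 99 and 16 of 55). The function name methylation_percentage is hypothetical and the script itself may compute the value differently.

```python
def methylation_percentage(status_strings):
    """Percentage of "O" (methylated) sites across all samples in a group."""
    total_sites = sum(len(status) for status in status_strings)
    if total_sites == 0:
        raise ValueError("no CpG sites to summarise")
    methylated_sites = sum(status.count("O") for status in status_strings)
    return 100 * methylated_sites / total_sites


# Status strings copied from the example output above.
control = [
    "__________O", "__________O", "_________OO", "__________O", "O_________O",
    "__________O", "___O______O", "__________O", "_O________O",
]
perturbed = [
    "______OO__O", "______OOO_O", "______O_O_O", "______OO__O", "_______OO_O",
]

print(f"Control Methylation Percentage: {methylation_percentage(control):.2f} %")
print(f"Perturbed Methylation Percentage: {methylation_percentage(perturbed):.2f} %")
```

Rounding to two decimal places here is a presentation choice for the sketch; the script prints the unrounded value.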