A project to sequence RNA/DNA from real, error-prone reads produced by a PCR machine, and then to analyze the resulting DNA sequence.
DNA, or deoxyribonucleic acid, is a long molecule that contains our unique genetic code. Like a recipe book, it holds the instructions for making all the proteins in our bodies. DNA is composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides), nucleic acids are one of the four major types of macromolecules that are essential for all known forms of life.
DNA analysis is the process of determining the characteristics of an individual's DNA.
Here are some of the analysis results.
The reads carry per-base quality values defined using Phred+33 encoding. These values are confidence scores for the base calls in each read, i.e. how likely each call is to be correct. The graph that visualizes them is an excellent way of separating bad-quality reads from good-quality reads. Additionally, once the category of each read has been determined, the kind of sequencing to be used on it can be chosen.
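A minimal sketch of how Phred+33 quality characters map to scores and error probabilities. Each character encodes a score Q = ord(char) - 33, and the probability that the base call is wrong is P = 10^(-Q/10). The read filter `is_good_read` and its threshold of a mean Q of 20 are hypothetical illustrations, not the criterion used in this project.

```python
PHRED_OFFSET = 33  # Phred+33 (Sanger / Illumina 1.8+) ASCII offset


def phred33_to_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer Phred scores Q."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def is_good_read(quality: str, min_mean_q: float = 20.0) -> bool:
    """Hypothetical filter: keep a read if its mean Phred score >= min_mean_q."""
    scores = phred33_to_scores(quality)
    return sum(scores) / len(scores) >= min_mean_q


print(phred33_to_scores("II5#"))  # → [40, 40, 20, 2]
print(is_good_read("II5#"))       # → True (mean Q = 25.5)
```

A score of Q20 corresponds to a 1 in 100 chance of a wrong call, and Q40 to 1 in 10,000, which is why low-scoring stretches stand out clearly in a quality plot.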
DNA has four bases:
- Adenine
- Thymine
- Guanine
- Cytosine
Finding and visualizing the errors and quality values of each of these four bases separately reveals interesting patterns.
One nice observation from the four quality graphs is that they look similar, which suggests that the original PCR process used to produce the DNA reads is reliable, repeatable and reproducible.
Another observation is that, although the graphs look similar, they are not identical: their individual data points differ, so each base has a different overall average error line.
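The per-base breakdown above can be sketched as follows, assuming the reads are available as (sequence, Phred+33 quality) pairs, such as FASTQ records, and assuming that a base's "average error" is the mean error probability over every position where that base was called. The function name `mean_error_by_base` is hypothetical; the project's own plots may aggregate differently (for example, per read position).

```python
from collections import defaultdict


def mean_error_by_base(reads):
    """reads: iterable of (sequence, phred33_quality) pairs of equal length.

    Returns {base: mean error probability} over every position where that
    base was called. Calls other than A/C/G/T (e.g. N) are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seq, qual in reads:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings must have equal length")
        for base, ch in zip(seq.upper(), qual):
            if base not in "ACGT":
                continue
            q = ord(ch) - 33                 # Phred+33 decode
            totals[base] += 10 ** (-q / 10)  # error probability of this call
            counts[base] += 1
    return {b: totals[b] / counts[b] for b in "ACGT" if counts[b]}


reads = [("ACGT", "++++"), ("AA", "5I")]  # '+' = Q10, '5' = Q20, 'I' = Q40
print(mean_error_by_base(reads))
```

Comparing the four resulting averages side by side is one simple way to quantify the small differences between otherwise similar-looking per-base graphs.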
Adenine is a base in DNA.
Thymine is a base in DNA.
Guanine is a base in DNA.
Cytosine is a base in DNA.
Comparing the GC content of the Actual DNA with that of the Calculated DNA shows two things:
- The sequenced DNA line becomes almost straight, indicating that the sequenced DNA is stable. Hence the sequencing process worked well and can be trusted.
- The Actual and Found DNA lines coincide as the sequence progresses, indicating that the sequenced DNA has good genome coverage.
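The text does not say exactly how the GC curves are computed. A minimal sketch, assuming a running (cumulative) GC fraction: each point is the share of G and C bases seen so far, which naturally flattens into a nearly straight line as more bases are included. The helper names `running_gc_content` and `gc_gap` are hypothetical.

```python
def running_gc_content(seq: str) -> list[float]:
    """Cumulative GC fraction: GC count in seq[:i] divided by i, for i = 1..len(seq)."""
    gc = 0
    out = []
    for i, base in enumerate(seq.upper(), start=1):
        if base in "GC":
            gc += 1
        out.append(gc / i)
    return out


def gc_gap(actual: str, found: str) -> list[float]:
    """Position-by-position absolute difference between two running GC curves."""
    a, f = running_gc_content(actual), running_gc_content(found)
    return [abs(x - y) for x, y in zip(a, f)]


print(running_gc_content("AGCT"))  # → [0.0, 0.5, 0.6666666666666666, 0.5]
```

If the Actual and Found curves coincide, `gc_gap` shrinks toward zero along the sequence; a persistent gap would point to regions the sequencing missed.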
The DNA Binary Signal Analysis is a binary signal plot of the output of the KMP (Knuth-Morris-Pratt) string-matching algorithm for subsequences of different lengths taken from the Actual DNA and searched for in the Found DNA. This type of analysis is used to find locations within the genome, to measure the similarity between the Actual DNA and the Found DNA, and to see how pattern matching changes as the subsequence length increases.
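A minimal sketch of a KMP-based binary signal. The KMP search itself is standard; the way subsequences are taken is an assumption: here the Actual DNA is split into consecutive, non-overlapping chunks of length `k`, and each chunk yields 1 if KMP finds it anywhere in the Found DNA and 0 otherwise. The function `binary_signal` and this chunking scheme are hypothetical and may differ from the project's implementation.

```python
def prefix_function(pattern: str) -> list[int]:
    """KMP failure table: pi[i] is the length of the longest proper prefix
    of pattern[:i + 1] that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return every start index where pattern occurs in text, in O(n + m)."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]          # fall back instead of re-scanning text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):      # full match ending at position i
            matches.append(i - k + 1)
            k = pi[k - 1]          # keep going to find overlapping matches
    return matches


def binary_signal(actual: str, found: str, k: int) -> list[int]:
    """Hypothetical signal: 1 if each length-k chunk of actual occurs in found."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [
        1 if kmp_search(found, actual[i:i + k]) else 0
        for i in range(0, len(actual) - k + 1, k)
    ]


print(binary_signal("ACGTTT", "ACGAAA", 3))  # → [1, 0]
```

Running `binary_signal` for several values of `k` (for example 2, 4 and 8) gives one signal per subsequence length; longer chunks are harder to match exactly, so the share of 1s is expected to fall as `k` grows when the reads contain errors.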
The DNA Frame Buffer