Assignment Date: Monday, Sept. 23, 2019
Due Date: Monday, Sept. 30, 2019 @ 11:59pm
In this assignment you will consider the requirements for sequence alignment. As a reminder, any questions about the assignment should be posted to Piazza.
Determine how many bases long a given pattern P must be to ensure that occurrences of P are unlikely to be chance events (e < 0.000001) in genomes of the following sizes:
- 1a. 5.2Mb (Bacillus anthracis – the microbe that causes anthrax)
- 1b. 100Mb (Caenorhabditis elegans - model worm)
- 1c. 3.1Gb (Homo sapiens - human)
- 1d. 18Gb (Triticum aestivum – bread wheat)
- 1e. 670Gb (Polychaos dubium – amoeba, has largest known genome)
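One way to approach this kind of calculation is sketched below. It assumes a simplified null model that is not stated in the assignment: the four bases are independent and equally likely, only one strand is searched, and the expected number of chance occurrences of a length-k pattern in a genome of G bases is approximated by G / 4^k. The function name `min_pattern_length` and its parameters are hypothetical, and the example uses an illustrative 1 Mb genome rather than any of the sizes listed above.

```python
def min_pattern_length(genome_size, threshold=1e-6, alphabet_size=4):
    """Smallest k such that genome_size / alphabet_size**k < threshold.

    Assumes (for illustration only) uniform, independent bases, so a
    specific pattern of length k matches a given position with
    probability 1 / alphabet_size**k.
    """
    k = 1
    # Grow the pattern until the expected number of chance hits
    # drops below the threshold.
    while genome_size / alphabet_size ** k >= threshold:
        k += 1
    return k


if __name__ == "__main__":
    # Illustrative genome size of 1 Mb (not one of the assigned cases).
    print(min_pattern_length(1_000_000))
```

If both strands are searched, or if the base composition is far from uniform, the expected count changes and the required length shifts accordingly, so state whichever assumptions you use in your answer.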
Compute the edit distance of (a portion of) the human hemoglobin alpha and beta subunits, showing the dynamic programming matrix and the aligned sequences. Assume a fixed unit cost to substitute one amino acid for another.
Alpha: EALERMFLSFPTTKTYFPHFDLSHGSAQVK
Beta: EALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVK
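As a reminder of how the dynamic programming recurrence works, here is a minimal sketch of unit-cost edit distance with a traceback. It assumes that insertions, deletions, and substitutions all cost 1 and matches cost 0; the function name `edit_distance` and the tie-breaking order in the traceback are illustrative choices, so other equally optimal alignments may exist. The example uses two short toy strings rather than the sequences above.

```python
def edit_distance(s, t):
    """Return (matrix, aligned_s, aligned_t) for unit-cost edit distance."""
    n, m = len(s), len(t)
    # D[i][j] = edit distance between s[:i] and t[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i  # delete all of s[:i]
    for j in range(1, m + 1):
        D[0][j] = j  # insert all of t[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j - 1] + sub,  # match or substitution
                D[i - 1][j] + 1,        # gap in t
                D[i][j - 1] + 1,        # gap in s
            )

    # Trace back from the bottom-right corner to recover one
    # optimal alignment (diagonal moves are preferred on ties).
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and s[i - 1] == t[j - 1] else 1
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub:
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return D, "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    matrix, top, bottom = edit_distance("kitten", "sitting")
    for row in matrix:
        print(" ".join(f"{v:2d}" for v in row))
    print(top)
    print(bottom)
    print("distance:", matrix[-1][-1])
```

The bottom-right cell of the matrix holds the edit distance, and every mismatched or gapped column in the printed alignment accounts for exactly one unit of that cost.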
The solutions to the above questions should be submitted as a single PDF document that includes your name, email address, and all relevant figures (as needed). Make sure to clearly label each of the subproblems and give the exact commands and/or code snippets you used to solve each question. You do not need to show code for plotting. Submit your solutions by uploading the PDF to GradeScope. The Entry Code is: MPK8BX
If you submit after the due date and time listed above, you will start to use up your late days. Remember, you are only allowed 5 late days (120 hours) for the entire semester!