Skip to content

problem_0

gaou edited this page Nov 17, 2020 · 2 revisions

Practice 0

Cut out a piece of sequence from specific area of Mycoplasma genitalium whole genome and save it to a file.

0.1 - Genome file

First obtain a flatfile for M. genitalium genome (~0.6 Mbp).

wget http://rest.g-language.org/togoWS/NC_000908

0.2 - Cutting out specific area of genome

If the cut out areas were as follows:

  127178 - 128811,237246 - 239251

Then make two files that possess sequence 127178th base through 128811st and 237246th base through 239251st base, respectively.

An example of cutting out base from 100th base to 200th.

    1 taagttatta tttagttaat acttttaaca atattattaa ggtatttaaa aaatactatt
   61 atagtattta acatagttaa ataccttcct taatactgtt aaattatatt caatcaatac
  121 atatataata ttattaaaat acttgataag tattatttag atattagaca aatactaatt
  181 ttatattgct ttaatactta ataaatacta cttatgtatt aagtaaatat tactgtaata
  241 ctaataacaa tattattaca atatgctaga ataatattgc tagtatcaat aattactaat

Copy and paste DNA sequence form a genome file using text editor (such as emacs) and save it to a file.

  tcaatcaatacatatataatattattaaaatacttgataagtattatttagatattagacaaatactaattttatattgctttaatactta

The cut out area for students are as follow.

  First sequence     Second sequence
  11152 - 12140 , 136079 - 137367

Preview of the Practice 1

Practice 1-1 [Basic]

Write a program that counts all the number of each base from given specific area of genome.

Practice 1-2 [Basic]

Refine previous program that enables to load whole genome sequence of M. genitalium and compare the compositional difference (bias).

Practice 1-3 [Advanced]

Two-letter nucleotides are called dinucleotide and there are 16 types of nucleotides as a combination of four types of bases. In Practice 1-3, modify the previous program for dinucleotide frequency analysis. Check out the entire dinucleotide frequency pattern in the Mycoplasma genome and compare with the result for the Practice 1-2.

Clone this wiki locally