problem_2a

Practices 2-0 & 2-1 – ORF finding and translation (1)

[Practice 2-0]

Define a function trans_codon in Perl that translates a trinucleotide (codon) into an amino acid according to the codon table.

[Practice 2-1]

Write a script that loads a nucleotide sequence from a file and translates it into an amino acid sequence.

Overview of the two problems

In this chapter, you are expected to achieve the following two goals:

Predict the coding regions in a given sequence
Translate codons into amino acids in the predicted coding region

Practices 2-0 and 2-1 start with the second goal: you will first implement a program that translates codons into amino acids.

Codon and amino acid translation

In the central dogma, one gene basically encodes one protein. How, then, does a gene encode a protein?

A protein is a long chain of amino acids called a polypeptide, built from 20 types of amino acid. The properties of a protein are determined by the order of its amino acids. Therefore, a protein can be identified by its amino acid sequence and, in turn, by the sequence of nucleotide triplets, called codons, that encode those amino acids. For example, a nucleotide sequence of length 300 contains 100 codons and is translated into a protein of about 100 amino acids (a terminating stop codon adds no amino acid).

Taken together, one codon corresponds to one amino acid: a gene is a sequence of codons, and the protein it encodes is the corresponding sequence of amino acids.
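The relationship between sequence length and codon count can be illustrated with a short Perl sketch. It is only an illustration, not part of the practice; the variable names $dna and @codons are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical example sequence: 12 nucleotides, i.e. 4 codons.
my $dna = 'atgcttctggtg';

# Cut the sequence into consecutive, non-overlapping triplets.
# An incomplete triplet at the end (1 or 2 leftover bases) is ignored.
my @codons = $dna =~ /(.{3})/g;

printf "%d nucleotides -> %d codons: %s\n",
    length($dna), scalar(@codons), join(' ', @codons);
# 12 nucleotides -> 4 codons: atg ctt ctg gtg
```

Each of the four triplets would then be translated into exactly one amino acid.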

Practice 2-0

Now, let’s translate a given DNA sequence into a protein sequence.

The first task is to define a Perl function that translates a codon into its unique amino acid.

Look up the codon table that best describes your target species in a biology textbook. For example, find the amino acid that corresponds to the DNA sequence “ctt”; in this case it is the amino acid leucine (Leu).

The example above covers only one codon. Now let’s define a subroutine that works for all 64 codons.

  sub trans_codon {
        # The translation code goes here
  }

To call the subroutine, put an “&” in front of the subroutine name as follows.

  my $amino_acid = &trans_codon('ctt');

The codon “ctt” corresponds to leucine, so in the sample code above the character “L”, which represents leucine, should be assigned to the variable $amino_acid. As a next step, the subroutine should accept any DNA sequence as its argument and return the corresponding amino acids; for example, the argument “atgcttctggtg” should return “MLLV”. The codon table is easily described with a hash, as follows:

  my %CodonTable = (
                      'ctt', 'L',  'cct', 'P',  'cat', 'H',  'cgt', 'R',
                      'ctc', 'L',  'ccc', 'P',  'cac', 'H',  'cgc', 'R',
                                              :
                                              :
                                              :
                    );

With this hash, the sample code below

  my $amino = $CodonTable{'ctt'};

looks up the amino acid for a given codon of three nucleotides.
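Typing all 64 entries by hand is error-prone, so one alternative is to generate the hash. The following is a minimal sketch that assumes the standard genetic code written in the usual t, c, a, g order; the names @bases, $aminos and $index are made up for this example, and another species or organelle may need a different amino acid string.

```perl
use strict;
use warnings;

# Assumption: the standard genetic code in t, c, a, g order.
# '*' marks a stop codon.
my @bases  = qw(t c a g);
my $aminos = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

my %CodonTable;
my $index = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            # The n-th codon in t/c/a/g order maps to the n-th letter of $aminos.
            $CodonTable{"$first$second$third"} = substr($aminos, $index++, 1);
        }
    }
}

my $amino = $CodonTable{'ctt'};
print "ctt -> $amino\n";                               # ctt -> L
print scalar(keys %CodonTable), " codons in table\n";  # 64 codons in table
```

The keys are lower-case, so an upper-case sequence would have to be converted with lc before the lookup.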

The next step is to cut the sequence into pieces three nucleotides long using a "for" statement.

  sub trans_codon {
        my $nucleotides = shift;  # Assign the sequence passed as an argument to $nucleotides
        my $amino = '';
        my %CodonTable = (
                          'ctt', 'L',  'cct', 'P',  'cat', 'H',  'cgt', 'R',
                          'ctc', 'L',  'ccc', 'P',  'cac', 'H',  'cgc', 'R',
                                                  :
                                                  :
                                                  :
                         );
  
        for (?????) {
                  ?????;  # Split a sequence into three nucleotides
                  ?????;  # Translate codons into amino acids
                  ?????;  # Join amino acid into $amino
        }
        return $amino;
  }
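For readers who want to check their own answer, here is one possible way to fill in the blanks. It is a sketch under stated assumptions, not the only solution: the codon table is deliberately partial (the codons listed above plus the three needed for the example sequence), and the use of 'X' for an unknown codon is a choice made for this example.

```perl
use strict;
use warnings;

sub trans_codon {
    my $nucleotides = lc(shift);   # lower-case the input to match the table keys
    my $amino = '';
    # Partial table for illustration only; a real table needs all 64 codons.
    my %CodonTable = (
        'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R',
        'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R',
        'atg' => 'M', 'ctg' => 'L', 'gtg' => 'V',
    );

    # Step through the sequence three nucleotides at a time,
    # stopping before an incomplete codon at the end.
    for (my $i = 0; $i + 3 <= length($nucleotides); $i += 3) {
        my $codon = substr($nucleotides, $i, 3);   # Split off one codon
        my $aa    = $CodonTable{$codon};           # Translate the codon
        $aa = 'X' unless defined $aa;              # Assumption: 'X' marks an unknown codon
        $amino .= $aa;                             # Join the amino acid onto $amino
    }
    return $amino;
}

my $protein = trans_codon('atgcttctggtg');
print "$protein\n";   # MLLV
```

The loop condition $i + 3 <= length($nucleotides) ensures that one or two leftover nucleotides at the end of the sequence are skipped rather than looked up as a truncated codon.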

Practice 2-1

The translator is ready, so let’s write a script that loads a nucleotide sequence and translates its codons into amino acids in the following steps.

Load the target DNA sequence with the script written in Practice 1 and assign it to the variable $seq
Translate $seq into an amino acid sequence with the function trans_codon()
Print the result
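The three steps might fit together as in the sketch below. The Practice 1 script is not shown here, so load_sequence is a hypothetical stand-in that assumes a plain-text or FASTA-like file; the codon table is again partial, and the temporary file exists only to make the example runnable on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical loader standing in for the Practice 1 script.
sub load_sequence {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $seq = '';
    while (my $line = <$fh>) {
        next if $line =~ /^>/;    # skip a FASTA header line, if any
        $line =~ s/\s+//g;        # drop newlines and other whitespace
        $seq .= lc($line);
    }
    close($fh);
    return $seq;
}

sub trans_codon {
    my $nucleotides = shift;
    my $amino = '';
    # Partial table for illustration only; a real table needs all 64 codons.
    my %CodonTable = (
        'atg' => 'M', 'ctt' => 'L', 'ctg' => 'L', 'gtg' => 'V',
    );
    for (my $i = 0; $i + 3 <= length($nucleotides); $i += 3) {
        my $aa = $CodonTable{substr($nucleotides, $i, 3)};
        $amino .= defined $aa ? $aa : 'X';   # Assumption: 'X' marks an unknown codon
    }
    return $amino;
}

# Demo input: a small temporary file with the sequence split over two lines.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} ">example\natgctt\nctggtg\n";
close($tmp_fh);

my $seq     = load_sequence($tmp_path);   # Step 1: load the sequence
my $protein = trans_codon($seq);          # Step 2: translate it
print "$protein\n";                       # Step 3: print the result (MLLV)
```

In a real script, the temporary file would be replaced by the path of the sequence file, for example taken from the command line.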

Refine Practice 2-1

Here, we will provide some advanced Perl techniques to refine Practices 2-0 and 2-1.

The body of the "for" loop in the subroutine trans_codon spreads a simple operation over several lines and temporary variables. These steps can be combined into a single line that needs no intermediate variables:

  $amino .= $CodonTable{substr($nucleotides, $i, 3)};

The "for" statement can then be written in either of the following ways. Note that the statement-modifier form has no named loop variable, so the current position is available as $_ instead of $i.

  for (?????) { $amino .= $CodonTable{substr($nucleotides, $i, 3)}; }
  or
  $amino .= $CodonTable{substr($nucleotides, $_, 3)} for (?????);
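As a rough sketch of the statement-modifier form, the version below builds the list of codon start positions first and then appends one amino acid per position. The helper array @starts, the partial codon table, and the 'X' fallback for unknown codons are assumptions made for this example; the // operator requires Perl 5.10 or later.

```perl
use strict;
use warnings;

sub trans_codon {
    my $nucleotides = lc(shift);
    my $amino = '';
    # Partial table for illustration only; a real table needs all 64 codons.
    my %CodonTable = (
        'atg' => 'M', 'ctt' => 'L', 'ctg' => 'L', 'gtg' => 'V',
    );

    # Start positions of all complete codons: 0, 3, 6, ...
    my @starts = grep { $_ % 3 == 0 } 0 .. length($nucleotides) - 3;

    # Statement-modifier form: each start position arrives in $_.
    $amino .= $CodonTable{substr($nucleotides, $_, 3)} // 'X' for @starts;
    return $amino;
}

my $protein = trans_codon('atgcttctggtg');
print "$protein\n";   # MLLV
```

Because the range ends at length($nucleotides) - 3, a sequence shorter than three nucleotides yields an empty list and therefore an empty result.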

Contact

Kazuharu Arakawa, Ph.D.

G-language Project Leader Associate Professor

Institute for Advanced Biosciences Keio University

997-0017 Japan Tel/Fax: +81-235-29-0800 gaou@sfc.keio.ac.jp
