Skip to content

gcskewanalysisenglish

gaou edited this page Oct 28, 2020 · 2 revisions

Introduction

As a beginning of this series of tutorial (‘GCskew Basics’), we analyzed GC skew by using only existing G-language GAE methods. In this tutorial as a next step, we are going to analyze actual GC skew by taking advantage of G-language GAE and Perl scripts.

STEP 1 - Loading G and subroutines

Let’s start with writing Perl scripts. First create new G instance as shown in previous tutorial.

  use G;
  my $gb = load("ecoli");

There will be no need to suffer from loading genome files, parsing it, or searching for which functions to use when G-language GAE is selected. The above two-line script is it. Therefore users can focus on what they want not programming itself.

Here we are going to make script with GC skew subroutine to avoid re-build same program. The subroutine returns value of GC skew as output. Setting secondary argument as window size makes script much more resourcefull.

  sub gcskew{ 
     my $gb = shift; 
     my $window = shift; 
     my @gcskew = (); 
      
     return @gcskew; 
  } 

STEP 2 - Calculation and output of GC skews

Calculation of GC skew is very simple. Divide genome into the numbers of window and for each window calculate (C-G) / (C+G) then plot each value in the window. Let’s save the output by CSV so that graphs can be drawn with other applications.

All genome sequences are stored in $gb->{SEQ} in G instance. Other genomic information are stored in same way such as $gb->{LOCUS}, $gb->{BASE_COUNT}. Utilizing for statement is suitable for executing some processes in each window. length ($gb→{SEQ}) count up whole length of genome and $a = $sequence =~ tr/a/a/ count up a number of whole adenine in the genome.

The script should look like this:

  my @location = ();  # Store window position
  my $i = 0;  # initialize counter
  for ($i = 0; $i * $window < length($gb->{SEQ}); $i ++){ # Execute while Window overflow genome size
        my $sequence = substr($gb->{SEQ}, $i * $window, $window);  # Cut out sequences within the window
        my $c = $sequence =~ tr/c/c/;   # count up Cs
        my $g = $sequence =~ tr/g/g/;   # count up Gs
        my $skew = ($c-$g)/($c+$g);     # Calculate GC skew
        push (@location, $i * $window); # Store window position
        push (@gcskew, $skew);          # Store GC skew
  } 

Following script saves the output as 'gcskew.csv'.

  my $j = 0; 
  open(OUT, '>gcskew.csv'); 
  print OUT "location,GC skew¥n"; 
  for ($j = 0; $j <= $i; $j++){ 
       print OUT $location[$j], ",", $gcskew[$j], "¥n"; 
  } 
  close(OUT); 

STEP 3 – Generating graphs

Primary principle of G-langugae GAE is to make analysis more efficient. So G-language GAE provides a method grapher() to automatically draw a graph which save up user’s time.

  grapher(¥@x-axis, ¥@value1, ¥@value2 ...)
  option	description
  x	 x-axis label
  y	 y-axis label
  x1, x2, ...	 @value1, @value2, ...
  filename	 Filename
  title	 title

There are many more options available. For further options see reference. This is it for generating a graph.

Following script opens a graph with much more colorfully generated by gnuplot with attached gimv viewer.

STEP 4 - Done

  use strict;
  use warnings;
  use G; 
   
  my $gb = load("ecoli"); 
   
  my @gcskew = &gcskew($gb, 10000); 
   
  sub gcskew{ 
     my $gb = shift; 
     my $window = shift; 
     my @gcskew = (); 
   
     my @location = ();  
     my $i = 0;   
     for ($i = 0; $i * $window < length($gb->{SEQ}); $i ++){      
           my $sequence = substr($gb->{SEQ}, $i * $window, $window);   
           my $c = $sequence =~ tr/c/c/;  
           my $g = $sequence =~ tr/g/g/;   
           my $skew = ($c-$g)/($c+$g);    
           push (@location, $i * $window);  
           push (@gcskew, $skew);      
     } 
   
     my $j = 0; 
     open(OUT, '>gcskew.csv'); 
     print OUT "location,GC skew¥n"; 
     for ($j = 0; $j <= $i; $j++){ 
           print OUT $location[$j], ",", $gcskew[$j], "¥n"; 
     } 
     close(OUT); 
   
   
     grapher(¥@location, ¥@gcskew, -x=>'bp', -y=>'GC skew', 
                              -title=>'GC skew', -filename=>'gcskew.png'); 
   
     msg_gimv("graph/gcskew.png"); 
     return @gcskew; 
  } 
      
Clone this wiki locally