A tutorial for the protein-disorder module of BioJava
BioJava provides a module, biojava-protein-disorder, for predicting disordered regions from a protein sequence. The module currently contains one prediction method, which is based on a Java implementation of the RONN predictor.
This code was originally developed for use with JABAWS, and we call it JRONN. JRONN is based on the C implementation of the RONN algorithm and uses the same model data, so it gives the same predictions. JRONN is based on RONN version 3.1, which was still current at the time of writing (August 2011). The main motivation behind JRONN was to provide an implementation of RONN better suited to automated analysis pipelines and web services. Robert Esnouf has kindly allowed us to explore the RONN code and share the results with the community.
The original version of RONN is described in Yang, Z.R., Thomson, R., McNeil, P. and Esnouf, R.M. (2005) RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics 21: 3369-3376.
Examples of use are provided below. For more information, please refer to the JronnExample test cases.
Finally, instead of the API calls you can use the command line utility, which is likely to give better performance because it uses multiple threads to perform the calculations.
Example 1: Calculate the probability of disorder for every residue in a single sequence
FastaSequence fsequence = new FastaSequence("name",
"LLRGRHLMNGTMIMRPWNFLNDHHFPKFFPHLIEQQAIWLADWWRKKHC" +
"RPLPTRAPTMDQWDHFALIQKHWTANLWFLTFPFNDKWGWIWFLKDWTPGSADQAQRACTWFFCHGHDTN");
float[] rawProbabilityScores = Jronn.getDisorderScores(fsequence);
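The per-residue scores are often easier to read once they are grouped into contiguous segments. The following is a minimal sketch of that idea using only the Java standard library. The class DisorderSegments, the method segments, the 0.5 cut-off and the 1-based positions are all assumptions made for illustration; they are not part of BioJava and do not claim to reproduce what Jronn does internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava: groups per-residue scores
// (such as a float[] of disorder probabilities) into contiguous segments.
class DisorderSegments {

    /**
     * Returns 1-based, inclusive {start, end} pairs for every run of
     * consecutive residues whose score is at or above the cut-off.
     */
    static List<int[]> segments(float[] scores, float cutoff) {
        List<int[]> result = new ArrayList<>();
        int start = -1; // index where the current run began, -1 when outside a run
        for (int i = 0; i < scores.length; i++) {
            boolean disordered = scores[i] >= cutoff;
            if (disordered && start < 0) {
                start = i; // a new run begins here
            } else if (!disordered && start >= 0) {
                // the run ended at index i - 1, which is position i when counted from 1
                result.add(new int[] {start + 1, i});
                start = -1;
            }
        }
        if (start >= 0) {
            // the last run extends to the end of the sequence
            result.add(new int[] {start + 1, scores.length});
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up scores for an eight-residue sequence; the cut-off is an assumed value.
        float[] scores = {0.2f, 0.6f, 0.7f, 0.4f, 0.3f, 0.8f, 0.9f, 0.9f};
        for (int[] segment : segments(scores, 0.5f)) {
            System.out.println(segment[0] + "-" + segment[1]);
        }
    }
}
```

With the made-up scores in main, the sketch reports the segments 2-3 and 6-8. In practice, the array returned by Jronn.getDisorderScores could be passed in place of the sample data.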
Example 2: Calculate the probability of disorder for every residue in the sequence for all proteins from the FASTA input file
final List<FastaSequence> sequences = SequenceUtil.readFasta(new FileInputStream("src/test/resources/fasta.in"));
Map<FastaSequence, float[]> rawProbabilityScores = Jronn.getDisorderScores(sequences);
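When scores are computed for many proteins at once, a per-protein summary can help to spot the most disordered entries. The sketch below computes the fraction of residues at or above a cut-off for each entry of a map. To stay self-contained it uses a Map keyed by a plain String identifier as a stand-in for the Map keyed by FastaSequence shown above; the class DisorderSummary, its methods and the 0.5 cut-off are hypothetical and not part of BioJava.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava: summarises per-residue scores
// for several sequences as the fraction of residues above a cut-off.
class DisorderSummary {

    /** Fraction of scores at or above the cut-off; 0.0 for an empty array. */
    static double disorderedFraction(float[] scores, float cutoff) {
        if (scores.length == 0) {
            return 0.0;
        }
        int count = 0;
        for (float score : scores) {
            if (score >= cutoff) {
                count++;
            }
        }
        return (double) count / scores.length;
    }

    /** Applies disorderedFraction to every entry, preserving iteration order. */
    static Map<String, Double> summarize(Map<String, float[]> scoresById, float cutoff) {
        Map<String, Double> fractions = new LinkedHashMap<>();
        for (Map.Entry<String, float[]> entry : scoresById.entrySet()) {
            fractions.put(entry.getKey(), disorderedFraction(entry.getValue(), cutoff));
        }
        return fractions;
    }

    public static void main(String[] args) {
        // Made-up identifiers and scores standing in for real prediction output.
        Map<String, float[]> scoresById = new LinkedHashMap<>();
        scoresById.put("Prot1", new float[] {0.1f, 0.9f, 0.8f, 0.2f});
        scoresById.put("Prot2", new float[] {0.7f, 0.6f});
        summarize(scoresById, 0.5f)
                .forEach((id, fraction) -> System.out.println(id + "\t" + fraction));
    }
}
```

The same loop could be written directly over the map returned by Jronn.getDisorderScores, using whatever identifier the FastaSequence key exposes as the label.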
Example 3: Get the disordered regions for a single protein sequence
FastaSequence fsequence = new FastaSequence("Prot1", "LLRGRHLMNGTMIMRPWNFLNDHHFPKFFPHLIEQQAIWLADWWRKKHC" +
"RPLPTRAPTMDQWDHFALIQKHWTANLWFLTFPFNDKWGWIWFLKDWTPGSADQAQRACTWFFCHGHDTN" +
"CQIIFEGRNAPERADPMWTGGLNKHIIARGHFFQSNKFHFLERKFCEMAEIERPNFTCRTLDCQKFPWDDP");
Range[] ranges = Jronn.getDisorder(fsequence);
Example 4: Get the disordered regions for all proteins from the FASTA input file
final List<FastaSequence> sequences = SequenceUtil.readFasta(new FileInputStream("src/test/resources/fasta.in"));
Map<FastaSequence, Range[]> ranges = Jronn.getDisorder(sequences);
The content of this tutorial is available under the CC-BY license.
BioJava 5: A community driven open-source bioinformatics library
Aleix Lafita, Spencer Bliven, Andreas Prlić, Dmytro Guzenko, Peter W. Rose, Anthony Bradley, Paolo Pavan, Douglas Myers-Turnbull, Yana Valasatava, Michael Heuer, Matt Larson, Stephen K. Burley, & Jose M. Duarte
PLOS Computational Biology (2019) 15 (2):e1006791.