Genomics library for the Scala programming language
sgxlib provides classes and methods for genomics applications. See Overview of packages and API documentation for details.
Go to latest release for source code downloads.
To build a monolithic .jar file including all dependencies, sbt is required.
sbt assembly
generates target/scala-[version]/sgxlib-[release].jar
. The jar file can be imported as a library into Java or Scala projects.
View the API docs
Package feature
The feature
package provides the concept of a genomic region with blocks and an orientation, as well as specific types of features such as transcripts and messenger RNAs. The package provides powerful feature arithmetic, with operations such as intersection, union, contains, minus, overlaps, and extraction of sub-features such as exons, introns, and sections of messenger RNAs.
Examples:
import feature._
// Construct two regions
val region1: Region = BlockSet(List(
Block("1", 1000, 2000, Plus),
Block("1", 3000, 4000, Plus),
Block("1", 5000, 6000, Plus),
Block("1", 7000, 8000, Plus)
))
val region2: Region = BlockSet(List(
Block("1", 1100, 1200, Plus),
Block("1", 1300, 1400, Plus),
Block("1", 3500, 3600, Plus)
))
// Region arithmetic
val contains: Boolean = region1.contains(region2) // true
val intersect: Region = region1.intersection(region2) // [[1:1100-1200:+], [1:1300-1400:+], [1:3500-3600:+]]
val trim: Region = region1.trim(1500, 3500) // [[1:1500-2000:+], [1:3000-3500:+]]
// Convert between genomic position and relative position with respect to region
val relpos: Option[Int] = region1.relativePos(7600) // Some(3600)
val genpos: Int = region1.chrPos(400) // 1400
// Construct a messenger RNA
val mrna: MessengerRNA = MessengerRNA(region1, 3500, 5502, Some("region1"), Some("gene1"))
// Get mRNA sub-features
val cds: Region = mrna.getCDS // [[1:3500-4000:+], [1:5000-5502:+]]
val stopCodon: Region = mrna.getStopCodon // [1:5499-5502:+]
val utr5: Option[Region] = mrna.get5UTR // Some([[1:1000-2000:+], [1:3000-3500:+]])
val introns: List[Block] = mrna.blocks.getIntrons // List([1:2000-3000:+], [1:4000-5000:+], [1:6000-7000:+])
// Note: all Region arithmetic operations are also available on Features
Package collection
The collection
package provides the concept of a collection of features, as well as methods to get the subcollection of features overlapping a given feature. An implementation is provided to create a feature collection from a GTF2.2 annotation file.
Examples:
import java.io.File
import collection.{FeatureSet, GTF22FeatureSet}
import feature._
// Load a set of features from a GTF file
val featureSet: FeatureSet[Feature] =
new GTF22FeatureSet(new File(getClass.getResource("/Homo_sapiens.GRCh38.86.chr_20_21_22.gtf").getPath))
// Get overlappers of an interval
val intervalOverlap: Iterator[Feature] = featureSet.overlappers("chr20", 37476591, 37645612, Plus) // ENST00000062104, ENST00000346199, ENST00000450588, ENST00000621390
// Get overlappers of a feature
val ENST00000613961: Feature = new Transcript(
Block("20", 37499892, 37503975, Minus),
Some("ENST00000613961"), Some("ENSG00000166619"))
val featureOverlap: Iterator[Feature] = featureSet.overlappers(ENST00000613961) // ENST00000613961, ENST00000467603
// Get nearest features to an interval
val intervalNearest: Iterator[Feature] = featureSet.nearest("chr20", 37306985, 37306986) // ENST00000373606, ENST00000397150, ENST00000397152, ENST00000373605, ENST00000397151
// Get nearest features to a feature
val block = new GenericFeature(Block("20", 37249854, 37250291, Minus), None)
val featureNearest: Iterator[Feature] = featureSet.nearest(block) // ENST00000373614
Package sequencing
The sequencing
package translates data in BAM files into the language of sgxlib, using the samtools htsjdk API under the hood. The SamMapping
class expresses BAM records as sgxlib features. The SamReader
class provides methods to query a BAM file by feature overlap. Feature overlap is defined strictly, requiring splice junctions and strand (where applicable) to match. This class also allows querying for full mapped fragments (matched mate pairs) where applicable.
Examples:
import java.io.File
import feature._
import htsjdk.samtools.SAMRecord
import sequencing.SamReader
// Construct a SamReader
val samReader: SamReader = new SamReader(new File(getClass.getResource("/DRR023752_chr20_37Mb_38Mb.bam").getPath))
// Get iterator over all records
val nRec: Int = samReader.iterator.size // 45787
// Get fragments compatible with a feature
val frag: Iterator[(SAMRecord, SAMRecord)] =
samReader.compatibleFragments(
new GenericFeature(Block("20", 37500065, 37500486, Minus), None), Plus)
/*
(DRR023752.11381138 ,DRR023752.11381138)
(DRR023752.16260396 ,DRR023752.16260396)
(DRR023752.25576103 ,DRR023752.25576103)
(DRR023752.34173869 ,DRR023752.34173869)
*/
// Get count of fragments compatible with a feature
val count: Int = samReader.countCompatibleFragments(
new GenericFeature(Block("20", 37500065, 37500486, Unstranded), None), Unstranded) // 5
/* Note: the above methods for fragments are also available for single records,
regardless of whether the library is paired end. */
Package variant
The variant
package provides functions to manipulate VCF records, using the samtools htsjdk API under the hood.
sgxlib can be used from within Java code. For example:
package examples;
import feature.*;
import java.util.Arrays;
import scala.collection.JavaConverters;
/**
* Example of using sgxlib classes from Java code
*/
public class HelloSgx {
public static void main(String[] args) {
// Create two Blocks on chr1
Block blk1 = new Block("1", 1000, 2000, Plus$.MODULE$);
Block blk2 = new Block("1", 3000, 4000, Plus$.MODULE$);
System.out.println("The Blocks are " + blk1 + " and " + blk2);
// Create a Region with the two blocks
scala.collection.immutable.List<Block> bss =
JavaConverters.asScalaBufferConverter(
Arrays.asList(blk1, blk2))
.asScala()
.toList();
BlockSet bs = new BlockSet(bss);
System.out.println("The BlockSet is " + bs);
// Create an mRNA from the Region
scala.Option<String> noName = scala.Option.apply(null);
scala.Option<String> geneName = scala.Option.apply("gene");
MessengerRNA mrna = new MessengerRNA(bs, 1500, 3400, noName, geneName);
System.out.println("The MessengerRNA is " + mrna);
// Get the CDS
Region cds = mrna.getCDS();
System.out.println("The CDS is " + cds);
}
}