Using GATB Core integrated Leon compressor
As of GATB-Core 1.4.0, the Leon compressor has been integrated into the GATB-Core library.
This means that the Leon file format can now be handled natively by all software relying on GATB-Core. In other words, you can process reads without first decompressing the Leon file.
The Leon compressor/decompressor is available as a binary tool as soon as you have compiled the GATB-Core library. The binary is called 'leon' and is located next to the other GATB-Core tools (dbgh5, dbginfo, ...) in the 'build/bin' directory.
To compress raw DNA sequence files (Fastq and Fasta in plain text or gzipped), use Leon as follows:
leon -c -lossless -file <your-file>
To compress raw DNA sequence files in lossy mode (applies only to Fastq files), use:
leon -c -file <your-file>
As soon as Leon has finished compressing your data file, you'll see a '.h5' file next to your DNA sequence file: this is the Leon-compressed file.
You can programmatically open and read the content (i.e. sequences) of a Leon '.h5' file in a very straightforward way as follows:
IBank* leonBank = Bank::open ("/path/to/leon-file.h5");
Quite simple, isn't it? You then use the IBank pointer (the 'leonBank' variable) as you would for any other kind of sequence bank (Fasta or Fastq). For instance, here is how to iterate over sequences:
// get an iterator over the sequences stored in the Leon file
Iterator<Sequence>* itLeon = leonBank->iterator();
// LOCAL drops our reference to the iterator when the enclosing scope ends
LOCAL(itLeon);
for (itLeon->first(); !itLeon->isDone(); itLeon->next()) {
    Sequence& seq = itLeon->item();
    // to get the sequence definition line, use: seq.getComment()
    // to get the sequence itself (nucleotides), use: seq.toString()
    // to get the sequence quality (Fastq only), use: seq.getQuality()
}
Instead of using Leon through its command-line tool, you can also use it programmatically as follows:
// we prepare the Leon command-line
std::vector<char*> leon_args;
std::vector<std::string> data = {
    "-",                 // placeholder standing in for the program name (argv[0])
    "-c",
    "-file", fastqFile,
    "-lossless",         // <-- LOSSLESS
    "-verbose", "0",
    "-kmer-size", "31",
    "-abundance", "4"
};
// 'data' owns the strings; 'leon_args' only holds pointers into them
for (std::string& arg : data) {
    leon_args.push_back(&arg[0]);
}
// we start Leon compressor
Leon().run(leon_args.size(), &leon_args[0]);
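The argument-building step above can be factored into two small helpers. The following is only a minimal sketch and is not part of GATB-Core: the function names 'makeLeonCompressOptions' and 'toArgv' are hypothetical, and only options already shown on this page ('-c', '-file', '-lossless') are used. It relies solely on the C++ standard library.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a GATB-Core function): returns the option list for
// a compression run, adding "-lossless" only when requested.
std::vector<std::string> makeLeonCompressOptions(const std::string& inputFile, bool lossless)
{
    // "-" stands in for the program name, as in the snippet above
    std::vector<std::string> opts = {"-", "-c", "-file", inputFile};
    if (lossless) {
        opts.push_back("-lossless");
    }
    return opts;
}

// Hypothetical helper: builds a C-style argv view over 'opts'.
// The returned pointers refer to the strings' own buffers, so 'opts' must
// outlive the result and must not be modified while the pointers are in use.
std::vector<char*> toArgv(std::vector<std::string>& opts)
{
    std::vector<char*> argv;
    argv.reserve(opts.size() + 1);
    for (std::string& s : opts) {
        argv.push_back(&s[0]);
    }
    argv.push_back(nullptr); // conventional argv terminator, not counted in argc
    return argv;
}
```

Assuming these helpers, the call shown earlier would become something like Leon().run(static_cast<int>(argv.size()) - 1, argv.data()), where the trailing null entry is excluded from the argument count. Passing false as the second argument of 'makeLeonCompressOptions' yields the lossy command line instead.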
To review the various ways of using Leon programmatically, please refer to the TestLeon.cpp source file.