BED feature #45

Open
wants to merge 49 commits into
base: master
f7c9d7d
Added a CLI option '--bed' to specify a BED file with regions to polish.
isovic Jun 21, 2020
abc739d
Now storing the start position of every window in the Window class.
isovic Jun 22, 2020
011de11
Added an operator<< to Window for debugging purposes.
isovic Jun 22, 2020
4eccb0b
The polisher can now fill in the gap in between windows if they are f…
isovic Jun 22, 2020
ffb68b4
Minor refactor in src/polisher.cpp - extracted the window creation co…
isovic Jun 22, 2020
5cd83d8
Added the suffix portion of the target sequence after the last window.
isovic Jun 22, 2020
6efcfc8
Implemented a simple BED parser and added unit tests.
isovic Jun 23, 2020
0923930
The createPolisher function now accepts a BED file path and parses it.
isovic Jun 23, 2020
a7645c9
Added the interval tree library.
isovic Jun 23, 2020
599f56e
Minor refactor, extracted the function transmuteId from overlap.cpp t…
isovic Jun 23, 2020
e252d93
Now constructing the interval trees for the given BED records, and va…
isovic Jun 23, 2020
5386bef
Added t_begin() and t_end() to Overlap.
isovic Jun 23, 2020
c1fabf7
Implemented overlap filtering when BED is specified, to remove overla…
isovic Jun 23, 2020
4537c7e
Now storing the intervals in the Polisher too, and sorting them right…
isovic Jun 23, 2020
001d3a5
Added the end_ coordinate to the Window class.
isovic Jun 25, 2020
a16068e
Minor fix, should be squashed with previous.
isovic Jun 25, 2020
6e556fc
Added BED coordinate sanity check to src/bed.cpp.
isovic Jun 25, 2020
f094ef3
Added verbose output of the window end coordinate in src/window.cpp.
isovic Jun 25, 2020
1eeeac4
Working on the new Polisher::create_and_populate_windows_with_bed. Ri…
isovic Jun 25, 2020
7e6585b
Renamed the target_interva_s and target_trees_ to target_bed_interval…
isovic Jun 25, 2020
b0aa724
Expanded the Overlap::breaking_points_ to be a tuple and store the wi…
isovic Jun 25, 2020
a3d8975
Added src/util.cpp and moved the IntervalTree typedefs to util.hpp. A…
isovic Jun 25, 2020
662d3f7
Removed the intervaltree/interval_tree_test.cpp from vendor/meson.bui…
isovic Jun 25, 2020
42700b9
Implemented a generic window splitting function generate_window_break…
isovic Jun 25, 2020
9653b73
Added unit tests for generate_window_breakpoints from util.* to test/…
isovic Jun 25, 2020
5d4f542
The Overlap::find_breaking_points_from_cigar now uses the utility fun…
isovic Jun 25, 2020
0753523
Added overloaded versions of the Overlap::find_breaking_points_from_c…
isovic Jun 25, 2020
b221d49
Implemented the BED-selected polishing in polisher.*.
isovic Jun 25, 2020
1b264b3
Now writing the unpolished sequences with no BED windows.
isovic Jun 26, 2020
d49722f
Updated CMakeLists.txt to compile the new code and tests.
isovic Jun 26, 2020
33359e1
Added missing include to src/bed.hpp.
isovic Jun 26, 2020
8d69226
Updated the initializer list in test/bed_test.cpp.
isovic Jun 26, 2020
bc79bc2
Updated the initializer list in test/util_test.cpp.
isovic Jun 26, 2020
c8a6bd2
Updated the test initializer lists in test/bed_test.cpp.
isovic Jun 26, 2020
1c91dd1
Updated the initializer lists in test/util_test.cpp.
isovic Jun 26, 2020
c499f5d
Updated the initializer lists in src/bed.cpp.
isovic Jun 26, 2020
c6f1507
Updated the initializer lists in src/util.cpp
isovic Jun 26, 2020
4778adf
Fixed the operator<< for WindowInterval in src/util.hpp.
isovic Jun 26, 2020
bd6fc43
Added operator<< to Overlap in src/overlap.hpp.
isovic Jun 26, 2020
4012f39
Removed the legacy windowing code in src/polisher.cpp, it was buggy i…
isovic Jun 26, 2020
14eb18a
Removed the deprecated code from ../src/overlap.hpp and ../src/overla…
isovic Jun 26, 2020
b52d1bf
Trying to fix an off-by-1 error.
isovic Jun 26, 2020
ea576bc
Reverted the off-by-1 in src/polisher.cpp.
isovic Jun 26, 2020
99f35a3
Fixed a bug in src/polisher.cpp window creation when BED is not used …
isovic Jun 26, 2020
8e5c802
Minor change in logging.
isovic Jun 26, 2020
f633ad0
Fixed the include for the IntervalTree library in CMakeLists.txt.
isovic Jun 30, 2020
766b6fa
Minor bugfix with the windowing - the window start offset was still b…
isovic Jul 13, 2020
8a1f6bc
Minor cleanup in src/polisher.cpp.
isovic Sep 8, 2020
aef75a9
Version v1.5.0.
isovic Sep 8, 2020
3 changes: 3 additions & 0 deletions .gitmodules
@@ -19,4 +19,7 @@
[submodule "vendor/GenomeWorks"]
path = vendor/GenomeWorks
url = https://github.com/clara-parabricks/GenomeWorks.git
[submodule "vendor/intervaltree"]
path = vendor/intervaltree
url = https://github.com/ekg/intervaltree.git
branch = master
9 changes: 8 additions & 1 deletion CMakeLists.txt
@@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.2)
project(racon)
-set(racon_version 1.4.17)
+set(racon_version 1.5.0)

set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib)
@@ -27,13 +27,16 @@ if(racon_enable_cuda)
endif()

include_directories(${PROJECT_SOURCE_DIR}/src)
include_directories(${PROJECT_SOURCE_DIR}/vendor/intervaltree)

set(racon_sources
src/main.cpp
src/bed.cpp
src/logger.cpp
src/polisher.cpp
src/overlap.cpp
src/sequence.cpp
src/util.cpp
src/window.cpp)

if(racon_enable_cuda)
@@ -103,11 +106,15 @@ if (racon_build_tests)
include_directories(${PROJECT_SOURCE_DIR}/src)

set(racon_test_sources
test/bed_test.cpp
test/racon_test.cpp
test/util_test.cpp
src/bed.cpp
src/logger.cpp
src/polisher.cpp
src/overlap.cpp
src/sequence.cpp
src/util.cpp
src/window.cpp)

if (racon_enable_cuda)
2 changes: 1 addition & 1 deletion meson.build
@@ -1,7 +1,7 @@
project(
'Racon',
'cpp',
-version : '1.4.13',
+version : '1.5.0',
default_options : [
'buildtype=release',
'warning_level=3',
75 changes: 75 additions & 0 deletions src/bed.cpp
@@ -0,0 +1,75 @@
/*!
* @file bed.cpp
*
* @brief BED reader source file
*/

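#include <cinttypes>
#include <cstdio>
#include <iostream>
#include <memory>
#include <sstream>
#include <stdexcept>

#include "bed.hpp"

namespace racon {

bool BedFile::Deserialize(const std::string& line, BedRecord& record) {
if (line.empty()) {
return false;
}
// The name read is bounded to the buffer size, and SCNd64 is the portable
// scan format for int64_t ("%ld" is only correct where long is 64 bits).
char name_buff[1024];
int64_t chrom_start = 0, chrom_end = 0;
int32_t n = sscanf(line.c_str(), "%1023s %" SCNd64 " %" SCNd64, name_buff, &chrom_start, &chrom_end);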
if (n < 3 || chrom_end <= chrom_start) {
throw std::runtime_error("Invalid BED line: '" + line + "'");
}
record = BedRecord(name_buff, chrom_start, chrom_end);
return true;
}

void BedFile::Serialize(std::ostream& os, const BedRecord& record) {
os << record.chrom() << " " << record.chrom_start() << " " << record.chrom_end();
}

std::string BedFile::Serialize(const BedRecord& record) {
std::ostringstream oss;
Serialize(oss, record);
return oss.str();
}

BedReader::BedReader(const std::string& in_fn)
: file_{std::unique_ptr<std::ifstream>(new std::ifstream(in_fn))}
, in_(*file_.get())
{
}

BedReader::BedReader(std::istream& in)
: in_(in)
{
}

bool BedReader::GetNext(BedRecord& record) {
const bool rv1 = !std::getline(in_, line_).fail();
if (!rv1)
return false;
const bool rv2 = BedFile::Deserialize(line_, record);
return rv2;
}

std::vector<BedRecord> BedReader::ReadAll(const std::string& fn)
{
std::ifstream in{fn};
return ReadAll(in);
}

std::vector<BedRecord> BedReader::ReadAll(std::istream& in)
{
std::vector<BedRecord> records;
BedReader reader{in};
BedRecord record;
while (reader.GetNext(record)) {
records.emplace_back(std::move(record));
}
return records;
}

}
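For reference, the parsing rules of `BedFile::Deserialize` can be sketched in standalone form. This is an illustration rather than the code in this PR: `Region` and `parse_bed_line` are hypothetical names, and the sketch uses `std::istringstream` in place of `sscanf`. It keeps the same rules as the function above: an empty line yields no record, and a line with fewer than three fields, or with an end coordinate not greater than the start, throws.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for racon::BedRecord (0-based, end-exclusive).
struct Region {
    std::string chrom;
    int64_t start = 0;
    int64_t end = 0;
};

// Hypothetical helper mirroring the rules of BedFile::Deserialize:
// returns false for an empty line and throws on a malformed one.
bool parse_bed_line(const std::string& line, Region& out) {
    if (line.empty()) {
        return false;
    }
    std::istringstream iss(line);
    Region r;
    // Stream extraction skips any whitespace, so tabs and spaces both work.
    if (!(iss >> r.chrom >> r.start >> r.end) || r.end <= r.start) {
        throw std::runtime_error("Invalid BED line: '" + line + "'");
    }
    out = r;
    return true;
}
```

Calling `parse_bed_line("chr1\t100\t200", r)` fills `r` with `chr1`, `100` and `200`. Extra columns after the third are ignored, as they are by the `sscanf` version.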
93 changes: 93 additions & 0 deletions src/bed.hpp
@@ -0,0 +1,93 @@
/*!
* @file bed.hpp
*
* @brief BED file containers and parser.
*/

#pragma once

#include <cstdint>
#include <fstream>
#include <memory>
#include <string>
#include <vector>

namespace racon {

class BedRecord;

class BedFile {
public:
static bool Deserialize(const std::string& line, BedRecord& record);
static void Serialize(std::ostream& os, const BedRecord& record);
static std::string Serialize(const BedRecord& record);
};

/*!
* @brief BedRecord container.
* Note: BED records have 0-based coordinates, and the end coordinate is non-inclusive.
*/
class BedRecord {
public:
~BedRecord() = default;

BedRecord() = default;

BedRecord(std::string _chrom, int64_t _chrom_start, int64_t _chrom_end)
: chrom_(std::move(_chrom))
, chrom_start_(_chrom_start)
, chrom_end_(_chrom_end) {}

const std::string& chrom() const {
return chrom_;
}
int64_t chrom_start() const {
return chrom_start_;
}
int64_t chrom_end() const {
return chrom_end_;
}

void chrom(const std::string& val) {
chrom_ = val;
}
void chrom_start(int64_t val) {
chrom_start_ = val;
}
void chrom_end(int64_t val) {
chrom_end_ = val;
}

bool operator==(const BedRecord& rhs) const
{
return chrom_ == rhs.chrom_ && chrom_start_ == rhs.chrom_start_
&& chrom_end_ == rhs.chrom_end_;
}

friend std::ostream& operator<<(std::ostream& os, const BedRecord& record) {
BedFile::Serialize(os, record);
return os;
}

private:
std::string chrom_;
int64_t chrom_start_ = 0;
int64_t chrom_end_ = 0;
};

class BedReader {
public:
BedReader(const std::string& in_fn);
BedReader(std::istream& in);

static std::vector<BedRecord> ReadAll(const std::string& fn);
static std::vector<BedRecord> ReadAll(std::istream& in);
bool GetNext(BedRecord& record);

private:
std::unique_ptr<std::ifstream> file_;
std::istream& in_;
std::string line_;
};

}
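The coordinate note on `BedRecord` matters when regions are compared: with 0-based, end-exclusive coordinates, two intervals overlap only if each one starts before the other ends, so regions that merely touch do not overlap. A minimal sketch of that check follows. `HalfOpen`, `overlaps` and `overlaps_any` are hypothetical names, and the linear scan is a simplification for illustration; the commits in this PR build interval trees for the BED records instead.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical half-open interval [start, end), following the BED convention.
struct HalfOpen {
    int64_t start;
    int64_t end;
};

// True if the two half-open intervals share at least one position.
bool overlaps(const HalfOpen& a, const HalfOpen& b) {
    return a.start < b.end && b.start < a.end;
}

// True if the query overlaps any region in the list (simple linear scan).
bool overlaps_any(const HalfOpen& query, const std::vector<HalfOpen>& regions) {
    for (const auto& r : regions) {
        if (overlaps(query, r)) {
            return true;
        }
    }
    return false;
}
```

With these definitions, `[0, 10)` and `[9, 20)` overlap, while `[0, 10)` and `[10, 20)` do not, because position 10 belongs only to the second interval.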
16 changes: 14 additions & 2 deletions src/main.cpp
@@ -3,6 +3,7 @@
#include <stdint.h>
#include <getopt.h>

#include <iostream>
#include <string>
#include <vector>

@@ -31,6 +32,7 @@ static struct option options[] = {
{"mismatch", required_argument, 0, 'x'},
{"gap", required_argument, 0, 'g'},
{"threads", required_argument, 0, 't'},
{"bed", required_argument, 0, 'B'},
{"version", no_argument, 0, 'v'},
{"help", no_argument, 0, 'h'},
#ifdef CUDA_ENABLED
@@ -66,7 +68,9 @@ int main(int argc, char** argv) {
uint32_t cudaaligner_band_width = 0;
bool cuda_banded_alignment = false;

-std::string optstring = "ufw:q:e:m:x:g:t:h";
std::string bed_file;

+std::string optstring = "ufw:q:e:m:x:g:t:B:h";
#ifdef CUDA_ENABLED
optstring += "bc::";
#endif
@@ -104,6 +108,9 @@ int main(int argc, char** argv) {
case 't':
num_threads = atoi(optarg);
break;
case 'B':
bed_file = std::string(optarg);
break;
case 'v':
printf("%s\n", version);
exit(0);
@@ -149,8 +156,10 @@ int main(int argc, char** argv) {
exit(1);
}

std::cerr << "BED file: '" << bed_file << "'\n";

auto polisher = racon::createPolisher(input_paths[0], input_paths[1],
-input_paths[2], type == 0 ? racon::PolisherType::kC :
+input_paths[2], bed_file, type == 0 ? racon::PolisherType::kC :
racon::PolisherType::kF, window_length, quality_threshold,
error_threshold, trim, match, mismatch, gap, num_threads,
cudapoa_batches, cuda_banded_alignment, cudaaligner_batches,
@@ -209,6 +218,9 @@ void help() {
" -g, --gap <int>\n"
" default: -4\n"
" gap penalty (must be negative)\n"
" -B, --bed <str>\n"
" default: ''\n"
" path to a BED file with regions to polish\n"
" -t, --threads <int>\n"
" default: 1\n"
" number of threads\n"
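The option handling added in `src/main.cpp` can be tried in isolation. The sketch below reduces the option table to the new `-B`/`--bed` entry and wraps the loop in a hypothetical helper, `parse_bed_option`, which is not part of this PR. It assumes a platform that provides `getopt_long` in `<getopt.h>`, as the existing code does.

```cpp
#include <getopt.h>

#include <string>

// Hypothetical helper: returns the value passed to -B/--bed, or "" if absent.
std::string parse_bed_option(int argc, char** argv) {
    static const struct option long_opts[] = {
        {"bed", required_argument, nullptr, 'B'},
        {nullptr, 0, nullptr, 0}
    };
    std::string bed_file;
    int c;
    // "B:" means -B takes a required argument, matching the optstring change.
    while ((c = getopt_long(argc, argv, "B:", long_opts, nullptr)) != -1) {
        if (c == 'B') {
            bed_file = std::string(optarg);
        }
    }
    return bed_file;
}
```

An empty return value corresponds to the documented default of `''`, which is the case where no BED file was given.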
4 changes: 3 additions & 1 deletion src/meson.build
@@ -1,9 +1,11 @@
racon_cpp_sources = files([
'bed.cpp',
'logger.cpp',
'overlap.cpp',
'polisher.cpp',
'sequence.cpp',
-'window.cpp'
+'util.cpp',
+'window.cpp',
])

racon_extra_flags = []