Merge branch 'smalton/DOR-987-polya-filter' into 'master'
[DOR-987] Bad polyA reads

Closes DOR-987

See merge request machine-learning/dorado!1295
malton-ont committed Dec 6, 2024
2 parents cbcdf38 + f56fec0 commit df57d34
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions dorado/poly_tail/poly_tail_calculator.cpp
@@ -51,6 +51,14 @@ std::pair<float, float> PolyTailCalculator::estimate_samples_per_base(
     }
 
     float avg = average_samples_per_base(sizes);
+
+    // filter out reads that are outside a reasonable range
+    // these will likely be very bad reads, and allowing too
+    // large a value makes these take an excessively long time
+    if (avg > 1000 || avg < 1) {
+        return {0.f, 0.f};
+    }
+
     auto quantiles = dorado::utils::quantiles(sizes, {0.1f, 0.9f});
     float sum_diff_2 = 0.f;
     int count = 0;
@@ -247,6 +255,9 @@ int PolyTailCalculator::calculate_num_bases(const SimplexRead& read,
             signal_info.is_fwd_strand ? '+' : '-', signal_info.signal_anchor);
 
     auto [num_samples_per_base, stddev] = estimate_samples_per_base(read);
+    if (num_samples_per_base == 0) {
+        return 0;
+    }
 
     // Walk through signal. Require a minimum of length 10 poly-A since below that
     // the current algorithm returns a lot of false intervals.
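
Note on the change: the two hunks work as a pair. The estimator returns a `{0.f, 0.f}` sentinel when the average samples-per-base falls outside [1, 1000], and the caller treats a zero estimate as "no tail" and returns 0 bases early. The sketch below shows that pattern in isolation. It is a simplified illustration, not the dorado implementation: the function names `estimate_samples_per_base_sketch` and `calculate_num_bases_sketch`, the `tail_samples` parameter, and the plain standard deviation (the real code uses 0.1/0.9 quantiles) are all assumptions made for the example.

```cpp
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the estimator: returns {average, stddev} of the
// per-base sample counts, or the {0, 0} sentinel when the input is empty or
// the average is outside the accepted range [1, 1000].
std::pair<float, float> estimate_samples_per_base_sketch(const std::vector<float>& sizes) {
    if (sizes.empty()) {
        return {0.f, 0.f};
    }
    const float n = static_cast<float>(sizes.size());
    const float avg = std::accumulate(sizes.begin(), sizes.end(), 0.f) / n;

    // Same bounds as the commit: reject implausible averages up front.
    if (avg > 1000 || avg < 1) {
        return {0.f, 0.f};
    }

    // Simplification: plain population standard deviation over all values.
    float sum_diff_2 = 0.f;
    for (float s : sizes) {
        sum_diff_2 += (s - avg) * (s - avg);
    }
    return {avg, std::sqrt(sum_diff_2 / n)};
}

// Hypothetical caller: converts a tail length in samples to a base count,
// bailing out with 0 when the estimator signalled a rejected read.
int calculate_num_bases_sketch(const std::vector<float>& sizes, int tail_samples) {
    auto [samples_per_base, stddev] = estimate_samples_per_base_sketch(sizes);
    (void)stddev;  // unused in this sketch
    if (samples_per_base == 0) {
        return 0;
    }
    return static_cast<int>(std::round(static_cast<float>(tail_samples) / samples_per_base));
}
```

Because the accepted range has a lower bound of 1, a valid estimate can never be 0, so the exact `== 0` comparison on the sentinel is unambiguous in this sketch.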