-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Add duplicated sequences to output and fix vcf output #208
Conversation
Codecov Report
@@ Coverage Diff @@
## master #208 +/- ##
==========================================
- Coverage 98.35% 98.34% -0.01%
==========================================
Files 18 18
Lines 850 848 -2
==========================================
- Hits 836 834 -2
Misses 14 14
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To discuss: What is the length of a tandem duplication.
I had a quick look at the specification (source) and I found that
SVLEN = Difference in length between REF and ALT alleles
Therefore, I would change the length to 27 in your example.
However, it feels a bit redundant to me that SVLEN is always END-POS, because it doesn't add information.
// The SVLEN is neither too short nor too long than specified by the user. | ||
if (std::abs(sv_length) >= args.min_var_length && | ||
std::abs(sv_length) <= args.max_var_length) | ||
std::abs(sv_length) <= args.max_var_length && | ||
((sv_type != "DUP" || sv_type != "INS") || insert_size <= args.max_tol_deleted_length) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this condition, and (sv_type != "DUP" || sv_type != "INS")
is always true.
Also note that our definition of SVLEN has impact here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah damn my summary has ruined it 😅 thx for the hint
if (mate1.orientation == strand::forward) | ||
{ | ||
size_t insert_size = cluster.get_average_inserted_sequence_size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t insert_size = cluster.get_average_inserted_sequence_size(); | |
size_t const insert_size = cluster.get_average_inserted_sequence_size(); |
The three if-statements above can be merged into a single one.
Yes, but the SVLEN is not used everywhere, I found a discussion about INVs here. Maybe we just introduce a second INFO value beside the SVLEN. I will adjust the PR. |
Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
06570f3
to
311a96c
Compare
Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
311a96c
to
91ec8dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you!
if (mate1.orientation == mate2.orientation) | ||
{ | ||
if (mate1.seq_name == mate2.seq_name) | ||
{ | ||
size_t insert_size = cluster.get_average_inserted_sequence_size(); | ||
if (mate1.orientation == strand::forward) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could be merged into one condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the two if cases?
I opened this PR because I actually want to call INVs. For calling INVs there will be an else
for if (mate1.orientation == mate2.orientation)
. So I would leave this as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The three nested if cases actually...
Of course, if you add else branches later, then it won't be possible.
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because they end in the same place.
This PR adds the duplicated sequence to the output.
To discuss: What is the length of a tandem duplication. In this PR I decided to use the length of the duplicated sequence.
For example:
Than the duplication goes from (5, 23],
its duplicated sequence is AAACCCGGG
and the SVLEN is 9.