Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Add duplicated sequences to output and fix vcf output #208

Merged
merged 2 commits into from
Jun 7, 2022

Conversation

Irallia
Copy link
Collaborator

@Irallia Irallia commented May 24, 2022

This PR adds the duplicated sequence to the output.
To discuss: What is the length of a tandem duplication. In this PR I decided to use the length of the duplicated sequence.
For example:

ref  NNNNN AAACCCGGG AAACCCGGG NNNNNN
read NNNNN AAACCCGGG AAACCCGGG AAACCCGGG AAACCCGGG AAACCCGGG NNNNNN

Than the duplication goes from (5, 23],
its duplicated sequence is AAACCCGGG
and the SVLEN is 9.

@Irallia Irallia requested review from joergi-w and joshuak94 May 24, 2022 12:03
@Irallia Irallia self-assigned this May 24, 2022
@codecov
Copy link

codecov bot commented May 24, 2022

Codecov Report

Merging #208 (91ec8dc) into master (71ad200) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #208      +/-   ##
==========================================
- Coverage   98.35%   98.34%   -0.01%     
==========================================
  Files          18       18              
  Lines         850      848       -2     
==========================================
- Hits          836      834       -2     
  Misses         14       14              
Impacted Files Coverage Δ
src/iGenVar.cpp 100.00% <ø> (ø)
...sv_detection_methods/analyze_split_read_method.cpp 96.84% <100.00%> (+0.03%) ⬆️
src/variant_detection/variant_output.cpp 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 71ad200...91ec8dc. Read the comment docs.

Copy link
Member

@joergi-w joergi-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To discuss: What is the length of a tandem duplication.

I had a quick look at the specification (source) and I found that
SVLEN = Difference in length between REF and ALT alleles
Therefore, I would change the length to 27 in your example.

However, it feels a bit redundant to me that SVLEN is always END-POS, because it doesn't add information.

// The SVLEN is neither too short nor too long than specified by the user.
if (std::abs(sv_length) >= args.min_var_length &&
std::abs(sv_length) <= args.max_var_length)
std::abs(sv_length) <= args.max_var_length &&
((sv_type != "DUP" || sv_type != "INS") || insert_size <= args.max_tol_deleted_length) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this condition, and (sv_type != "DUP" || sv_type != "INS") is always true.

Also note that our definition of SVLEN has impact here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah damn my summary has ruined it 😅 thx for the hint

if (mate1.orientation == strand::forward)
{
size_t insert_size = cluster.get_average_inserted_sequence_size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
size_t insert_size = cluster.get_average_inserted_sequence_size();
size_t const insert_size = cluster.get_average_inserted_sequence_size();

The three if-statements above can be merged into a single one.

@Irallia
Copy link
Collaborator Author

Irallia commented May 24, 2022

To discuss: What is the length of a tandem duplication.

I had a quick look at the specification (source) and I found that SVLEN = Difference in length between REF and ALT alleles Therefore, I would change the length to 27 in your example.

However, it feels a bit redundant to me that SVLEN is always END-POS, because it doesn't add information.

Yes, but the SVLEN is not used everywhere, I found a discussion about INVs here. Maybe we just introduce a second INFO value beside the SVLEN. I will adjust the PR.

Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
@Irallia Irallia force-pushed the FIX/DUP/fix_tandem_dup_output branch from 06570f3 to 311a96c Compare May 25, 2022 16:22
Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
@Irallia Irallia force-pushed the FIX/DUP/fix_tandem_dup_output branch from 311a96c to 91ec8dc Compare May 25, 2022 16:49
@Irallia Irallia requested a review from joergi-w May 25, 2022 16:56
Copy link
Member

@joergi-w joergi-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

Comment on lines 73 to 78
if (mate1.orientation == mate2.orientation)
{
if (mate1.seq_name == mate2.seq_name)
{
size_t insert_size = cluster.get_average_inserted_sequence_size();
if (mate1.orientation == strand::forward)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could be merged into one condition

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the two if cases?
I opened this PR because I actually want to call INVs. For calling INVs there will be an else for if (mate1.orientation == mate2.orientation). So I would leave this as it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three nested if cases actually...
Of course, if you add else branches later, then it won't be possible.

Comment on lines 122 to 124
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because they end in the same place.

@Irallia Irallia merged commit 1c420cf into seqan:master Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants