SVLEN description wrong for INV variants #139

Closed
d-cameron opened this issue Mar 16, 2016 · 3 comments

Comments

@d-cameron
Contributor

The SVLEN description "Difference in length between REF and ALT alleles" does not match the real-world usage of the field. In particular, SV callers (pindel, lumpy, cortex) report the SVLEN of INV variants as the length of the variant, not zero. This description should be updated to reflect the more informative real-world usage.

@thefferon

Personally, I find the current definition of SVLEN useful: it is clearly defined and captures data that is not expressed elsewhere in all variants. The length of any straightforward variant – e.g., deletions, the inversions you mention, etc. – can easily be calculated by subtracting POS from END, making SVLEN redundant. However, for insertions SVLEN can be an indicator of insertion length (if the sequence is too large to list), and for deletion-insertions in which the replacing sequence is not the same length as the deleted sequence, SVLEN provides a practical and useful metric.

@thefferon

@d-cameron : I'm revisiting this, two months on. I understand your frustration with the way SVLEN pertains to inversions – a 10-Kb inversion, for example, is assigned SVLEN=0 according to the current spec.

To turn the question (or the example) around, what value would you assign to SVLEN for a 10,000-bp insertion between two nucleotides, n and n+1? If you go by the current spec for SVLEN, the insertion would get SVLEN=10000. However, if you redefine SVLEN so that the inversion has SVLEN=10000 (value presumably to be determined by the difference between POS and END, or the "size" of the inversion), then aren't you at the same time requiring that the 10-Kb insertion get SVLEN=0? Is that better than the current situation?
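To make the two readings concrete, here is a minimal sketch of both definitions applied to the two examples above. The function names, the record dictionaries, and the coordinates are hypothetical and chosen only for illustration; the sign convention (ALT length minus REF length) is assumed from the spec's "difference in length between REF and ALT alleles".

```python
def delta_length(ref_len: int, alt_len: int) -> int:
    """SVLEN read as the spec describes it: ALT allele length minus REF allele length."""
    return alt_len - ref_len


def span_length(pos: int, end: int) -> int:
    """SVLEN read as the size of the variant on the reference: END minus POS."""
    return end - pos


# Hypothetical 10 kb inversion: 10,000 reference bases replaced by 10,000 inverted bases.
inversion = {"pos": 1000, "end": 11000, "ref_len": 10000, "alt_len": 10000}

# Hypothetical 10 kb insertion between two adjacent bases: no reference bases are replaced.
insertion = {"pos": 1000, "end": 1000, "ref_len": 0, "alt_len": 10000}

for name, rec in (("inversion", inversion), ("insertion", insertion)):
    delta = delta_length(rec["ref_len"], rec["alt_len"])
    span = span_length(rec["pos"], rec["end"])
    print(f"{name}: delta={delta} span={span}")
```

Under these assumptions, the delta reading gives 0 for the inversion and 10000 for the insertion, while the span reading gives the reverse. Neither single number describes both variants.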

@d-cameron
Contributor Author

In current real-world usage, SV callers report SVLEN as the length of the relevant variant. Delly even went as far as removing it from their VCF output as they considered it poorly defined (https://groups.google.com/forum/#!msg/delly-users/SiKLXw_piIc/6vh7zDvdlvAJ).

It's a bit late to change the field definition now, as there are tools that do write it in a spec-compliant manner. In retrospect, a pair of fields such as SVLEN and DELTALEN would have captured both usages, but as SVLEN is currently defined as the delta, there's not much we can do now.

As it stands now, my processing of SVLEN has to take the abs() and switch on SVTYPE to get the code working with multiple callers.
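A rough sketch of what that kind of normalization might look like is below. This is not the actual code referred to above: the function name `normalized_length`, the set of SVTYPE values handled, and the fallback to END minus POS are all assumptions made for illustration.

```python
from typing import Optional


def normalized_length(svtype: str, svlen: Optional[int],
                      pos: int, end: Optional[int]) -> Optional[int]:
    """Return a non-negative variant length regardless of which SVLEN convention the caller used.

    Hypothetical helper: switches on SVTYPE and takes abs() of SVLEN.
    """
    if svtype == "INS":
        # The reference span is ~0, so SVLEN is the only source of the inserted length.
        return abs(svlen) if svlen is not None else None
    if svtype in ("DEL", "DUP", "INV"):
        # Callers may write a signed delta, an unsigned span, or 0 (spec-compliant INV).
        if svlen:  # neither None nor 0
            return abs(svlen)
        # Assumed fallback: use the reference span when SVLEN is absent or zero.
        return end - pos if end is not None else None
    # Other types (e.g. BND) have no single meaningful length in this sketch.
    return None


print(normalized_length("DEL", -500, 100, 600))
print(normalized_length("INV", 0, 1000, 11000))
print(normalized_length("INV", 10000, 1000, 11000))
print(normalized_length("INS", 10000, 1000, 1000))
```

With this approach, an INV written as SVLEN=0 by a spec-compliant tool and one written as SVLEN=10000 by a span-reporting caller resolve to the same length, at the cost of discarding the sign.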

2 participants