add documentation for VariantContext.getStart() regarding telomeric events #1369
Codecov Report
@@ Coverage Diff @@
## master #1369 +/- ##
==============================================
- Coverage 67.85% 67.841% -0.009%
+ Complexity 8283 8282 -1
==============================================
Files 564 564
Lines 33695 33695
Branches 5650 5650
==============================================
- Hits 22862 22859 -3
- Misses 8653 8655 +2
- Partials 2180 2181 +1
@@ -1664,7 +1664,15 @@ public String getContig() {
/**
* @return 1-based inclusive start position of the Variant
Since you are here, I would reformat the javadoc like this:
/**
* Summary sentence.
* <p>
* Explanation of what it does, without losing time on the details of particular edge cases.
* </p>
* <p>
* Edge case 1
* </p>
* <p>
* Edge case 2
* </p>
*
* @return the text above should have made clear what is returned; here you report on the possible range of values very briefly.
*/
For example in this case something like this:
/**
* Returns the (start) position of the variant.
* <p>
* Main blah blah ... 1-based ... blah blah.
* </p>
* <p>
* For telomeres blah blah can be 0 or N+1 blah blah
* </p>
* @return 0 or greater, never a negative number.
*/
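To make the suggested layout concrete, here is one possible way to fill in the template, assembled only from points raised in this thread. It is a sketch, not the text that was merged: `VariantRecordSketch` and its constructor are hypothetical stand-ins for `VariantContext`, and the paragraphs use bare `<p>` tags, following the later review remark that a closing `</p>` is unnecessary.

```java
/**
 * Hypothetical stand-in for a variant record, used only to show the javadoc layout
 * suggested in the review; it is not htsjdk's VariantContext.
 */
class VariantRecordSketch {
    private final String contig;
    private final int start;

    VariantRecordSketch(String contig, int start) {
        // Enforce the contract stated in the @return tag of getStart().
        if (start < 0) {
            throw new IllegalArgumentException("start must be 0 or greater: " + start);
        }
        this.contig = contig;
        this.start = start;
    }

    /** Returns the name of the contig this variant lies on. */
    String getContig() {
        return contig;
    }

    /**
     * Returns the position of the variant on its contig.
     * <p>
     * Positions are 1-based and inclusive: the first base of the contig is position 1.
     * For some variant types (e.g. deletions) the reported position is the base before
     * the first affected base.
     * <p>
     * For telomeric events the VCF spec also allows 0 and N + 1, where N is the
     * length of the contig.
     *
     * @return 0 or greater, never a negative number.
     */
    int getStart() {
        return start;
    }
}
```

The summary sentence stays short, the edge cases each get their own paragraph, and the `@return` tag only states the range of values, which is the split the reviewer asked for.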
Here I'm just putting emphasis on the summary sentence (the first sentence, ending with a period) and the statement in the @return tag; the details in the section in between are up to you (that is why I added those "blah blah" placeholders).
That said, I think you should be more concise... Programmers don't want to spend too much time understanding what may happen in 0.1% of cases. For example, I would not include details on what is said or depicted in the spec; just refer to it for further reading/details.
@vruano Thanks for taking a look!
In my view it is just too much text; think about the 99.5% of people who don't care about telomeres. But I'd rather you get some other opinions, as perhaps I'm just too pedantic. About the summary and the @return annotation, I mean something like this:
The summary sentence says "what it does" very briefly, whereas the @return tag just gives information about the possible range of values without semantics (the semantics go into the summary and the rest of the javadoc). For example, if the return type were Object, then often it is either
I would add
I agree with you that the summary was too long.
@@ -1663,8 +1663,11 @@ public String getContig() {
}

/**
* Returns 1-based inclusive start position of the variant, 0 or greater.
* See below for explanation on "0".
* Returns 1-based inclusive start position of the variant.
I would drop the "start", since it is in fact controversial, despite being the method's name.
Also, would people know what "1-based inclusive" means? Perhaps that should be removed from here and/or explained in another <p> block... but since this is not the only place we use 1-based indexes, do you even need to explain it here? I guess it won't hurt.
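For readers unsure what "1-based inclusive" means: the first base of a contig is position 1, and both the start and the end position belong to the interval, so a SNP at position 100 has start = end = 100. The helpers below are hypothetical (not part of htsjdk) and only illustrate the resulting length formula and the conversion to the 0-based, half-open convention used by formats such as BED.

```java
/** Hypothetical helpers illustrating 1-based, fully closed (inclusive) intervals. */
final class OneBasedInclusive {
    private OneBasedInclusive() {}

    /** Number of bases covered by [start, end] when both ends are inclusive. */
    static int length(int start, int end) {
        return end - start + 1;
    }

    /**
     * Converts a 1-based inclusive start into a 0-based half-open start (BED-style).
     * The end coordinate keeps the same numeric value in this conversion:
     * 1-based [s, e] corresponds to 0-based [s - 1, e).
     */
    static int toZeroBasedStart(int oneBasedStart) {
        return oneBasedStart - 1;
    }
}
```

With these definitions, the SNP at position 100 has length 1 and becomes the half-open interval [99, 100).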
I would drop the "start" since this is in fact controversial, despite that it is the method's name.
Why is "start" controversial?
I don't quite understand the rest of the comments
* Returns 1-based inclusive start position of the variant.
*
* <p>
* INDEL events usually start on the first unaltered reference base before the INDEL.
This is a bit confusing... INDEL events don't actually start at the base before the INDEL; the indel starts where it starts. Perhaps you mean to say that it is reported one base before.
Also, perhaps you should try to be more general here: instead of talking about INDELs alone, you could say something like:
"Notice that for some types of variant events the actual start position may not be this value (e.g. deletions are reported on the base before the first base deleted)."
That way you don't give the impression that it only happens with deletions or indels.
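The behaviour described here comes from the VCF padding-base convention: for a simple insertion or deletion, REF and ALT both include the reference base preceding the event, and POS points at that base. For example, a deletion written as POS=100, REF=ACG, ALT=A removes the bases at 101 and 102. The sketch below uses a hypothetical helper name and recovers the position just after the padding base; it assumes a single ALT allele and ignores the spec's special case where the padding base follows the event at the very start of a contig.

```java
/** Hypothetical helper for VCF-style padded indel alleles (single ALT allele only). */
final class PaddedIndel {
    private PaddedIndel() {}

    /**
     * Returns the 1-based position just after the shared padding base: the first
     * deleted base for a deletion, or the base following the insertion point for an
     * insertion. Alleles of equal length (e.g. SNPs, MNPs) carry no padding base,
     * so pos is returned unchanged.
     */
    static int positionAfterPadding(int pos, String ref, String alt) {
        boolean padded = ref.length() != alt.length()
                && !ref.isEmpty()
                && !alt.isEmpty()
                && ref.charAt(0) == alt.charAt(0);
        return padded ? pos + 1 : pos;
    }
}
```

The check is deliberately narrow (lengths differ and the first base is shared), so it does not try to normalize complex multi-allelic or symbolic records.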
This is directly copied from the old comment.
But I agree that it can be more general.
* Note also that the VCF spec allows 0 and N + 1 for POS field for telomeric event,
* where N is the length of the chromosome.
* The "0" value returned should be interpreted as telomere, and does not violate the above "1-based" comment.
* It's the responsibility of code generating such variants to make sure {@code start} is populated correctly.
It is unfortunate that the spec talks about "telomeres" only. It assumes that the CHROM (and thus the reference) can only contain full, non-circular chromosomes... total BS (e.g. the MT chromosome in humans, or nearly every contig in unfinished genomic references).
Yes, CHROM wasn't a good choice of name to start with, but we don't need to perpetuate that mistake. Notice that the API uses "contig" instead of "chr" or "chromosome"; it still assumes too much but is closer to the truth.
So I would refrain from making it seem as if it can only be telomeres.
What about something like:
"This property can take on 0 and N+1 (where N is the last base in the enclosing contig) when this variant record refers to events that happen before or at the beginning, or after or at the end, of the enclosing contig."
I hope this PR does not turn into a rant about the spec itself. I can only work within the current spec for this PR.
So, since the latest spec talks about telomeres, I'm happy to use the single word "telomere" to avoid such a long sentence.
It won't; I'll leave it up to you at this point.
Sorry, perhaps I went too far giving examples... You can borrow mine, but you can/should use your own words.
*
* <p>
* INDEL events usually start on the first unaltered reference base before the INDEL.
* </p>
The </p> markup is unnecessary, as the next <p> tag closes the previously open paragraph.
I think the comments are valid, but I don't think they have to be addressed as part of this change; this is just mentioning the existence of telomeres and N+1 positions as possibilities.
Description
The VCF spec allows the POS column to take the value 0 (or chromosome length + 1) when the event is at a telomere. Currently the documentation for public int VariantContext.getStart() claims it returns a 1-based value, which makes it seem that 0 is an invalid value. This PR intends to clarify that by adding documentation.
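The rule described above can be sketched as a classification of a POS value against the length N of its contig: 0 lies before the first base, 1 through N are ordinary positions, and N + 1 lies after the last base. The class and enum names are hypothetical; in htsjdk one would presumably pass the value from `VariantContext.getStart()` together with the contig length from the sequence dictionary, but the sketch itself depends only on plain `int`s.

```java
/** Hypothetical helper that classifies a VCF POS value against its contig's length. */
final class ContigEdge {
    private ContigEdge() {}

    /** Where a position falls relative to a contig of length N. */
    enum Kind { BEFORE_FIRST_BASE, ON_CONTIG, AFTER_LAST_BASE }

    /**
     * Classifies {@code pos} for a contig of length {@code contigLength} (N).
     * 0 and N + 1 are the telomeric values the VCF spec allows; 1..N are ordinary bases.
     *
     * @throws IllegalArgumentException if {@code contigLength < 1} or pos is outside 0..N+1
     */
    static Kind classify(int pos, int contigLength) {
        if (contigLength < 1) {
            throw new IllegalArgumentException("contig length must be positive: " + contigLength);
        }
        if (pos < 0) {
            throw new IllegalArgumentException("negative position: " + pos);
        }
        if (pos == 0) {
            return Kind.BEFORE_FIRST_BASE;
        }
        if (pos <= contigLength) {
            return Kind.ON_CONTIG;
        }
        // Here pos > contigLength >= 1, so pos - 1 cannot overflow.
        if (pos - 1 == contigLength) {
            return Kind.AFTER_LAST_BASE;
        }
        throw new IllegalArgumentException(
                "position " + pos + " is beyond N + 1 for N = " + contigLength);
    }
}
```

Out-of-range values throw rather than being clamped, so that a record with a malformed POS is reported instead of being silently treated as telomeric.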
Checklist
(Documentation change, most of the following does not apply)