-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start adding type hints #1008
Start adding type hints #1008
Conversation
Having type hints where appropriate would be good. Is there a particular reason why you have used separate stub files rather than inline type annotations (which I believe can be used in Cython)? Having these hints in separate files would increase the maintenance burden, as they would easily get out of sync with the definitions in the main .pyx files. So IMHO for something being merged into the main pysam source project, it would be preferable to use type annotations in the main source files instead. That would of course require dropping Python 2.7 support. That is IMHO long overdue, but it is a decision for the primary pysam maintainers (cc @AndreasHeger @kevinjacobs-progenity). |
I'm fine with dropping Python 2 support for future releases. There are many advantages to doing so and many other commonly used bioinformatics libraries have done this. All of my codebases have migrated to Python 3.7+ and use optional typing extensively, so I have plenty of test cases. |
Using inline type annotations would be awesome, but if I’m not mistaken, it won’t actually work. Mypy can only parse annotations from a Also, if you define Another problem is that I wouldn’t know how to annotate some items, for example properties. Separate Perhaps keeping the |
0cd21dc
to
43d0fdf
Compare
|
||
BUFFER_SIZE: int | ||
|
||
class BGZFile: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this class implement any of the IO interfaces?
https://github.com/python/typeshed/blob/master/stdlib/io.pyi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. What do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type annotations for the gzip
module are here: https://github.com/python/typeshed/blob/master/stdlib/gzip.pyi
We could copy this in the future, but I’m currently focusing on annotating the other parts of pysam.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is a big job. I've left some comments.
pysam/libcbcf.pyi
Outdated
|
||
_D = TypeVar("_D") | ||
|
||
class VariantHeaderRecord: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also be Mapping[str, Optional[str]]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't immediately clear why value should be Optional, but it has been a long time since I've dug into this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should definitely be simplified. I tried to factor out a couple of base classes, but then noticed subtle differences where the types didn’t quite match. For example, your keys()
method returns a Python-2-style list, but Mapping.keys()
returns an iterator. Perhaps it works anyway, but I decided to spell everything out explicitly for now. That was necessary as a first step anyway.
The value can be None, see VariantHeaderRecord.itervalues()
:
Lines 1335 to 1343 in e340c85
def itervalues(self): | |
"""D.itervalues() -> an iterator over the values of D""" | |
cdef bcf_hrec_t *r = self.ptr | |
if not r: | |
return | |
cdef int i | |
for i in range(r.nkeys): | |
if r.keys[i]: | |
yield charptr_to_str(r.vals[i]) if r.vals[i] else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was all written for Python 2, so it doesn't surprise me that it was never updated to the new ABCs. Sounds like work that should be done in a follow-on PR. Same for figuring out why keys can be None in VariantHeaderRecord
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed an updated version that uses a custom _Mapping
class for all the common methods.
When I try to subclass one of the ABCs like this:
class VariantHeaderRecord(Sized):
I get the error message that I haven’t implemented all abstract methods (__len__
in this case), so it doesn’t actually help me reduce the number of lines.
def pop(self, key: str, default: str = ...) -> str: ... | ||
# def remove(self) -> Any: ... # crashes | ||
|
||
class VariantHeaderRecords: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sequence[VariantHeaderRecord]? May need to add ABC base class to get mixins (contains, reversed, index, count).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, also to remaining comments. I’ll need to have a look at collections.abc to see what can be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn’t want to touch the implementation, so I haven’t actually changed this to derive from Sequence
(the mixins are missing as you noted). That can be done at a later stage.
pysam/libcbcf.pyi
Outdated
def header_record(self) -> VariantHeaderRecord: ... | ||
def remove_header(self) -> None: ... | ||
|
||
class VariantHeaderContigs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above-- Mapping[str, VariantContig]? Should the value be Optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you remember why you thought the value may need to be Optional? (__getitem__
always returns a VariantContig and never None.)
def add(self, id: str, length: Optional[int] = ..., **kwargs) -> None: ... | ||
|
||
class VariantHeaderSamples: | ||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above -- Sequence[str]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I left this as is because VariantHeaderSamples doesn’t implement all the methods that a Sequence requires.
I’ve finished what I think I can do before going on vacation soon. This PR now covers about 80% of the API. I initially wanted to gradually add type hints, as the Mypy authors recommend, but that doesn’t work so well here because all the Cython source code is invisible to Mypy. So whenever you use anything for which no Note that the first commit in this PR is the same as PR #1009 (adding a GitHub Actions workflow). My suggestion would be to merge #1009 or some version of it separately because it’s not strictly what this PR is about. Testing the type hints is currently done by running |
See also the test results |
This is only for type checking: Mypy (and PyCharm) complain that libchtslib is undefined (within the definition of __all__). This is actually not true because "from pysam.x import y", when used within pysam/__init__.py, will actually add not only y, but also x into the list of available symbols.
I’ve now annotated the remaining files. I added an empty The only remaining problem that I get now when running
And this is potentially the first "bug" that Mypy has found: There’s indeed a |
HTSFile and HFile need a trick described in PEP 673 https://www.python.org/dev/peps/pep-0673/
I took the liberty of publishing a pysam-stubs package to PyPI for those who want to easily test the type annotations. |
Thanks! I will work on removing python2 next. |
This is an attempt to add a couple of type hints to pysam. I’ve started with
libcalignedsegment.pyi
. I’m not finished, but need to stop for the moment.I think these annotations would already be useful in the current state, but for this to be merged, I think we would at least need to adjust the test scripts so that they run
mypy
(it fails at the moment).