Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start adding type hints #1008

Merged
merged 26 commits into from
Mar 24, 2022
Merged

Conversation

marcelm
Copy link
Contributor

@marcelm marcelm commented Apr 22, 2021

This is an attempt to add a couple of type hints to pysam. I’ve started with libcalignedsegment.pyi. I’m not finished, but need to stop for the moment.

I think these annotations would already be useful in the current state, but for this to be merged, I think we would at least need to adjust the test scripts so that they run mypy (it fails at the moment).

@jmarshall
Copy link
Member

Having type hints where appropriate would be good.

Is there a particular reason why you have used separate stub files rather than inline type annotations (which I believe can be used in Cython)?

Having these hints in separate files would increase the maintenance burden, as they would easily get out of sync with the definitions in the main .pyx files. So IMHO for something being merged into the main pysam source project, it would be preferable to use type annotations in the main source files instead.

That would of course require dropping Python 2.7 support. That is IMHO long overdue, but it is a decision for the primary pysam maintainers (cc @AndreasHeger @kevinjacobs-progenity).

@kevinjacobs-progenity
Copy link
Member

I'm fine with dropping Python 2 support for future releases. There are many advantages to doing so and many other commonly used bioinformatics libraries have done this. All of my codebases have migrated to Python 3.7+ and use optional typing extensively, so I have plenty of test cases.

@marcelm
Copy link
Contributor Author

marcelm commented Apr 23, 2021

Is there a particular reason why you have used separate stub files rather than inline type annotations (which I believe can be used in Cython)?

Using inline type annotations would be awesome, but if I’m not mistaken, it won’t actually work. Mypy can only parse annotations from a .py or .pyi file (see python/mypy#8575 (comment)). See also python/mypy#8631. (But stubgen, on the other hand, does work somewhat – I ran it on pysam to get started with the annotations).

Also, if you define def f(int x): pass in a .pyx file, Cython generates code that checks at runtime whether the passed x object is an int. If you write this as def f(x: int): pass, then this code is not generated. So the runtime behavior would change. I think PEP 484 annotations are most useful (or perhaps only meant to be used) in pure Python mode.

Another problem is that I wouldn’t know how to annotate some items, for example properties.

Separate .pyi files have an upside as well: They allow you to present the API that you want people to use. I omitted all the deprecated functions and parameters in this PR.

Perhaps keeping the .pyx and .pyi files in sync can be solved with some additional tests.

@marcelm marcelm force-pushed the type-hints branch 3 times, most recently from 0cd21dc to 43d0fdf Compare June 28, 2021 07:40

BUFFER_SIZE: int

class BGZFile:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. What do you suggest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type annotations for the gzip module are here: https://github.com/python/typeshed/blob/master/stdlib/gzip.pyi
We could copy this in the future, but I’m currently focusing on annotating the other parts of pysam.

Copy link
Member

@kevinjacobs-progenity kevinjacobs-progenity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is a big job. I've left some comments.


_D = TypeVar("_D")

class VariantHeaderRecord:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also be Mapping[str, Optional[str]]?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't immediately clear why value should be Optional, but it has been a long time since I've dug into this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should definitely be simplified. I tried to factor out a couple of base classes, but then noticed subtle differences where the types didn’t quite match. For example, your keys() method returns a Python-2-style list, but Mapping.keys() returns an iterator. Perhaps it works anyway, but I decided to spell everything out explicitly for now. That was necessary as a first step anyway.

The value can be None, see VariantHeaderRecord.itervalues():

pysam/pysam/libcbcf.pyx

Lines 1335 to 1343 in e340c85

def itervalues(self):
"""D.itervalues() -> an iterator over the values of D"""
cdef bcf_hrec_t *r = self.ptr
if not r:
return
cdef int i
for i in range(r.nkeys):
if r.keys[i]:
yield charptr_to_str(r.vals[i]) if r.vals[i] else None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was all written for Python 2, so it doesn't surprise me that it was never updated to the new ABCs. Sounds like work that should be done in a follow-on PR. Same for figuring out why keys can be None in VariantHeaderRecords.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed an updated version that uses a custom _Mapping class for all the common methods.

When I try to subclass one of the ABCs like this:

class VariantHeaderRecord(Sized):

I get the error message that I haven’t implemented all abstract methods (__len__ in this case), so it doesn’t actually help me reduce the number of lines.

def pop(self, key: str, default: str = ...) -> str: ...
# def remove(self) -> Any: ... # crashes

class VariantHeaderRecords:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sequence[VariantHeaderRecord]? May need to add ABC base class to get mixins (contains, reversed, index, count).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, also to remaining comments. I’ll need to have a look at collections.abc to see what can be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t want to touch the implementation, so I haven’t actually changed this to derive from Sequence (the mixins are missing as you noted). That can be done at a later stage.

pysam/libcbcf.pyi Outdated Show resolved Hide resolved
def header_record(self) -> VariantHeaderRecord: ...
def remove_header(self) -> None: ...

class VariantHeaderContigs:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above-- Mapping[str, VariantContig]? Should the value be Optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember why you thought the value may need to be Optional? (__getitem__ always returns a VariantContig and never None.)

def add(self, id: str, length: Optional[int] = ..., **kwargs) -> None: ...

class VariantHeaderSamples:
@property

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above -- Sequence[str]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I left this as is because VariantHeaderSamples doesn’t implement all the methods that a Sequence requires.

pysam/libcbcf.pyi Outdated Show resolved Hide resolved
pysam/libcbcf.pyi Outdated Show resolved Hide resolved
pysam/libcbcf.pyi Outdated Show resolved Hide resolved
pysam/libcbcf.pyi Outdated Show resolved Hide resolved
@marcelm
Copy link
Contributor Author

marcelm commented Jun 28, 2021

I’ve finished what I think I can do before going on vacation soon.

This PR now covers about 80% of the API. I initially wanted to gradually add type hints, as the Mypy authors recommend, but that doesn’t work so well here because all the Cython source code is invisible to Mypy. So whenever you use anything for which no .pyi file exists, you get "doesn’t have attribute/module xyz" errors.

Note that the first commit in this PR is the same as PR #1009 (adding a GitHub Actions workflow). My suggestion would be to merge #1009 or some version of it separately because it’s not strictly what this PR is about.

Testing the type hints is currently done by running mypy pysam/. This only checks internal consistency, and should be extended at some point to also cover the tests/ directory.

@marcelm marcelm marked this pull request as ready for review June 28, 2021 13:48
@marcelm
Copy link
Contributor Author

marcelm commented Jun 28, 2021

See also the test results

@marcelm
Copy link
Contributor Author

marcelm commented Nov 27, 2021

I’ve now annotated the remaining files. I added an empty libcvcf.pyi as it appears that libcvcf.pyx is outdated. Or is it?

The only remaining problem that I get now when running mypy pysam/ is this:

pysam/__init__.py:26: error: Incompatible import of "TabixIterator" (imported name has type "Type[pysam.libcbcf.TabixIterator]", local name has type "Type[pysam.libctabix.TabixIterator]")

And this is potentially the first "bug" that Mypy has found: There’s indeed a TabixIterator both in libcbcf.pyx and libctabix.pyx. I’m not sure what to do, they cannot both be exposed beneath the pysam. namespace.

@marcelm
Copy link
Contributor Author

marcelm commented Nov 30, 2021

I took the liberty of publishing a pysam-stubs package to PyPI for those who want to easily test the type annotations.

@AndreasHeger
Copy link
Contributor

Thanks! I will work on removing python2 next.

@AndreasHeger AndreasHeger merged commit 2a5e55e into pysam-developers:master Mar 24, 2022
@marcelm marcelm deleted the type-hints branch March 24, 2022 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants