Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve annotation of fromstring #64

Merged
merged 5 commits into from
Jul 26, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 5 additions & 2 deletions lxml-stubs/etree.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -496,8 +496,11 @@ def parse(
source: _FileSource, parser: XMLParser = ..., base_url: _AnyStr = ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTMLParser is also allowed here.

) -> _ElementTree: ...
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved
def fromstring(
text: _AnyStr, parser: XMLParser = ..., *, base_url: _AnyStr = ...
) -> _Element: ...
text: _AnyStr,
parser: Union[XMLParser, HTMLParser] = ...,
*,
base_url: _AnyStr = ...,
) -> Optional[_Element]: ...
@overload
def tostring(
element_or_tree: _ElementOrTree,
Expand Down
10 changes: 9 additions & 1 deletion test-data/test-etree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,15 @@
main: |
from lxml import etree
document = etree.fromstring("<doc></doc>")
reveal_type(document) # N: Revealed type is "lxml.etree._Element"
reveal_type(document) # N: Revealed type is "Union[lxml.etree._Element, None]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since None is really a thing that you'd only get if you pass a suitable parser (and that gets in the way otherwise), maybe we can split up the declarations of fromstring() and parse() to only allow None return values if the parser argument is provided?

Admittedly, you can also change the default parser globally, which then allows a None return value here. And, in fact, it's not even guaranteed that the return value is an Element otherwise, since parsers can really return whatever they choose. However, that's such an extremely rare use case that I'd lean towards not expressing it in the type system. (OTOH, the whole point of this PR is to cater for an extremely rare use case, so …)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In typeshed we generally avoid Unions in return types, for reasons discussed in python/mypy#1693. In this case, it sounds like returning None is a very rare edge case, but if we return Element | None, we force every caller to check for None after calling this function. As you say, one improvement could be to use overloads for cases where we know the return value is definitely not None. Another approach we've used in typeshed is to return something like Element | Any. This gives you a lot of type safety but also allows callers to do None checks without hitting "unreachable code" warnings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks both!

On returning Union types: I'd read some of the discussion in python/typing#566 which sounded promising at first, but as Jukka writes here:

For new code and new APIs the recommended way is to avoid signatures that would require the use of AnyOf anyway.

I'm about to push three commits which:

  • clarify that parse accepts an HTML parser too;
  • express that fromstring() without a parser returns an _Element (the comment about changing the default parser notwithstanding; and
  • change the return type of fromstring to _Element | Any.

To be explicit: I don't have strong opinions about this PR. I like the idea of annotations that describe everything that a function can possibly do; but there's a balance to between that and pragmatism, and I'm happy to drop the | None change if it's too niche. It'd be nice to get the bit about HTMLParser in though, if nothing else!


- case: etree_from_empty_string_with_parser_recovery_returns_none
disable_cache: true
main: |
from lxml import etree
parser = etree.HTMLParser(recover=True)
document = etree.fromstring("", parser)
reveal_type(document) # N: Revealed type is "Union[lxml.etree._Element, None]"

- case: etree_element_find
disable_cache: true
Expand Down