Make default_parser() part of the public API #235

Closed
lemon24 opened this issue Apr 27, 2021 · 1 comment

Comments

Owner

lemon24 commented Apr 27, 2021

Make default_parser() part of the public API, because it's useful stand-alone, especially if we get a magic+ parser (#222), XML sanitization (#212), or enhanced HTML sanitization (#227).

Initially, we can expose just the callable part of the parser by wrapping the parser object in a function with the same signature:

def (
    url: str,
    http_etag: Optional[str] = None,
    http_last_modified: Optional[str] = None,
) -> Optional[ParsedFeed]: ...

Because this is a new feature, feed_root should default to None (no filesystem access).
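A minimal sketch of the wrapping idea, under stated assumptions: `_Parser`, `make_parse_function`, and the simplified `ParsedFeed` below are hypothetical stand-ins, not reader's actual classes, and treating a `None` return as "not modified" is an assumption based only on the `Optional[ParsedFeed]` return type.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ParsedFeed:
    # Hypothetical, heavily simplified stand-in for the real ParsedFeed.
    url: str
    http_etag: Optional[str] = None
    http_last_modified: Optional[str] = None


class _Parser:
    """Hypothetical stand-in for the internal parser object."""

    def __init__(self, feed_root: Optional[str] = None):
        # None means no filesystem access, as proposed for the new feature.
        self.feed_root = feed_root

    def __call__(self, url, http_etag=None, http_last_modified=None):
        # Pretend the remote feed always has the ETag "current".
        if http_etag == "current":
            return None  # assumption: None signals "not modified"
        return ParsedFeed(url, http_etag="current")


def make_parse_function(
    feed_root: Optional[str] = None,
) -> Callable[..., Optional[ParsedFeed]]:
    """Return a plain function that exposes only the callable part."""
    parser = _Parser(feed_root=feed_root)

    def parse(
        url: str,
        http_etag: Optional[str] = None,
        http_last_modified: Optional[str] = None,
    ) -> Optional[ParsedFeed]:
        return parser(url, http_etag, http_last_modified)

    return parse
```

Because callers only get the inner function, none of the parser object's attributes or methods become part of the public surface; they can be exposed later without breaking anyone.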

It may be nice to wrap the cache validation headers in a typed dict (although I don't like "cache_validators" as a name):

CacheValidationHeaders = TypedDict('CacheValidationHeaders', {'ETag': str, 'Last-Modified': str}, total=False)

def (url: str, cache_validators: CacheValidationHeaders) -> Optional[ParsedFeed]: ...
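As a rough illustration of how such a typed dict could be consumed, here is a sketch; `to_request_headers` is a hypothetical helper, not part of reader. The functional `TypedDict` form takes the type name as its first argument, and it is needed here because `Last-Modified` is not a valid Python identifier. The mapping to `If-None-Match` and `If-Modified-Since` is the standard HTTP conditional request pairing.

```python
from typing import Dict, TypedDict

# Functional syntax: 'Last-Modified' cannot be declared as a class attribute.
CacheValidationHeaders = TypedDict(
    "CacheValidationHeaders",
    {"ETag": str, "Last-Modified": str},
    total=False,  # both keys are optional
)


def to_request_headers(cache_validators: CacheValidationHeaders) -> Dict[str, str]:
    """Hypothetical helper: turn stored validators into conditional headers."""
    headers: Dict[str, str] = {}
    if "ETag" in cache_validators:
        headers["If-None-Match"] = cache_validators["ETag"]
    if "Last-Modified" in cache_validators:
        headers["If-Modified-Since"] = cache_validators["Last-Modified"]
    return headers
```

At runtime a `TypedDict` value is a plain `dict`, so relaxing the annotation to a regular dict later would not change how stored values behave.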

A related question: should we allow custom parsers to store custom caching metadata? For instance, #222 might need to store one header per page. Since type annotations are not stable, we can turn the TypedDict into a regular dict later.

Based on the signature above, ParsedFeed and all its components must become public / stable as well (i.e. FeedData, EntryData, their hash property). Also, ParsedFeed should probably not be a named tuple anymore.
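One plausible reason to move away from a named tuple (an assumption, the issue does not spell it out) is that positional unpacking and indexing become part of the contract once the type is public, so adding a field later breaks callers. The sketch below contrasts the two; `ParsedFeedTuple` and `ParsedFeedData` are hypothetical and much smaller than the real `ParsedFeed`.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional, Tuple


class ParsedFeedTuple(NamedTuple):
    # Hypothetical named-tuple version: field order is part of the contract.
    feed: str
    entries: Tuple[str, ...]


@dataclass(frozen=True)
class ParsedFeedData:
    # Hypothetical dataclass version: attribute access only.
    feed: str
    entries: Tuple[str, ...]
    http_etag: Optional[str] = None  # new fields do not affect existing callers


# Callers of the named tuple can unpack positionally, which would break
# as soon as a third field is added.
feed, entries = ParsedFeedTuple("feed", ("entry",))

# Callers of the dataclass have to go through attribute names.
parsed = ParsedFeedData("feed", ("entry",))
feed_by_name = parsed.feed
```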

Owner Author

lemon24 commented Jan 16, 2023

The parser internal API is now documented (although still unstable): https://reader.readthedocs.io/en/latest/internal.html#module-reader._parser

Closing.

lemon24 closed this as completed Jan 16, 2023