feat(taps): Implement reference paginators #732

Merged
merged 16 commits into from
Sep 1, 2022
Changes from 10 commits
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.BaseAPIPaginator.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
singer_sdk.pagination.BaseAPIPaginator
======================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: BaseAPIPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.BaseHATEOASPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.BaseHATEOASPaginator
==========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: BaseHATEOASPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.BaseOffsetPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.BaseOffsetPaginator
=========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: BaseOffsetPaginator
    :members:
@@ -0,0 +1,7 @@
singer_sdk.pagination.BasePageNumberPaginator
=============================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: BasePageNumberPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.HeaderLinkPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.HeaderLinkPaginator
=========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: HeaderLinkPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.JSONPathPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.JSONPathPaginator
=======================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: JSONPathPaginator
    :members:
@@ -0,0 +1,7 @@
singer_sdk.pagination.LegacyPaginatedStreamProtocol
===================================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: LegacyPaginatedStreamProtocol
    :members:
@@ -0,0 +1,7 @@
singer_sdk.pagination.LegacyStreamPaginator
===========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: LegacyStreamPaginator
    :members:
@@ -0,0 +1,7 @@
singer_sdk.pagination.SimpleHeaderPaginator
===========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: SimpleHeaderPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.SinglePagePaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.SinglePagePaginator
=========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: SinglePagePaginator
    :members:
74 changes: 73 additions & 1 deletion docs/porting.md
@@ -103,10 +103,82 @@ _Important: If you've gotten this far, this is a good time to commit your code b

Pagination is different for almost every API, and no single method covers every API's approach to it.

-Most likely you will use `get_next_page_token` to parse and return whatever the "next page" token is for your source, and you'll use `get_url_params` to define how to pass the "next page" token back to the API when asking for subsequent pages.
+Most likely you will use [get_new_paginator](singer_sdk.RESTStream.get_new_paginator) to instantiate a [pagination class](./classes/singer_sdk.pagination.BaseAPIPaginator) for your source, and you'll use `get_url_params` to define how to pass the "next page" token back to the API when asking for subsequent pages.

When you think you have it right, run `poetry run tap-mysource` again, and debug until you are confident the result includes multiple pages from the API.
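
To make the split between the paginator and `get_url_params` concrete, here is a minimal, stdlib-only sketch of the request loop. It is not the SDK's implementation: `FakeResponse`, `ToyHeaderPaginator`, `PAGES`, and `fetch_all` are hypothetical stand-ins invented for this illustration, and the `X-Next-Page` header and `page`/`per_page` parameters are assumptions about the target API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response (not requests.Response)."""

    records: list[int]
    headers: dict[str, str] = field(default_factory=dict)


class ToyHeaderPaginator:
    """Toy paginator that reads the next-page token from a response header."""

    def __init__(self, header_name: str) -> None:
        self.header_name = header_name
        self.current_value: str | None = None
        self.finished = False
        self.count = 0

    def advance(self, response: FakeResponse) -> None:
        """Count the processed page and pick up the next token, if any."""
        self.count += 1
        token = response.headers.get(self.header_name)
        if token:
            self.current_value = token
        else:
            self.finished = True


def get_url_params(next_page_token: str | None) -> dict[str, str]:
    """Translate the paginator's token into query parameters."""
    params = {"per_page": "2"}
    if next_page_token:
        params["page"] = next_page_token
    return params


# A fake three-page API, keyed by the value of the "page" query parameter.
PAGES = {
    "1": FakeResponse([1, 2], {"X-Next-Page": "2"}),
    "2": FakeResponse([3, 4], {"X-Next-Page": "3"}),
    "3": FakeResponse([5]),  # No header on the last page.
}


def fetch_all() -> tuple[list[int], int]:
    """Request pages until the paginator reports that it is finished."""
    paginator = ToyHeaderPaginator("X-Next-Page")
    records: list[int] = []
    while not paginator.finished:
        params = get_url_params(paginator.current_value)
        response = PAGES[params.get("page", "1")]
        records.extend(response.records)
        paginator.advance(response)
    return records, paginator.count
```

In this sketch the paginator only knows how to read the token and `get_url_params` only knows how to send it back, so either side can be swapped without touching the other.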

You can also add unit tests for your pagination implementation for additional confidence:

```python
from __future__ import annotations

import json

from requests import Response
from singer_sdk.helpers.jsonpath import extract_jsonpath
from singer_sdk.pagination import BaseHATEOASPaginator, first


class CustomHATEOASPaginator(BaseHATEOASPaginator):
    """Paginator for HATEOAS APIs - or "Hypermedia as the Engine of Application State".

    This paginator expects responses to have a "links" array containing an
    entry whose "rel" is "next" and whose "href" is a link
    like "https://api.com/link/to/next-item".
    """

    def get_next_url(self, response: Response) -> str | None:
        """Get the HATEOAS link for the next page, if the response has one."""
        try:
            return first(
                extract_jsonpath("$.links[?(@.rel=='next')].href", response.json())
            )
        except StopIteration:
            return None


def test_paginator_custom_hateoas():
    """Validate the custom HATEOAS paginator."""
    resource_path = "/path/to/resource"
    response = Response()
    paginator = CustomHATEOASPaginator()
    assert not paginator.finished
    assert paginator.current_value is None
    assert paginator.count == 0

    # First response links to page 2.
    response._content = json.dumps(
        {
            "links": [
                {
                    "rel": "next",
                    "href": f"{resource_path}?page=2&limit=100",
                }
            ]
        }
    ).encode()
    paginator.advance(response)
    assert not paginator.finished
    assert paginator.current_value.path == resource_path
    assert paginator.current_value.query == "page=2&limit=100"
    assert paginator.count == 1

    # Second response links to page 3.
    response._content = json.dumps(
        {
            "links": [
                {
                    "rel": "next",
                    "href": f"{resource_path}?page=3&limit=100",
                }
            ]
        }
    ).encode()
    paginator.advance(response)
    assert not paginator.finished
    assert paginator.current_value.path == resource_path
    assert paginator.current_value.query == "page=3&limit=100"
    assert paginator.count == 2

    # Final response has no "next" link, so pagination stops.
    response._content = json.dumps({"links": []}).encode()
    paginator.advance(response)
    assert paginator.finished
    assert paginator.count == 3
```
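
The test above asserts on `paginator.current_value.path` and `.query`, which suggests the paginator holds a parsed URL rather than a raw string. A minimal sketch of that shape, assuming the value behaves like the result of the standard library's `urllib.parse.urlparse` (the `href` below is a made-up example in the same form as the test data):

```python
from urllib.parse import parse_qsl, urlparse

# Hypothetical link, in the same shape as the hrefs used in the test.
href = "/path/to/resource?page=2&limit=100"

# Split the link into its components.
parsed = urlparse(href)

# Turn the query string into a dict that can be sent as request parameters.
next_params = dict(parse_qsl(parsed.query))
```

Under that assumption, a `get_url_params` implementation could build its parameters from `next_page_token.query` in the same way, instead of reassembling them by hand.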

Note: Depending on how well the API is designed, this could take 5 minutes or multiple hours. If you need help, [Postman](https://postman.com) or [Thunder Client](https://marketplace.visualstudio.com/items?itemName=rangav.vscode-thunder-client) can sometimes be helpful in debugging the API's specific quirks.

## Run pytest
19 changes: 19 additions & 0 deletions docs/reference.rst
@@ -88,3 +88,22 @@ JSON Schema builder classes
    :template: module.rst

    typing


Pagination
----------

.. autosummary::
    :toctree: classes
    :template: class.rst

    pagination.BaseAPIPaginator
    pagination.SinglePagePaginator
    pagination.BaseHATEOASPaginator
    pagination.HeaderLinkPaginator
    pagination.JSONPathPaginator
    pagination.SimpleHeaderPaginator
    pagination.BasePageNumberPaginator
    pagination.BaseOffsetPaginator
    pagination.LegacyPaginatedStreamProtocol
    pagination.LegacyStreamPaginator
36 changes: 18 additions & 18 deletions samples/sample_tap_gitlab/gitlab_rest_streams.py
@@ -1,11 +1,12 @@
 """Sample tap stream test for tap-gitlab."""

-from pathlib import Path
-from typing import Any, Dict, List, Optional, cast
+from __future__ import annotations

-import requests
+from pathlib import Path
+from typing import Any, cast

 from singer_sdk.authenticators import SimpleAuthenticator
+from singer_sdk.pagination import SimpleHeaderPaginator
 from singer_sdk.streams.rest import RESTStream
 from singer_sdk.typing import (
     ArrayType,
@@ -21,7 +22,7 @@
 DEFAULT_URL_BASE = "https://gitlab.com/api/v4"


-class GitlabStream(RESTStream):
+class GitlabStream(RESTStream[str]):
     """Sample tap test for gitlab."""

     _LOG_REQUEST_METRIC_URLS = True
@@ -39,8 +40,8 @@ def authenticator(self) -> SimpleAuthenticator:
         )

     def get_url_params(
-        self, context: Optional[dict], next_page_token: Optional[Any]
-    ) -> Dict[str, Any]:
+        self, context: dict | None, next_page_token: str | None
+    ) -> dict[str, Any]:
         """Return a dictionary of values to be used in URL parameterization."""
         params: dict = {}
         if next_page_token:
@@ -50,21 +51,20 @@ def get_url_params(
             params["order_by"] = self.replication_key
         return params

-    def get_next_page_token(
-        self, response: requests.Response, previous_token: Optional[Any]
-    ) -> Optional[Any]:
-        """Return token for identifying next page or None if not applicable."""
-        next_page_token = response.headers.get("X-Next-Page", None)
-        if next_page_token:
-            self.logger.debug(f"Next page token retrieved: {next_page_token}")
-        return next_page_token
+    def get_new_paginator(self) -> SimpleHeaderPaginator:
+        """Return a new paginator for GitLab API endpoints.
+
+        Returns:
+            A new paginator.
+        """
+        return SimpleHeaderPaginator("X-Next-Page")


 class ProjectBasedStream(GitlabStream):
     """Base class for streams that are keys based on project ID."""

     @property
-    def partitions(self) -> List[dict]:
+    def partitions(self) -> list[dict]:
         """Return a list of partition key dicts (if applicable), otherwise None."""
         if "{project_id}" in self.path:
             return [
@@ -162,7 +162,7 @@ class EpicsStream(ProjectBasedStream):

     # schema_filepath = SCHEMAS_DIR / "epics.json"

-    def get_child_context(self, record: dict, context: Optional[dict]) -> dict:
+    def get_child_context(self, record: dict, context: dict | None) -> dict:
         """Perform post processing, including queuing up any child stream types."""
         # Ensure child state record(s) are created
         return {
@@ -183,8 +183,8 @@ class EpicIssuesStream(GitlabStream):
     parent_stream_type = EpicsStream  # Stream should wait for parents to complete.

     def get_url_params(
-        self, context: Optional[dict], next_page_token: Optional[Any]
-    ) -> Dict[str, Any]:
+        self, context: dict | None, next_page_token: str | None
+    ) -> dict[str, Any]:
         """Return a dictionary of values to be used in parameterization."""
         result = super().get_url_params(context, next_page_token)
         if not context or "epic_id" not in context: