@handle_urls() with item return type #84

Merged
merged 48 commits into from
Oct 27, 2022
Changes from 11 commits
aa8698c
update handle_urls() and OverrideRule to accept data types
BurnzZ Sep 28, 2022
cc5ec17
update flake8 to ignore D102 in tests/po_lib_data_type
BurnzZ Sep 28, 2022
c135b2d
update mypy config to ignore tests.po_lib_data_type
BurnzZ Sep 28, 2022
7837b3c
rename 'data_type' into 'to_return' in handle_urls() and OverrideRule
BurnzZ Oct 3, 2022
eb7e1b2
rename 'overrides' into 'instead_of' in @handle_urls
BurnzZ Oct 3, 2022
904dd8c
rename and deprecate OverrideRule into ApplyRule
BurnzZ Oct 3, 2022
03f5656
fix tests
BurnzZ Oct 3, 2022
3ed8415
update CHANGELOG with regards to ApplyRule and @handle_urls changes
BurnzZ Oct 3, 2022
3a5499c
create from_apply_rules method in PageObjectRegistry; deprecate from_…
BurnzZ Oct 3, 2022
01112e7
rename 'web_poet.overrides' into 'web_poet.rules'
BurnzZ Oct 3, 2022
b1c7e14
rename PageObjectRegistry's methods: get_overrides → get_rules, searc…
BurnzZ Oct 3, 2022
1afddc9
import * from 'rules' in 'overrides'
BurnzZ Oct 3, 2022
144e39e
prioritize 'to_return' parameter compared to derived item_cls
BurnzZ Oct 3, 2022
fee63a5
fix the deprecated 'overrides' parameter not being used if present
BurnzZ Oct 6, 2022
33a48ac
enable auto-conversion to url_matcher.Patterns on ApplyRules.for_patt…
BurnzZ Oct 6, 2022
9b6f9c4
update all arguments of ApplyRule to be keyword-only except 'for_patt…
BurnzZ Oct 7, 2022
8264565
fix mypy issue in ApplyRule tests
BurnzZ Oct 7, 2022
3287881
improve tests
BurnzZ Oct 14, 2022
c5faf38
update docstrings/tutorials regarding the new 'to_return' parameter
BurnzZ Oct 14, 2022
9729e8f
clean-up CHANGELOG formatting
BurnzZ Oct 14, 2022
8efa813
update CHANGELOG to soften the value of the 'to_return' param
BurnzZ Oct 14, 2022
52a6f47
update override docs to change the tone about the 'to_return' parameter
BurnzZ Oct 14, 2022
2cb518b
Apply naming and grammar suggestions
BurnzZ Oct 14, 2022
88c511d
test improvements
BurnzZ Oct 14, 2022
928188e
remove 'preferred' param of get_item_cls()
BurnzZ Oct 14, 2022
3679b59
update default behavior of @handle_urls to return dict instead of None
BurnzZ Oct 14, 2022
fc0ba50
improve the docstring of handle_urls()
BurnzZ Oct 14, 2022
5967e34
update docs by removing tick mark chars in anchors
BurnzZ Oct 14, 2022
0626e57
rename some *.com URLs into *.example in docs and tests
BurnzZ Oct 17, 2022
5fdf4a1
Merge branch 'master' of ssh://github.com/scrapinghub/web-poet into h…
BurnzZ Oct 17, 2022
de15a86
revert default 'to_return=dict' and use 'None' instead
BurnzZ Oct 18, 2022
f354c4a
improve docs and code comments
BurnzZ Oct 18, 2022
bce97be
improve docstring of 'search_rules()'
BurnzZ Oct 19, 2022
4e00ea8
Improve the docs
BurnzZ Oct 25, 2022
59381a5
add reference link to Page Objects in Overrides tutorial
BurnzZ Oct 25, 2022
076e7bb
remove mention of 'to_return' in @handle_url doc examples
BurnzZ Oct 26, 2022
1419f7a
improve tests
BurnzZ Oct 26, 2022
776cf0d
improve docstrings and warning messages
BurnzZ Oct 26, 2022
42bd123
Fix test case when ensuring that ApplyRule is frozen
BurnzZ Oct 26, 2022
36cd866
update tests to check each param change on hash()
BurnzZ Oct 26, 2022
383b4f7
update 'Item Class' to 'item class'
BurnzZ Oct 26, 2022
4755ce9
add an Overview section to the Overrides docs; rename them to Apply R…
kmike Oct 26, 2022
7240c9e
bump Sphinx version
kmike Oct 26, 2022
85b9b7b
simplify ApplyRule docstring
kmike Oct 26, 2022
ddaed2f
typo fix
kmike Oct 26, 2022
660a8cd
Apply suggestions from code review
kmike Oct 27, 2022
3e23c51
mention str -> Patterns conversion
kmike Oct 27, 2022
2c570e2
Merge pull request #90 from scrapinghub/handle_urls-docs
kmike Oct 27, 2022
2 changes: 2 additions & 0 deletions .flake8
@@ -34,5 +34,7 @@ per-file-ignores =
     # imports are there to expose submodule functions so they can be imported
     # directly from that module
     # F403: Ignore * imports in these files
+    # D102: Missing docstring in public method
     web_poet/__init__.py:F401,F403
     web_poet/page_inputs/__init__.py:F401,F403
+    tests/po_lib_to_return/__init__.py:D102
31 changes: 31 additions & 0 deletions CHANGELOG.rst
@@ -2,6 +2,37 @@
 Changelog
 =========

+TBD
+---
+
+* New ``ApplyRule`` class created by the ``@handle_urls`` decorator.
+  This is nearly identical with ``OverrideRule`` except that it's accepting
+  a ``to_return`` parameter which signifies the data container class that
+  the Page Object returns.
+
+* Modify the call signature of ``handle_urls``:
+
+    * New ``to_return`` parameter which signifies the data container class that
Member

I think we should downplay it a bit, because most users shouldn't use to_return argument directly. What do you think about documenting it this way?

  1. handle_urls now sets rule.to_return to the item type declared in the ItemPage
  2. for advanced use cases, it's possible to change it by using to_return argument of handle_urls decorator

Contributor Author

Makes sense. Updated it in 8efa813.
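
The behaviour described in the review comment above can be sketched in plain Python. This is a minimal illustration, not web-poet's actual implementation: `ItemPage`, `Rule`, `RULES`, `declared_item_cls`, and this `handle_urls` are simplified, hypothetical stand-ins defined only for the example. The decorator records the item class declared in `ItemPage[...]` unless an explicit `to_return` argument is given.

```python
from typing import Generic, List, NamedTuple, Optional, TypeVar, get_args

ItemT = TypeVar("ItemT")


class ItemPage(Generic[ItemT]):
    """Hypothetical stand-in for web_poet.ItemPage (not the real class)."""


class Rule(NamedTuple):
    """Hypothetical, heavily simplified stand-in for ApplyRule."""

    pattern: str
    use: type
    to_return: Optional[type]


RULES: List[Rule] = []  # toy registry, assumed for this sketch only


def declared_item_cls(page_cls: type) -> Optional[type]:
    """Return X from the nearest ``ItemPage[X]`` base in the MRO, if any."""
    for klass in page_cls.__mro__:
        for base in klass.__dict__.get("__orig_bases__", ()):
            args = get_args(base)
            # Skip unbound type variables such as Generic[ItemT].
            if args and isinstance(args[0], type):
                return args[0]
    return None


def handle_urls(pattern: str, *, to_return: Optional[type] = None):
    """Toy decorator: an explicit ``to_return`` wins over the declared item class."""

    def decorator(cls: type) -> type:
        item_cls = to_return if to_return is not None else declared_item_cls(cls)
        RULES.append(Rule(pattern, cls, item_cls))
        return cls

    return decorator


class Product: ...


class ProductSimilar: ...


@handle_urls("example.com")
class ProductPage(ItemPage[Product]): ...


@handle_urls("example.com")
class ImprovedProductPage(ProductPage): ...


@handle_urls("example.com", to_return=ProductSimilar)
class CustomProductPage(ProductPage): ...
```

In this sketch the explicit argument takes precedence, which follows the wording of the comment; the library's real precedence rules may differ.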

+      the Page Object returns. This behaves exactly the same with ``overrides``
+      but it's more consistent with the attribute names of ``ApplyRules`` that
+      it creates.
+    * New ``instead_of`` parameter which does the same thing with ``overrides``.
+    * The old ``overrides`` parameter is not required anymore as it's set for
+      deprecation.
+
+Deprecations:
+
+* The ``overrides`` parameter from ``@handle_urls`` is now deprecated.
+  Use the ``instead_of`` parameter instead.
+* ``OverrideRule`` is now deprecated. Use ``ApplyRule`` instead.
+* The ``from_override_rules`` method of ``PageObjectRegistry`` is now deprecated.
+  Use ``from_apply_rules`` instead.
+* The ``web_poet.overrides`` is deprecated. Use ``web_poet.rules`` instead.
+* The ``PageObjectRegistry.get_overrides`` is deprecated.
+  Use ``PageObjectRegistry.get_rules`` instead.
+* The ``PageObjectRegistry.search_overrides`` is deprecated.
+  Use ``PageObjectRegistry.search_rules`` instead.
+
 0.5.1 (2022-09-23)
 ------------------

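
The parameter deprecation listed in the changelog entry (``overrides`` replaced by ``instead_of``) follows a common pattern that can be sketched with the standard `warnings` module. The decorator below is a hypothetical, stripped-down stand-in, not the library's code; the `rule` attribute and the choice to let `instead_of` win when both arguments are passed are assumptions made for the example.

```python
import warnings
from typing import Optional


def handle_urls(
    include: str,
    *,
    instead_of: Optional[type] = None,
    overrides: Optional[type] = None,  # deprecated alias of instead_of
):
    """Hypothetical decorator showing only the deprecation handling."""
    if overrides is not None:
        warnings.warn(
            "The 'overrides' parameter is deprecated; use 'instead_of' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: the new parameter wins if both are supplied.
        if instead_of is None:
            instead_of = overrides

    def decorator(cls: type) -> type:
        # 'rule' is an illustrative attribute, not part of web-poet.
        cls.rule = {"include": include, "instead_of": instead_of}
        return cls

    return decorator


class OldPage: ...


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @handle_urls("example.com", overrides=OldPage)
    class NewPage: ...
```

Old call sites keep working and produce the same rule, while the recorded warning points users at the new keyword.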
2 changes: 1 addition & 1 deletion docs/advanced/fields.rst
@@ -319,7 +319,7 @@ inherit from the "base", "standard" Page Object, there could be a ``@field``
 from the base class which is not present in the ``CustomItem``.
 It'd be still passed to ``CustomItem.__init__``, causing an exception.

-One way to solve it is to make the orignal Page Object a dependency
+One way to solve it is to make the original Page Object a dependency
 instead of inheriting from it, as explained in the beginning.

 Alternatively, you can use ``skip_nonitem_fields=True`` class argument - it tells
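
The failure mode described in this passage can be reproduced without web-poet. The sketch below uses a standard-library dataclass as a stand-in for ``CustomItem`` and a plain dict for the values collected from the page object's fields; both are assumptions for illustration. Filtering the collected values down to the names the item class declares is one plausible reading of what skipping non-item fields means, not a copy of the library's logic.

```python
from dataclasses import dataclass, fields


@dataclass
class CustomItem:
    name: str


# Values gathered from the page object's fields; "price" is inherited from
# the base page object and has no counterpart in CustomItem.
collected = {"name": "name", "price": 12.99}

error = None
try:
    CustomItem(**collected)
except TypeError as exc:
    # __init__ rejects the unexpected "price" keyword argument.
    error = exc

# Keeping only the names the item class declares avoids the exception.
allowed = {f.name for f in fields(CustomItem)}
item = CustomItem(**{k: v for k, v in collected.items() if k in allowed})
```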
2 changes: 1 addition & 1 deletion docs/api-reference.rst
@@ -91,7 +91,7 @@ use cases and some examples.

 .. autofunction:: web_poet.handle_urls

-.. automodule:: web_poet.overrides
+.. automodule:: web_poet.rules
     :members:
     :exclude-members: handle_urls

168 changes: 84 additions & 84 deletions docs/intro/overrides.rst

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions pyproject.toml
@@ -6,3 +6,7 @@ multi_line_output = 3
 show_error_codes = true
 ignore_missing_imports = true
 no_warn_no_return = true
+
+[[tool.mypy.overrides]]
+module = "tests.po_lib_to_return.*"
+ignore_errors = true
15 changes: 9 additions & 6 deletions tests/po_lib/__init__.py
@@ -12,8 +12,9 @@


 class POBase(ItemPage):
-    expected_overrides: Type[ItemPage]
+    expected_instead_of: Type[ItemPage]
     expected_patterns: Patterns
+    expected_to_return = None
     expected_meta: Dict[str, Any]


@@ -26,18 +27,20 @@ class POTopLevelOverriden2(ItemPage):


 # This first annotation is ignored. A single annotation per registry is allowed
-@handle_urls("example.com", overrides=POTopLevelOverriden1)
+@handle_urls("example.com", instead_of=POTopLevelOverriden1)
 @handle_urls(
-    "example.com", overrides=POTopLevelOverriden1, exclude="/*.jpg|", priority=300
+    "example.com", instead_of=POTopLevelOverriden1, exclude="/*.jpg|", priority=300
 )
 class POTopLevel1(POBase):
-    expected_overrides = POTopLevelOverriden1
+    expected_instead_of = POTopLevelOverriden1
     expected_patterns = Patterns(["example.com"], ["/*.jpg|"], priority=300)
+    expected_to_return = None
     expected_meta = {}  # type: ignore


-@handle_urls("example.com", overrides=POTopLevelOverriden2)
+@handle_urls("example.com", instead_of=POTopLevelOverriden2)
 class POTopLevel2(POBase):
-    expected_overrides = POTopLevelOverriden2
+    expected_instead_of = POTopLevelOverriden2
     expected_patterns = Patterns(["example.com"])
+    expected_to_return = None
     expected_meta = {}  # type: ignore
5 changes: 3 additions & 2 deletions tests/po_lib/a_module.py
@@ -8,8 +8,9 @@ class POModuleOverriden(ItemPage):
     ...


-@handle_urls("example.com", overrides=POModuleOverriden, extra_arg="foo")
+@handle_urls("example.com", instead_of=POModuleOverriden, extra_arg="foo")
 class POModule(POBase):
-    expected_overrides = POModuleOverriden
+    expected_instead_of = POModuleOverriden
     expected_patterns = Patterns(["example.com"])
+    expected_to_return = None
     expected_meta = {"extra_arg": "foo"}  # type: ignore
5 changes: 3 additions & 2 deletions tests/po_lib/nested_package/__init__.py
@@ -11,9 +11,10 @@ class PONestedPkgOverriden(ItemPage):
 @handle_urls(
     include=["example.com", "example.org"],
     exclude=["/*.jpg|"],
-    overrides=PONestedPkgOverriden,
+    instead_of=PONestedPkgOverriden,
 )
 class PONestedPkg(POBase):
-    expected_overrides = PONestedPkgOverriden
+    expected_instead_of = PONestedPkgOverriden
     expected_patterns = Patterns(["example.com", "example.org"], ["/*.jpg|"])
+    expected_to_return = None
     expected_meta = {}  # type: ignore
5 changes: 3 additions & 2 deletions tests/po_lib/nested_package/a_nested_module.py
@@ -11,11 +11,12 @@ class PONestedModuleOverriden(ItemPage):
 @handle_urls(
     include=["example.com", "example.org"],
     exclude=["/*.jpg|"],
-    overrides=PONestedModuleOverriden,
+    instead_of=PONestedModuleOverriden,
 )
 class PONestedModule(POBase):
-    expected_overrides = PONestedModuleOverriden
+    expected_instead_of = PONestedModuleOverriden
     expected_patterns = Patterns(
         include=["example.com", "example.org"], exclude=["/*.jpg|"]
     )
+    expected_to_return = None
     expected_meta = {}  # type: ignore
7 changes: 4 additions & 3 deletions tests/po_lib_sub/__init__.py
@@ -9,7 +9,7 @@


 class POBase(ItemPage):
-    expected_overrides: Type[ItemPage]
+    expected_instead_of: Type[ItemPage]
     expected_patterns: Patterns
     expected_meta: Dict[str, Any]

@@ -18,8 +18,9 @@ class POLibSubOverriden(ItemPage):
     ...


-@handle_urls("sub_example.com", overrides=POLibSubOverriden)
+@handle_urls("sub_example.com", instead_of=POLibSubOverriden)
 class POLibSub(POBase):
-    expected_overrides = POLibSubOverriden
+    expected_instead_of = POLibSubOverriden
     expected_patterns = Patterns(["sub_example.com"])
+    expected_to_return = None
     expected_meta = {}  # type: ignore
160 changes: 160 additions & 0 deletions tests/po_lib_to_return/__init__.py
@@ -0,0 +1,160 @@
+import attrs
+from url_matcher import Patterns
+
+from web_poet import Injectable, ItemPage, Returns, field, handle_urls, item_from_fields
+
+
+@attrs.define
+class Product:
+    name: str
+    price: float
+
+
+@attrs.define
+class ProductSimilar:
+    name: str
+    price: float
+
+
+@attrs.define
+class ProductMoreFields(Product):
+    brand: str
+
+
+@attrs.define
+class ProductLessFields:
+    name: str
+
+
+@handle_urls("example.com")
+class ProductPage(ItemPage[Product]):
+    """A base PO to populate the Product item's fields."""
+
+    expected_instead_of = None
+    expected_patterns = Patterns(["example.com"])
+    expected_to_return = Product
+    expected_meta = {}
+
+    @field
+    def name(self) -> str:
+        return "name"
+
+    @field
+    def price(self) -> float:
+        return 12.99
+
+
+@handle_urls("example.com", instead_of=ProductPage)
+class ImprovedProductPage(ProductPage):
+    """A custom PO inheriting from a base PO which alters some field values."""
+
+    expected_instead_of = ProductPage
+    expected_patterns = Patterns(["example.com"])
+    expected_to_return = Product
+    expected_meta = {}
+
+    @field
+    def name(self) -> str:
+        return "improved name"
+
+
+@handle_urls("example.com", instead_of=ProductPage)
+class SimilarProductPage(ProductPage, Returns[ProductSimilar]):
+    """A custom PO inheriting from a base PO returning the same fields but in
+    a different item class.
+    """
+
+    expected_instead_of = ProductPage
+    expected_patterns = Patterns(["example.com"])
+    expected_to_return = ProductSimilar
+    expected_meta = {}
+
+
+@handle_urls("example.com", instead_of=ProductPage)
+class MoreProductPage(ProductPage, Returns[ProductMoreFields]):
+    """A custom PO inheriting from a base PO returning more items using a
+    different item class.
+    """
+
+    expected_instead_of = ProductPage
+    expected_patterns = Patterns(["example.com"])
+    expected_to_return = ProductMoreFields
+    expected_meta = {}
+
+    @field
+    def brand(self) -> str:
+        return "brand"
+
+
+@handle_urls("example.com", instead_of=ProductPage)
+class LessProductPage(
+    ProductPage, Returns[ProductLessFields], skip_nonitem_fields=True
+):
+    """A custom PO inheriting from a base PO returning less items using a
+    different item class.
+    """
+
+    expected_instead_of = ProductPage
+    expected_patterns = Patterns(["example.com"])
+    expected_to_return = ProductLessFields
+    expected_meta = {}
+
+    @field
+    def brand(self) -> str:
+        return "brand"
+
+
+@handle_urls("example.com", instead_of=ProductPage, to_return=ProductSimilar)
+class CustomProductPage(ProductPage, Returns[Product]):
+    """A custom PO inheriting from a base PO returning the same fields but in
+    a different item class.
+
+    This PO is the same with ``SimilarProductPage`` but passes a ``to_return``
+    in the ``@handle_urls`` decorator.
+
+    This tests the case that the type inside ``Returns`` should be followed and
+    the ``to_return`` parameter from ``@handle_urls`` is ignored.
+    """
+
+    expected_instead_of = ProductPage
+    expected_patterns = Patterns(["example.com"])
+    expected_to_return = Product
+    expected_meta = {}
+
+
+@handle_urls("example.com", instead_of=ProductPage, to_return=ProductSimilar)
+class CustomProductPageNoReturns(ProductPage):
+    """Same case as with ``CustomProductPage`` but doesn't inherit from
+    ``Returns[Product]``.
+    """
+
+    expected_instead_of = ProductPage
+    expected_patterns = Patterns(["example.com"])
+    expected_to_return = Product
+    expected_meta = {}
+
+
+@handle_urls("example.com", to_return=Product)
+class CustomProductPageDataTypeOnly(Injectable):
+    """A PO that doesn't inherit from ``ItemPage`` and ``WebPage`` which means
+    it doesn't inherit from the ``Returns`` class.
+
+    This tests the case that the ``to_return`` parameter in ``@handle_urls``
+    should properly use it in the rules.
+    """
+
+    expected_instead_of = None
+    expected_patterns = Patterns(["example.com"])
+    expected_to_return = Product
+    expected_meta = {}
+
+    @field
+    def name(self) -> str:
+        return "name"
+
+    @field
+    def price(self) -> float:
+        return 12.99
+
+    async def to_item(self) -> Product:
+        return await item_from_fields(self, item_cls=Product)
54 changes: 54 additions & 0 deletions tests/test_fields.py
@@ -4,6 +4,20 @@
 import attrs
 import pytest

+from tests.po_lib_to_return import (
+    CustomProductPage,
+    CustomProductPageDataTypeOnly,
+    CustomProductPageNoReturns,
+    ImprovedProductPage,
+    LessProductPage,
+    MoreProductPage,
+    Product,
+    ProductLessFields,
+    ProductMoreFields,
+    ProductPage,
+    ProductSimilar,
+    SimilarProductPage,
+)
 from web_poet import (
     HttpResponse,
     Injectable,
@@ -368,3 +382,43 @@ def field_foo_cached(self):
     assert page.field_foo == "foo"
     assert page.field_foo_meta == "foo"
     assert page.field_foo_cached == "foo"
+
+
+@pytest.mark.asyncio
+async def test_field_with_handle_urls() -> None:
+
+    page = ProductPage()
+    assert page.name == "name"
+    assert page.price == 12.99
+    assert await page.to_item() == Product(name="name", price=12.99)
+
+    page = ImprovedProductPage()
+    assert page.name == "improved name"
+    assert page.price == 12.99
+    assert await page.to_item() == Product(name="improved name", price=12.99)
+
+    page = SimilarProductPage()
+    assert page.name == "name"
+    assert page.price == 12.99
+    assert await page.to_item() == ProductSimilar(name="name", price=12.99)
+
+    page = MoreProductPage()
+    assert page.name == "name"
+    assert page.price == 12.99
+    assert page.brand == "brand"
+    assert await page.to_item() == ProductMoreFields(
+        name="name", price=12.99, brand="brand"
+    )
+
+    page = LessProductPage()
+    assert page.name == "name"
+    assert await page.to_item() == ProductLessFields(name="name")
+
+    for page in [  # type: ignore[assignment]
+        CustomProductPage(),
+        CustomProductPageDataTypeOnly(),
+        CustomProductPageNoReturns(),
+    ]:
+        assert page.name == "name"
+        assert page.price == 12.99
+        assert await page.to_item() == Product(name="name", price=12.99)