Commit

Merge pull request #84 from scrapinghub/handle_urls-with-item
@handle_urls() with item return type
kmike authored Oct 27, 2022
2 parents f0e66ef + 2c570e2 commit 5adaa28
Showing 28 changed files with 1,464 additions and 559 deletions.
2 changes: 2 additions & 0 deletions .flake8
@@ -34,5 +34,7 @@ per-file-ignores =
# imports are there to expose submodule functions so they can be imported
# directly from that module
# F403: Ignore * imports in these files
# D102: Missing docstring in public method
web_poet/__init__.py:F401,F403
web_poet/page_inputs/__init__.py:F401,F403
tests/po_lib_to_return/__init__.py:D102
36 changes: 36 additions & 0 deletions CHANGELOG.rst
@@ -2,6 +2,42 @@
Changelog
=========

TBD
---

* New ``ApplyRule`` class created by the ``@handle_urls`` decorator. This is
  nearly identical to ``OverrideRule`` except:

  * It now accepts a ``to_return`` parameter which signifies the data
    container class that the Page Object returns.
  * Passing a string to ``for_patterns`` auto-converts it into
    ``url_matcher.Patterns``.
  * All arguments are now keyword-only except for ``for_patterns``.

* Modify the call signature and behavior of ``handle_urls``:

  * New ``instead_of`` parameter which does the same thing as ``overrides``.
  * The old ``overrides`` parameter is no longer required, as it is slated for
    deprecation.
  * It sets a ``to_return`` parameter when creating ``ApplyRule`` based on the
    declared item class in subclasses of ``web_poet.ItemPage``. It's also
    possible to pass a ``to_return`` parameter in more advanced use cases.

* Documentation, test, and warning message improvements.
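
The ``ApplyRule`` changes listed above (string-to-patterns conversion and keyword-only arguments) can be illustrated with a minimal, standard-library-only sketch. ``SketchPatterns``, ``SketchRule``, ``Product``, and ``ProductPage`` are hypothetical names invented for this illustration, and the ``use`` argument is an assumption. This is not web-poet's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class SketchPatterns:
    """Hypothetical stand-in for ``url_matcher.Patterns`` (not the real class)."""

    include: Tuple[str, ...]


class SketchRule:
    """Hypothetical rule container; only ``for_patterns`` may be positional."""

    def __init__(
        self,
        for_patterns: Union[str, SketchPatterns],
        *,  # everything after this marker must be passed by keyword
        use: type,
        instead_of: Optional[type] = None,
        to_return: Optional[type] = None,
    ) -> None:
        # A bare string is wrapped into a patterns object automatically.
        if isinstance(for_patterns, str):
            for_patterns = SketchPatterns(include=(for_patterns,))
        self.for_patterns = for_patterns
        self.use = use
        self.instead_of = instead_of
        self.to_return = to_return


class Product:
    """Hypothetical item class."""


class ProductPage:
    """Hypothetical page object class."""


rule = SketchRule("example.com", use=ProductPage, to_return=Product)
```

Calling ``SketchRule("example.com", ProductPage)`` raises ``TypeError`` in this sketch, because ``use`` is keyword-only.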

Deprecations:

* The ``overrides`` parameter from ``@handle_urls`` is now deprecated.
  Use the ``instead_of`` parameter instead.
* The ``OverrideRule`` class is now deprecated. Use ``ApplyRule`` instead.
* The ``from_override_rules`` method of ``PageObjectRegistry`` is now deprecated.
  Use ``from_apply_rules`` instead.
* The ``web_poet.overrides`` module is deprecated. Use ``web_poet.rules`` instead.
* The ``PageObjectRegistry.get_overrides`` method is deprecated.
  Use ``PageObjectRegistry.get_rules`` instead.
* The ``PageObjectRegistry.search_overrides`` method is deprecated.
  Use ``PageObjectRegistry.search_rules`` instead.
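
One common way to deprecate a keyword parameter in favor of a new name, as done here for ``overrides`` and ``instead_of``, is to keep accepting the old name while emitting a ``DeprecationWarning``. The sketch below shows that general pattern using only the standard library. ``sketch_handle_urls``, ``SKETCH_RULES``, and the page classes are hypothetical, and the real ``handle_urls`` may behave differently.

```python
import warnings
from typing import Callable, List, Optional, Tuple

# Hypothetical in-memory registry: (URL pattern, page class, replaced class).
SKETCH_RULES: List[Tuple[str, type, Optional[type]]] = []


def sketch_handle_urls(
    include: str,
    *,
    instead_of: Optional[type] = None,
    overrides: Optional[type] = None,  # old name, kept for compatibility
) -> Callable[[type], type]:
    if overrides is not None:
        warnings.warn(
            "'overrides' is deprecated, use 'instead_of'",
            DeprecationWarning,
            stacklevel=2,
        )
        # The new name wins if both are given.
        if instead_of is None:
            instead_of = overrides

    def wrapper(cls: type) -> type:
        SKETCH_RULES.append((include, cls, instead_of))
        return cls

    return wrapper


class GenericProductPage:
    """Hypothetical page object being replaced."""


@sketch_handle_urls("example.com", instead_of=GenericProductPage)
class ExampleProductPage:
    """Registered with the new parameter name; no warning is emitted."""


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @sketch_handle_urls("example.org", overrides=GenericProductPage)
    class LegacyProductPage:
        """Registered with the old parameter name; a warning is recorded."""
```

Both decorated classes end up in ``SKETCH_RULES`` with the same replaced class, so existing callers keep working while being nudged toward the new name.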

0.5.1 (2022-09-23)
------------------

14 changes: 7 additions & 7 deletions docs/advanced/additional-requests.rst
@@ -1,4 +1,4 @@
.. _`advanced-requests`:
.. _advanced-requests:

===================
Additional Requests
@@ -27,7 +27,7 @@ The key words "MUST”, "MUST NOT”, "REQUIRED”, "SHALL”, "SHALL NOT”, "S
"SHOULD NOT”, "RECOMMENDED”, "MAY”, and "OPTIONAL” in this document are to be
interpreted as described in RFC `2119 <https://www.ietf.org/rfc/rfc2119.txt>`_.

.. _`httprequest-example`:
.. _httprequest-example:

HttpRequest
===========
@@ -271,7 +271,7 @@ The key take aways for this example are:
available.


.. _`httpclient`:
.. _httpclient:

HttpClient
==========
@@ -337,7 +337,7 @@ additional requests using the :meth:`~.HttpClient.request`, :meth:`~.HttpClient.
and :meth:`~.HttpClient.post` methods of :class:`~.HttpClient`. These already
define the :class:`~.HttpRequest` and execute it as well.

.. _`httpclient-get-example`:
.. _httpclient-get-example:

A simple ``GET`` request
------------------------
@@ -376,7 +376,7 @@ There are a few things to take note in this example:
* There is no need to create an instance of :class:`~.HttpRequest` when
:meth:`~.HttpClient.get` is used.

.. _`request-post-example`:
.. _request-post-example:

A ``POST`` request with `header` and `body`
-------------------------------------------
@@ -459,7 +459,7 @@ quick shortcuts for :meth:`~.HttpClient.request`:
Thus, apart from the common ``GET`` and ``POST`` HTTP methods, you can use
:meth:`~.HttpClient.request` for them (`e.g.` ``HEAD``, ``PUT``, ``DELETE``, etc).

.. _`http-batch-request-example`:
.. _http-batch-request-example:

Batch requests
--------------
@@ -567,7 +567,7 @@ The key takeaways for this example are:
first response from a group of requests as early as possible. However, the
order could be shuffled.

.. _`exception-handling`:
.. _exception-handling:

Handling Exceptions in Page Objects
===================================
14 changes: 8 additions & 6 deletions docs/advanced/fields.rst
@@ -179,7 +179,9 @@ It's also possible to implement field cleaning and processing in ``to_item``
but in that case accessing a field directly will return the value without
processing, so it's preferable to use field processors instead.

Item classes
.. _item-classes:

Item Classes
------------

In all previous examples, ``to_item`` methods are returning ``dict``
@@ -220,7 +222,7 @@ its ``to_item()`` method starts to return item instances, instead
of ``dict`` instances. In the example above ``ProductPage.to_item`` method
returns ``Product`` instances.

Defining an Item class may be an overkill if you only have a single Page Object,
Defining an item class may be an overkill if you only have a single Page Object,
but item classes are of a great help when

* you need to extract data in the same format from multiple websites, or
@@ -265,8 +267,8 @@ indicating that a required argument is missing.

Without an item class, none of these errors are detected.

Changing Item type
~~~~~~~~~~~~~~~~~~
Changing Item Class
~~~~~~~~~~~~~~~~~~~

Let's say there is a Page Object implemented, which outputs some standard
item. Maybe there is a library of such Page Objects available. But for a
@@ -333,7 +335,7 @@ to the item:
# ...
Note how :class:`~.Returns` is used as one of the base classes of
``CustomFooPage``; it allows to change the item type returned by a page object.
``CustomFooPage``; it allows to change the item class returned by a page object.

Removing fields (as well as renaming) is a bit more tricky.

@@ -368,7 +370,7 @@ is passed, and ``name`` is the only field ``CustomItem`` supports.

To recap:

* Use ``Returns[NewItemType]`` to change the item type in a subclass.
* Use ``Returns[NewItemType]`` to change the item class in a subclass.
* Don't use ``skip_nonitem_fields=True`` when your Page Object corresponds
to an item exactly, or when you're only adding fields. This is a safe
approach, which allows to detect typos in field names, even for optional
6 changes: 3 additions & 3 deletions docs/api-reference.rst
@@ -1,4 +1,4 @@
.. _`api-reference`:
.. _api-reference:

=============
API Reference
@@ -81,7 +81,7 @@ Exceptions
:show-inheritance:
:members:

.. _`api-overrides`:
.. _api-overrides:

Overrides
=========
@@ -91,7 +91,7 @@ use cases and some examples.

.. autofunction:: web_poet.handle_urls

.. automodule:: web_poet.overrides
.. automodule:: web_poet.rules
:members:
:exclude-members: handle_urls

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -53,7 +53,7 @@ and the motivation behind ``web-poet``, start with :ref:`from-ground-up`.
changelog
license

.. _`web-poet`: https://github.com/scrapinghub/web-poet
.. _web-poet: https://github.com/scrapinghub/web-poet
.. _Scrapy: https://scrapy.org/
.. _scrapy-poet: https://github.com/scrapinghub/scrapy-poet

2 changes: 1 addition & 1 deletion docs/intro/from-ground-up.rst
@@ -1,4 +1,4 @@
.. _`from-ground-up`:
.. _from-ground-up:

===========================
web-poet from the ground up