Moving away from upcoming deprecated functionalities of web-poet==0.6.0 #89

Merged
merged 9 commits into from
Nov 21, 2022
7 changes: 7 additions & 0 deletions CHANGELOG.rst
@@ -8,6 +8,13 @@ TBR
* Provider for ``web_poet.ResponseUrl`` is added, which allows accessing the
response URL in the page object. Unlike the provider for
``web_poet.RequestUrl``, this triggers a download.
* Move from web-poet 0.5.0 to 0.6.0.

* Updates all examples in the docs and tests from the deprecated
``web_poet.ItemWebPage`` into ``web_poet.WebPage``.
* The Registry now uses ``web_poet.ApplyRule`` instead of
``web_poet.OverrideRule``.


0.5.1 (2022-07-28)
------------------
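The renames listed in the changelog above (``ItemWebPage`` to ``WebPage``, ``OverrideRule`` to ``ApplyRule``, ``get_overrides`` to ``get_rules``, and the ``web_poet.overrides`` module path to ``web_poet.rules``) are mechanical, so a downstream project could apply them with a small script. The following is only a minimal sketch using the standard library; ``RENAMES`` and ``migrate_source`` are hypothetical names, not part of web-poet or scrapy-poet, and the mapping is taken solely from the diffs in this PR.

```python
import re

# Old name -> new name, as they appear in this PR's changelog and diffs.
# This mapping is an assumption drawn from the PR, not an official list.
RENAMES = {
    "web_poet.overrides": "web_poet.rules",
    "ItemWebPage": "WebPage",
    "OverrideRule": "ApplyRule",
    "get_overrides": "get_rules",
}

# One alternation with word boundaries, so that e.g. SCRAPY_POET_OVERRIDES
# or a longer identifier containing one of the names is left untouched.
_PATTERN = re.compile("|".join(rf"\b{re.escape(old)}\b" for old in RENAMES))


def migrate_source(source: str) -> str:
    """Return ``source`` with the deprecated names replaced (hypothetical helper)."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(0)], source)


if __name__ == "__main__":
    old_code = (
        "from web_poet import ItemWebPage\n"
        "from web_poet.overrides import OverrideRule\n"
        "rules = default_registry.get_overrides()\n"
    )
    print(migrate_source(old_code))
```

A plain textual rename has limits: a line such as ``from web_poet import ItemWebPage, WebPage`` would end up importing ``WebPage`` twice, which this PR cleans up by hand, so the output still needs a review.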
4 changes: 2 additions & 2 deletions docs/intro/advanced-tutorial.rst
@@ -48,7 +48,7 @@ Suppose we have the following Page Object:


@attr.define
class ProductPage(web_poet.ItemWebPage):
class ProductPage(web_poet.WebPage):
http: web_poet.HttpClient

async def to_item(self):
@@ -110,7 +110,7 @@ This basically acts as a switch to update the behavior of the Page Object:


@attr.define
class ProductPage(web_poet.ItemWebPage):
class ProductPage(web_poet.WebPage):
http: web_poet.HttpClient
page_params: web_poet.PageParams

38 changes: 19 additions & 19 deletions docs/intro/basic-tutorial.rst
@@ -65,10 +65,10 @@ out of the spider class.

.. code-block:: python

from web_poet.pages import ItemWebPage
from web_poet.pages import WebPage


class BookPage(ItemWebPage):
class BookPage(WebPage):
"""Individual book page on books.toscrape.com website, e.g.
http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html
"""
@@ -93,10 +93,10 @@ extract a property from the ``to_item`` method:

.. code-block:: python

from web_poet.pages import ItemWebPage
from web_poet.pages import WebPage


class BookPage(ItemWebPage):
class BookPage(WebPage):
"""Individual book page on books.toscrape.com website"""

@property
@@ -245,11 +245,11 @@ At the end of our job, the spider should look like this:
.. code-block:: python

import scrapy
from web_poet.pages import ItemWebPage
from web_poet.pages import WebPage
from scrapy_poet import callback_for


class BookPage(ItemWebPage):
class BookPage(WebPage):
"""Individual book page on books.toscrape.com website, e.g.
http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html
"""
@@ -353,7 +353,7 @@ existing Page Objects as subclasses of them:

.. code-block:: python

from web_poet.pages import ItemWebPage, WebPage
from web_poet.pages import WebPage


# ------ Base page objects ------
@@ -364,7 +364,7 @@ existing Page Objects as subclasses of them:
return []


class BookPage(ItemWebPage):
class BookPage(WebPage):

def to_item(self):
return None
@@ -421,7 +421,7 @@ to implement new ones:

.. code-block:: python

from web_poet.pages import ItemWebPage, WebPage
from web_poet.pages import WebPage


class BPBookListPage(WebPage):
@@ -430,7 +430,7 @@ to implement new ones:
return self.css("article.post h4 a::attr(href)").getall()


class BPBookPage(ItemWebPage):
class BPBookPage(WebPage):

def to_item(self):
return {
@@ -466,21 +466,21 @@ For example, the pattern ``books.toscrape.com/catalogue/category/``
is accepted and it would restrict the override only to category pages.

It is even possible to configure more complex patterns by using the
:py:class:`web_poet.overrides.OverrideRule` class instead of a triplet in
:py:class:`web_poet.rules.ApplyRule` class instead of a triplet in
the configuration. Another way of declaring the earlier config
for ``SCRAPY_POET_OVERRIDES`` would be the following:

.. code-block:: python

from url_matcher import Patterns
from web_poet import OverrideRule
from web_poet import ApplyRule


SCRAPY_POET_OVERRIDES = [
OverrideRule(for_patterns=Patterns(["toscrape.com"]), use=BTSBookListPage, instead_of=BookListPage),
OverrideRule(for_patterns=Patterns(["toscrape.com"]), use=BTSBookPage, instead_of=BookPage),
OverrideRule(for_patterns=Patterns(["bookpage.com"]), use=BPBookListPage, instead_of=BookListPage),
OverrideRule(for_patterns=Patterns(["bookpage.com"]), use=BPBookPage, instead_of=BookPage),
ApplyRule(for_patterns=Patterns(["toscrape.com"]), use=BTSBookListPage, instead_of=BookListPage),
ApplyRule(for_patterns=Patterns(["toscrape.com"]), use=BTSBookPage, instead_of=BookPage),
ApplyRule(for_patterns=Patterns(["bookpage.com"]), use=BPBookListPage, instead_of=BookListPage),
ApplyRule(for_patterns=Patterns(["bookpage.com"]), use=BPBookPage, instead_of=BookPage),
]

As you can see, this could get verbose. The earlier tuple config simply offers
@@ -494,8 +494,8 @@ a shortcut to be more concise.
Manually defining overrides like this would be inconvenient, most
especially for larger projects. Fortunately, `web-poet`_ has a cool feature to
annotate Page Objects like :py:func:`web_poet.handle_urls` that would define
and store the :py:class:`web_poet.overrides.OverrideRule` for you. All of the
:py:class:`web_poet.overrides.OverrideRule` rules could then be simply read as:
and store the :py:class:`web_poet.rules.ApplyRule` for you. All of the
:py:class:`web_poet.rules.ApplyRule` rules could then be simply read as:

.. code:: python

@@ -505,7 +505,7 @@ and store the :py:class:`web_poet.overrides.OverrideRule` for you. All of the
# rules from other packages. Otherwise, it can be omitted.
# More info about this caveat on web-poet docs.
consume_modules("external_package_A", "another_ext_package.lib")
SCRAPY_POET_OVERRIDES = default_registry.get_overrides()
SCRAPY_POET_OVERRIDES = default_registry.get_rules()

For more info on this, you can refer to these docs:

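The tutorial text in the diff above describes the tuple form of ``SCRAPY_POET_OVERRIDES`` as a shortcut for the long-form rule with ``for_patterns``, ``use`` and ``instead_of``. The sketch below illustrates that equivalence with the standard library only. ``RuleSketch`` and ``normalize`` are hypothetical stand-ins written for this note; they are not ``web_poet.ApplyRule`` and do not claim to show how scrapy-poet expands tuples internally. The page classes are empty placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class RuleSketch:
    """Hypothetical stand-in carrying the three fields shown in the diff."""

    for_patterns: Tuple[str, ...]
    use: type
    instead_of: type


# Shortcut form: (url pattern, override type, overridden type).
Triplet = Tuple[str, type, type]


def normalize(entries: List[Union[Triplet, RuleSketch]]) -> List[RuleSketch]:
    """Turn a mixed list of triplets and rules into rules only."""
    rules = []
    for entry in entries:
        if isinstance(entry, RuleSketch):
            rules.append(entry)  # already in long form
        else:
            pattern, use, instead_of = entry
            rules.append(RuleSketch((pattern,), use, instead_of))
    return rules


# Placeholder page classes, for illustration only.
class BookPage: ...
class BTSBookPage(BookPage): ...
class BPBookPage(BookPage): ...


mixed = [
    ("toscrape.com", BTSBookPage, BookPage),
    RuleSketch(("bookpage.com",), BPBookPage, BookPage),
]
normalized = normalize(mixed)
```

After normalization both entries have the same shape, which is why the two forms can be mixed in one list, as ``books_04_overrides_02.py`` in this PR does.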
26 changes: 13 additions & 13 deletions docs/overrides.rst
@@ -15,10 +15,10 @@ page.
- `Example 1 <https://github.com/scrapinghub/scrapy-poet/blob/master/example/example/spiders/books_04_overrides_01.py>`_:
rules using tuples
- `Example 2 <https://github.com/scrapinghub/scrapy-poet/blob/master/example/example/spiders/books_04_overrides_02.py>`_:
rules using tuples and :py:class:`web_poet.overrides.OverrideRule`
rules using tuples and :py:class:`web_poet.ApplyRule`
- `Example 3 <https://github.com/scrapinghub/scrapy-poet/blob/master/example/example/spiders/books_04_overrides_03.py>`_:
rules using :py:func:`web_poet.handle_urls` decorator and retrieving them
via :py:meth:`web_poet.overrides.PageObjectRegistry.get_overrides`
via :py:meth:`web_poet.PageObjectRegistry.get_rules`

Page Objects refinement
=======================
@@ -44,7 +44,7 @@ using the following Page Object:

.. code-block:: python

class ISBNBookPage(ItemWebPage):
class ISBNBookPage(WebPage):

def __init__(self, response: HttpResponse, book_page: BookPage):
super().__init__(response)
@@ -81,7 +81,7 @@ the obtained item with the ISBN from the page HTML.
.. code-block:: python

@attr.define
class ISBNBookPage(ItemWebPage):
class ISBNBookPage(WebPage):
book_page: BookPage

def to_item(self):
@@ -95,17 +95,17 @@ Overrides rules

The default way of configuring the override rules is using triplets
of the form (``url pattern``, ``override_type``, ``overridden_type``). But more
complex rules can be introduced if the class :py:class:`web_poet.overrides.OverrideRule`
complex rules can be introduced if the class :py:class:`web_poet.ApplyRule`
is used. The following example configures an override that is only applied for
book pages from ``books.toscrape.com``:

.. code-block:: python

from web_poet import OverrideRule
from web_poet import ApplyRule


SCRAPY_POET_OVERRIDES = [
OverrideRule(
ApplyRule(
for_patterns=Patterns(
include=["books.toscrape.com/catalogue/*index.html|"],
exclude=["/catalogue/category/"]),
@@ -155,7 +155,7 @@ for the domain ``toscrape.com``.

In order to configure the ``scrapy-poet`` overrides automatically
using these annotations, you can directly interact with `web-poet`_'s
``default_registry`` (an instance of :py:class:`web_poet.overrides.PageObjectRegistry`).
``default_registry`` (an instance of :py:class:`web_poet.PageObjectRegistry`).

For example:

@@ -169,15 +169,15 @@ For example:
consume_modules("external_package_A", "another_ext_package.lib")

# To get all of the Override Rules that were declared via annotations.
SCRAPY_POET_OVERRIDES = default_registry.get_overrides()
SCRAPY_POET_OVERRIDES = default_registry.get_rules()

The :py:meth:`web_poet.overrides.PageObjectRegistry.get_overrides` method of the
``default_registry`` above returns ``List[OverrideRule]`` that were declared
The :py:meth:`web_poet.PageObjectRegistry.get_rules` method of the
``default_registry`` above returns ``List[ApplyRule]`` that were declared
using `web-poet`_'s :py:func:`web_poet.handle_urls` annotation. This is much
more convenient than manually defining all of the :py:class:`web_poet.overrides.OverrideRule`.
more convenient than manually defining all of the :py:class:`web_poet.ApplyRule`.

Take note that since ``SCRAPY_POET_OVERRIDES`` is structured as
``List[OverrideRule]``, you can easily modify it later on if needed.
``List[ApplyRule]``, you can easily modify it later on if needed.

.. note::

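The docs text in the diff above notes that, because ``SCRAPY_POET_OVERRIDES`` is a plain list of rules, it can be modified after it is collected. A minimal sketch of such post-processing follows, assuming only that each rule exposes ``use`` and ``instead_of`` attributes as in the long-form examples of this PR. ``RuleSketch`` and ``drop_overrides_of`` are hypothetical names used for illustration, and the page classes are placeholders; with the real library the list would come from ``default_registry.get_rules()`` instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RuleSketch:
    """Hypothetical stand-in for a rule; not the web-poet class."""

    pattern: str
    use: type
    instead_of: type


def drop_overrides_of(rules: List[RuleSketch], overridden: type) -> List[RuleSketch]:
    """Return a new list without the rules that override ``overridden``."""
    return [rule for rule in rules if rule.instead_of is not overridden]


# Placeholder page classes, for illustration only.
class BookListPage: ...
class BookPage: ...
class BTSBookListPage(BookListPage): ...
class BTSBookPage(BookPage): ...


collected = [
    RuleSketch("toscrape.com", BTSBookListPage, BookListPage),
    RuleSketch("toscrape.com", BTSBookPage, BookPage),
]

# Keep only the rules that do not replace BookListPage.
remaining = drop_overrides_of(collected, BookListPage)
```

Building a new list rather than mutating the collected one keeps the original rules available for other settings or tests.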
4 changes: 2 additions & 2 deletions docs/providers.rst
@@ -22,7 +22,7 @@ Creating providers
Providers are responsible for building dependencies needed by Injectable
objects. A good example would be the ``HttpResponseProvider``,
which builds and provides a ``web_poet.HttpResponse`` instance for Injectables
that need it, like the ``web_poet.ItemWebPage``.
that need it, like the ``web_poet.WebPage``.

.. code-block:: python

@@ -271,7 +271,7 @@ Page Object uses it, the request is not ignored, for example:
.. note::

The code above is just for example purposes. If you need to use ``Response``
instances in your Page Objects, use built-in ``ItemWebPage`` - it has
instances in your Page Objects, use built-in ``WebPage`` - it has
``response`` attribute with ``HttpResponse``; no additional configuration
is needed, as there is ``HttpResponseProvider`` enabled in ``scrapy-poet``
by default.
4 changes: 2 additions & 2 deletions example/example/spiders/books_02.py
@@ -3,10 +3,10 @@
BookPage is now independent of Scrapy.
"""
import scrapy
from web_poet import ItemWebPage
from web_poet import WebPage


class BookPage(ItemWebPage):
class BookPage(WebPage):
def to_item(self):
return {
"url": self.url,
4 changes: 2 additions & 2 deletions example/example/spiders/books_02_1.py
@@ -4,12 +4,12 @@
boilerplate.
"""
import scrapy
from web_poet import ItemWebPage
from web_poet import WebPage

from scrapy_poet import callback_for


class BookPage(ItemWebPage):
class BookPage(WebPage):
def to_item(self):
return {
"url": self.url,
4 changes: 2 additions & 2 deletions example/example/spiders/books_02_2.py
@@ -11,12 +11,12 @@
it is better than defining callback explicitly.
"""
import scrapy
from web_poet import ItemWebPage
from web_poet import WebPage

from scrapy_poet import callback_for


class BookPage(ItemWebPage):
class BookPage(WebPage):
def to_item(self):
return {
"url": self.url,
4 changes: 2 additions & 2 deletions example/example/spiders/books_02_3.py
@@ -8,10 +8,10 @@
but it can be implemented, with Scrapy support.
"""
import scrapy
from web_poet import ItemWebPage
from web_poet import WebPage


class BookPage(ItemWebPage):
class BookPage(WebPage):
def to_item(self):
return {
"url": self.url,
4 changes: 2 additions & 2 deletions example/example/spiders/books_04.py
@@ -2,7 +2,7 @@
Scrapy spider which uses Page Objects both for crawling and extraction.
"""
import scrapy
from web_poet import ItemWebPage, WebPage
from web_poet import WebPage

from scrapy_poet import callback_for

@@ -12,7 +12,7 @@ def book_urls(self):
return self.css(".image_container a::attr(href)").getall()


class BookPage(ItemWebPage):
class BookPage(WebPage):
def to_item(self):
return {
"url": self.url,
6 changes: 3 additions & 3 deletions example/example/spiders/books_04_overrides_01.py
@@ -6,7 +6,7 @@
The default configured PO logic contains the logic for books.toscrape.com
"""
import scrapy
from web_poet import ItemWebPage, WebPage
from web_poet import WebPage

from scrapy_poet import callback_for

@@ -18,7 +18,7 @@ def book_urls(self):
return self.css(".image_container a::attr(href)").getall()


class BookPage(ItemWebPage):
class BookPage(WebPage):
"""Logic to extract book info from pages like https://books.toscrape.com/catalogue/soumission_998/index.html"""

def to_item(self):
@@ -35,7 +35,7 @@ def book_urls(self):
return self.css("article.post h4 a::attr(href)").getall()


class BPBookPage(ItemWebPage):
class BPBookPage(WebPage):
"""Logic to extract from pages like https://bookpage.com/reviews/25879-laird-hunt-zorrie-fiction"""

def to_item(self):
13 changes: 6 additions & 7 deletions example/example/spiders/books_04_overrides_02.py
@@ -8,8 +8,8 @@
"""
import scrapy
from url_matcher import Patterns
from web_poet import ItemWebPage, WebPage
from web_poet.overrides import OverrideRule
from web_poet import WebPage
from web_poet.rules import ApplyRule

from scrapy_poet import callback_for

@@ -19,9 +19,8 @@ def book_urls(self):
return []


class BookPage(ItemWebPage):
def to_item(self):
return None
class BookPage(WebPage):
...


class BTSBookListPage(BookListPage):
@@ -67,12 +66,12 @@ class BooksSpider(scrapy.Spider):
("toscrape.com", BTSBookListPage, BookListPage),
("toscrape.com", BTSBookPage, BookPage),
# We could also use the long-form version if we want to.
OverrideRule(
ApplyRule(
for_patterns=Patterns(["bookpage.com"]),
use=BPBookListPage,
instead_of=BookListPage,
),
OverrideRule(
ApplyRule(
for_patterns=Patterns(["bookpage.com"]),
use=BPBookPage,
instead_of=BookPage,