Commit

Reword the page params introduction (#198)
Gallaecio authored Feb 1, 2024
1 parent d007f91 commit 9a58f03
Showing 2 changed files with 27 additions and 36 deletions.
2 changes: 1 addition & 1 deletion docs/page-objects/frameworks.rst
@@ -7,7 +7,7 @@ Frameworks
 :ref:`Page objects <page-objects>` are not meant to be used in isolation with
 web-poet. They are meant to be used with a web-poet framework.

-A web-poet framework is a Python web scraping framework, library, of plugin
+A web-poet framework is a Python web scraping framework, library, or plugin
 that implements the :ref:`web-poet specification <spec>`.

 At the moment, the only production-ready web-poet framework that exists is
61 changes: 26 additions & 35 deletions docs/page-objects/page-params.rst
@@ -4,45 +4,35 @@
 Using page params
 =================

-In some cases, Page Objects might require additional information to be passed to
-them. Such information can dictate the behavior of the Page Object or affect its
-data entirely depending on the needs of the developer.
+In some cases, :ref:`page object classes <page-objects>` might require or allow
+parameters from the calling code, e.g. to change their behavior or make
+optimizations.

-If you can recall from the previous basic tutorials, one essential requirement of
-Page Objects that inherit from :class:`~.WebPage` would
-be :class:`~.HttpResponse`. This holds the HTTP response information that the
-Page Object is trying to represent.
-
-In order to standardize how to pass arbitrary information inside Page Objects,
-we'll need to use :class:`~.PageParams` similar on how we use
-:class:`~.HttpResponse` as a requirement to instantiate Page Objects:
+To support parameters, add :class:`~.PageParams` to your :ref:`inputs
+<inputs>`:

 .. code-block:: python
     import attrs
-    import web_poet
+    from web_poet import PageParams, WebPage
     @attrs.define
-    class SomePage(web_poet.WebPage):
-        # The HttpResponse attribute is inherited from WebPage
-        page_params: web_poet.PageParams
+    class MyPage(WebPage):
+        page_params: PageParams
-    # Assume that it's constructed with the necessary arguments taken somewhere.
-    response = web_poet.HttpResponse(...)
+In your page object class, you can read parameters from a :class:`~.PageParams`
+object as you would from a :class:`dict`:

-    # It uses Python's dict interface.
-    page_params = web_poet.PageParams({"arbitrary_value": 1234, "cool": True})
-    page = SomePage(response=response, page_params=page_params)
+.. code-block:: python
-However, similar with :class:`~.HttpResponse`, developers using
-:class:`~.PageParams` shouldn't care about how they are being passed into Page
-Objects. This will depend on the framework that would use **web-poet**.
+    foo = self.page_params["foo"]
+    bar = self.page_params.get("bar", "default")
-Let's checkout some examples on how to use it inside a Page Object.
+The way the calling code sets those parameters depends on your :ref:`web-poet
+framework <frameworks>`.

-Controlling item values
------------------------
+Example: Controlling item values
+================================

 .. code-block:: python
@@ -69,7 +59,7 @@ Controlling item values
         @staticmethod
         def calculate_price_with_tax(item):
-            tax_rate = self.page_params.get("tax_rate") or self.default_tax_rate
+            tax_rate = self.page_params.get("tax_rate", self.default_tax_rate)
             item["price_with_tax"] = item["price"] * (1 + tax_rate)
@@ -80,12 +70,13 @@ the **tax_rate** as optional information, notice that we also have a the
 ``default_tax_rate`` as a backup value just in case it's not available.


-Controlling Page Object behavior
---------------------------------
+Example: Controlling page object behavior
+=========================================

 Let's try an example wherein :class:`~.PageParams` is able to control how
-:ref:`additional-requests` are being used. Specifically, we are going to use
-:class:`~.PageParams` to control the number of paginations being made.
+:ref:`additional requests <additional-requests>` are being used. Specifically,
+we are going to use :class:`~.PageParams` to control the number of pages
+visited.

 .. code-block:: python
@@ -110,7 +101,7 @@ Let's try an example wherein :class:`~.PageParams` is able to control how
         async def get_product_urls(self) -> List[str]:
             # Simulates scrolling to the bottom of the page to load the next
             # set of items in an "Infinite Scrolling" category list page.
-            max_pages = self.page_params.get("max_pages") or self.default_max_pages
+            max_pages = self.page_params.get("max_pages", self.default_max_pages)
             requests = [
                 self.create_next_page_request(page_num)
                 for page_num in range(2, max_pages + 1)
@@ -134,5 +125,5 @@ Let's try an example wherein :class:`~.PageParams` is able to control how
 From the example above, we can see how :class:`~.PageParams` is able to
 arbitrarily limit the pagination behavior by passing an optional **max_pages**
-info. Take note that a ``default_max_pages`` value is also present in the Page
-Object in case the :class:`~.PageParams` instance did not provide it.
+info. Take note that a ``default_max_pages`` value is also present in the page
+object class in case the :class:`~.PageParams` instance did not provide it.
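
Note on the ``.get(key) or default`` to ``.get(key, default)`` changes in this diff: the two forms are not strictly equivalent. The sketch below illustrates the difference with a plain ``dict`` standing in for ``PageParams``. This assumes ``PageParams.get()`` behaves like ``dict.get()``, as the docs text in this commit suggests. ``DEFAULT_TAX_RATE`` and the two helper functions are hypothetical names used only for illustration.

```python
# A plain dict stands in for web_poet.PageParams here (assumption: its
# .get() behaves like dict.get(), per "as you would from a dict").
DEFAULT_TAX_RATE = 0.2  # hypothetical default, for illustration only


def tax_rate_old(page_params):
    """Pre-commit form: any falsy value (missing, 0, None) falls back."""
    return page_params.get("tax_rate") or DEFAULT_TAX_RATE


def tax_rate_new(page_params):
    """Post-commit form: only a missing key falls back to the default."""
    return page_params.get("tax_rate", DEFAULT_TAX_RATE)


missing = {}
zero = {"tax_rate": 0}
explicit_none = {"tax_rate": None}

print(tax_rate_old(missing), tax_rate_new(missing))              # 0.2 0.2
print(tax_rate_old(zero), tax_rate_new(zero))                    # 0.2 0
print(tax_rate_old(explicit_none), tax_rate_new(explicit_none))  # 0.2 None
```

Under these assumptions, the new form lets a caller pass an explicit ``0`` (for example, a zero tax rate), which the old form silently replaced with the default. The trade-off is that an explicit ``None`` now reaches the arithmetic unchanged, so calling code should omit the key rather than set it to ``None``.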
