From 677c1fef5be598d7a5cf63cd13e4171fd5dbeb7b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adri=C3=A1n=20Chaves?= <adrian@chaves.io>
Date: Thu, 25 Jan 2024 15:09:47 +0100
Subject: [PATCH] Refactor the additional request docs

---
 docs/frameworks/additional-requests.rst   |   8 +-
 docs/index.rst                            |   1 -
 docs/intro/overview.rst                   |   2 +
 docs/page-objects/additional-requests.rst | 857 +++-------------------
 docs/page-objects/input-validation.rst    | 118 ---
 docs/page-objects/inputs.rst              | 118 +++
 6 files changed, 212 insertions(+), 892 deletions(-)
 delete mode 100644 docs/page-objects/input-validation.rst
diff --git a/docs/frameworks/additional-requests.rst b/docs/frameworks/additional-requests.rst
index 7a5492f3..5d281317 100644
--- a/docs/frameworks/additional-requests.rst
+++ b/docs/frameworks/additional-requests.rst
@@ -140,10 +140,10 @@ syntax.
 Exception Handling
 ------------------
 
-In the previous :ref:`exception-handling` section, we can see how Page Object
-developers could use the exception classes built inside **web-poet** to handle
-various ways additional requests MAY fail. In this section, we'll see the
-rationale and ways the framework MUST be able to do that.
+Page Object developers can use the exception classes built into
+**web-poet** to handle the various ways additional requests MAY fail. In this
+section, we'll see the rationale and ways the framework MUST be able to do
+that.
 
 Rationale
 *********
diff --git a/docs/index.rst b/docs/index.rst
index 121ca257..f97a67db 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -28,7 +28,6 @@ web-poet
    page-objects/rules
    page-objects/fields
    page-objects/additional-requests
-   page-objects/input-validation
    page-objects/page-params
    page-objects/stats
    page-objects/testing
diff --git a/docs/intro/overview.rst b/docs/intro/overview.rst
index 003f2465..ded52901 100644
--- a/docs/intro/overview.rst
+++ b/docs/intro/overview.rst
@@ -1,3 +1,5 @@
+.. _overview:
+
 ========
 Overview
 ========
diff --git a/docs/page-objects/additional-requests.rst b/docs/page-objects/additional-requests.rst
index 73c381cc..ac82dd07 100644
--- a/docs/page-objects/additional-requests.rst
+++ b/docs/page-objects/additional-requests.rst
@@ -4,799 +4,114 @@
 Additional requests
 ===================
 
-Websites nowadays needs a lot of page interactions to display or load some key
-information. In most cases, these are done via AJAX requests. Some examples of these are:
+Some websites require page interactions to load some information, such as
+clicking a button, scrolling down or hovering on some element. These
+interactions usually trigger background requests whose responses are then
+loaded using JavaScript.
 
-    * Clicking a button on a page to reveal other similar products.
-    * Clicking the `"Load More"` button to retrieve more images of a given item.
-    * Scrolling to the bottom of the page to load more items `(i.e. infinite scrolling)`.
-    * Hovering on a certain webpage element that reveals a tool-tip containing
-      additional page info.
+To extract such data, reproduce those requests using :class:`~.HttpClient`.
+Include :class:`~.HttpClient` among the :ref:`inputs <inputs>` of your
+:ref:`page object <page-objects>`, and use an asynchronous :ref:`field
+<fields-sync-async>` or method to call one of its methods. For example, you can
+call :meth:`HttpClient.execute <.HttpClient.execute>` with an
+:class:`~.HttpRequest` as input to get an :class:`~.HttpResponse` as output.
 
-As such, performing additional requests inside Page Objects are inevitable to
-properly extract data for some websites.
-
-.. warning::
-
-    Additional requests made inside a Page Object aren't meant to represent
-    the **Crawling Logic** at all. They are simply a low-level way to interact
-    with today's websites which relies on a lot of page interactions to display
-    its contents.
-
-.. _httprequest-example:
-
-HttpRequest
-===========
-
-Additional requests are defined using a simple data container that represents
-a generic HTTP Request: :class:`~.HttpRequest`. Here's an example:
-
-.. code-block:: python
-
-    import json
-    import web_poet
-
-    request = web_poet.HttpRequest(
-        url="https://www.api.example.com/product-pagination/",
-        method="POST",
-        headers={
-            "Content-Type": "application/json;charset=UTF-8"
-        },
-        body=json.dumps(
-            {
-                "Page": page_num,
-                "ProductID": product_id,
-            }
-        ).encode("utf-8"),
-    )
-
-    print(request.url)        # https://www.api.example.com/product-pagination/
-    print(type(request.url))  # <class 'web_poet.page_inputs.http.RequestUrl'>
-    print(request.method)     # POST
-
-    print(type(request.headers)  # <class 'web_poet.page_inputs.HttpRequestHeaders'>
-    print(request.headers)       # <HttpRequestHeaders('Content-Type': 'application/json;charset=UTF-8')>
-    print(request.headers.get("content-type"))    # application/json;charset=UTF-8
-    print(request.headers.get("does-not-exist"))  # None
-
-    print(type(request.body))  # <class 'web_poet.page_inputs.HttpRequestBody'>
-    print(request.body)        # b'{"Page": 1, "ProductID": 123}'
-
-There are a few things to take note here:
-
-    * ``method`` is simply a **string**.
-    * ``url`` is represented by the :class:`~.RequestUrl` class.
-    * ``headers`` is represented by the :class:`~.HttpRequestHeaders` class which
-      resembles a ``dict``-like interface. It supports case-insensitive header-key
-      lookups as well as multi-key storage.
-
-        * See :external:py:class:`multidict.CIMultiDict` for the set of features
-          since :class:`~.HttpRequestHeaders` simply inherits from it.
-
-    * ``body`` is represented by the :class:`~.HttpRequestBody` class which is
-      simply a subclass of the ``bytes`` class. Using the ``body`` param of
-      :class:`~.HttpRequest` needs to have an input argument in ``bytes``. In our
-      code example, we've converted it from ``str`` to ``bytes`` using the ``encode()``
-      string method.
-
-Most of the time though, what you'll be defining would be ``GET`` requests. Thus,
-it's perfectly fine to define them as:
-
-.. code-block:: python
-
-    import web_poet
-
-    request = web_poet.HttpRequest("https://api.example.com/product-info?id=123")
-
-    print(request.url)        # https://api.example.com/product-info?id=123
-    print(type(request.url))  # <class 'web_poet.page_inputs.http.RequestUrl'>
-    print(request.method)     # GET
-
-    print(type(request.headers)  # <class 'web_poet.page_inputs.HttpRequestHeaders'>
-    print(request.headers)       # <HttpRequestHeaders()>
-    print(request.headers.get("content-type"))    # None
-    print(request.headers.get("does-not-exist"))  # None
-
-    print(type(request.body))  # <class 'web_poet.page_inputs.HttpRequestBody'>
-    print(request.body)        # b''
-
-The key take aways are:
-
-    * The default value of ``method`` is ``GET``.
-    * ``headers`` still holds :class:`~.HttpRequestHeaders` which doesn't contain
-      anything.
-    * The same is true for ``body`` holding an empty :class:`~.HttpRequestBody`.
-
-Now that we know how :class:`~.HttpRequest` are structured, defining them doesn't
-execute the actual requests at all. In order to do so, we'll need to feed it into
-the :class:`~.HttpClient` which is defined in the next section (see
-:ref:`httpclient` tutorial section).
-
-HttpResponse
-============
-
-:class:`~.HttpResponse` is what comes after a :class:`~.HttpRequest` has been
-executed. It's typically returned by the methods from :class:`~.HttpClient` (see
-:ref:`httpclient` tutorial section) which holds the information regarding the response.
-
-:class:`~.HttpResponse` can also be used as a Page Object dependency,
-e.g. :class:`~.WebPage` uses it.
-
-.. note::
-
-    The additional requests are expected to perform redirections except when the
-    method is ``HEAD``. This means that the :class:`~.HttpResponse` that you'll
-    be receiving is already the end of the redirection trail.
-
-Let's check out an example to see its internals:
-
-.. code-block:: python
-
-    import web_poet
-
-    response = web_poet.HttpResponse(
-        url="https://www.api.example.com/product-pagination/",
-        body='{"data": "value 👍"}'.encode("utf-8"),
-        status=200,
-        headers={"Content-Type": "application/json;charset=UTF-8"}
-    )
-
-    print(response.url)        # https://www.api.example.com/product-pagination/
-    print(type(response.url))  # <class 'web_poet.page_inputs.http.ResponseUrl'>
-
-    print(response.body)           # b'{"data": "value \xf0\x9f\x91\x8d"}'
-    print(type(response.body))     # <class 'web_poet.page_inputs.HttpResponseBody'>
-
-    print(response.status)         # 200
-    print(type(response.status))   # <class 'int'>
-
-    print(response.headers)        # <HttpResponseHeaders('Content-Type': 'application/json;charset=UTF-8')>
-    print(type(response.headers))  # <class 'web_poet.page_inputs.HttpResponseHeaders'>
-    print(response.headers.get("content-type"))    # application/json;charset=UTF-8
-    print(response.headers.get("does-not-exist"))  # None
-
-    # These methods are also available:
-
-    print(response.body.declared_encoding())    # None
-    print(response.body.json())                 # {'data': 'value 👍'}
-
-    print(response.headers.declared_encoding()) # utf-8
-
-    print(response.encoding)                    # utf-8
-    print(response.text)                        # {"data": "value 👍"}
-    print(response.json())                      # {'data': 'value 👍'}
-
-Despite what the example above showcases, you won't be typically defining
-:class:`~.HttpResponse` yourself as it's the implementing framework (see
-:ref:`framework-additional-requests`) that's responsible for it. Nonetheless,
-it's important to understand its underlying structure in order to better access
-its methods.
-
-Here are the key take aways from the example above:
-
-    * ``status`` is simply an **int**.
-    * ``url`` is represented by the :class:`~.ResponseUrl` class.
-    * ``headers`` is represented by the :class:`~.HttpResponseHeaders` class.
-      It's similar to :class:`~.HttpRequestHeaders` where it inherits from
-      :external:py:class:`multidict.CIMultiDict`, granting it case-insensitive
-      header-key lookups as well as multi-key storage.
-
-        * The **encoding** can be derived using the :meth:`~.HttpResponseHeaders.declared_encoding`
-          method. In this example, it was retrieved from the ``Content-Type`` header.
-
-    * ``body`` is represented by the :class:`~.HttpResponseBody` class which is
-      simply a subclass of the ``bytes`` class. Using the ``body`` param of
-      :class:`~.HttpResponse` needs to have an input argument in ``bytes``. In our
-      code example, we've converted it from ``str`` to ``bytes`` using the ``encode()``
-      string method.
-
-        * Similar to the headers, the **encoding** can be derived using the
-          :meth:`~.HttpResponseBody.declared_encoding`. In this case, it returned
-          ``None`` since no encoding can be derived from the response body.
-        * A :meth:`~.HttpResponseBody.json` method is also available to conveniently
-          access decoded contents from JSON responses. It uses the derived **encoding**
-          to properly decode the contents like the 👍 emoji.
-
-    * The :class:`~.HttpResponse` class itself also have these convenient methods:
-
-        * The :meth:`~.HttpResponse.encoding` property method returns the proper
-          encoding of the response based on this hierarchy:
-
-            * user-specified encoding (`using the` ``_encoding`` `attribute`)
-            * BOM from the body
-            * header encodings
-            * body encodings
-
-        * Instead of accessing the raw bytes values `(which doesn't represent the
-          underlying content properly like the` 👍 `emoji)`, the :meth:`~.HttpResponse.text`
-          property method can be used which takes into account the derived **encoding**
-          when decoding the bytes value.
-        * The :meth:`~.HttpResponse.json` method is available as a shortcut to
-          :class:`~.HttpResponseBody`'s :meth:`~.HttpResponseBody.json` method.
-
-We've only explored a JSON response as a result from an additional request. Let's
-take a look at another example having an HTML response:
-
-.. code-block:: python
-
-    import web_poet
-
-    response = web_poet.HttpResponse(
-        url="https://www.api.example.com/product-pagination/",
-        body=(
-            '<html>'
-            '  <head>'
-            '    <title>Some page</title>'
-            '    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
-            '  </head>'
-            '  <body>Sample content 💯</body>'
-            '</html>'
-        ).encode("utf-8"),
-        status=200,
-        headers={}
-    )
-
-    print(response.headers.declared_encoding()) # None
-    print(response.body.declared_encoding())    # utf-8
-    print(response.encoding)                    # utf-8
-
-    print(response.body.json())  # JSONDecodeError
-    print(response.json())       # JSONDecodeError
-
-    print(type(response.selector))  # <class 'parsel.selector.Selector'>
-
-    print(response.selector.css("body ::text").get())     # Sample content 💯
-    print(response.css("body ::text").get())              # Sample content 💯
-
-    print(response.selector.xpath("//body/text()").get()) # Sample content 💯
-    print(response.xpath("//body/text()").get())          # Sample content 💯
-
-The key take aways for this example are:
-
-    * The **encoding** is derived from the body inside the ``meta`` tags since the
-      ``headers`` is empty for this example.
-    * Since we now have an HTML response, using :meth:`~.HttpResponseBody.json`
-      method would raise a ``JSONDecodeError`` as a JSON document cannot be
-      parsed from it.
-    * The :meth:`~.HttpResponse.selector` property is an instance of
-      :external:py:class:`parsel.selector.Selector`; there are also
-      :meth:`~.HttpResponse.css` and :meth:`~.HttpResponse.xpath` methods.
-
-        * Usually there's no need to use :meth:`~.HttpResponse.selector`, as
-          :meth:`~.HttpResponse.css` and :meth:`~.HttpResponse.xpath` are
-          available.
-
-
-.. _httpclient:
-
-HttpClient
-==========
-
-The main interface for executing additional requests would be :class:`~.HttpClient`.
-It also has full support for :mod:`asyncio` enabling developers to perform
-additional requests asynchronously using :py:func:`asyncio.gather`,
-:py:func:`asyncio.wait`, etc. This means that :mod:`asyncio` could be used anywhere
-inside the Page Object, including the :meth:`~.ItemPage.to_item` method.
-
-In the previous section, we've explored how :class:`~.HttpRequest` is defined.
-Let's see a few quick examples to see how to execute additional requests using
-the :class:`~.HttpClient`.
-
-Executing a HttpRequest instance
---------------------------------
-
-.. code-block:: python
-
-    import attrs
-    import web_poet
-    from web_poet import validates_input
-
-
-    @attrs.define
-    class ProductPage(web_poet.WebPage):
-        http: web_poet.HttpClient
-
-        @validates_input
-        async def to_item(self):
-            item = {
-                "url": self.url,
-                "name": self.css("#main h3.name ::text").get(),
-                "product_id": self.css("#product ::attr(product-id)").get(),
-            }
-
-            # Simulate clicking on a button that says "View All Images"
-            request = web_poet.HttpRequest(f"https://api.example.com/v2/images?id={item['product_id']}")
-            response: web_poet.HttpResponse = await self.http.execute(request)
-
-            item["images"] = response.css(".product-images img::attr(src)").getall()
-            return item
-
-As the example suggests, we're performing an additional request that allows us
-to extract more images in a product page that might not be otherwise be possible.
-This is because in order to do so, an additional button needs to be clicked
-which fetches the complete set of product images via AJAX.
-
-There are a few things to take note of this example:
-
-    * Recall from the :ref:`httprequest-example` tutorial section that the
-      default method is ``GET``. Thus, the ``method`` parameter can be omitted
-      for simple ``GET`` requests.
-    * We're now using the ``async/await`` syntax inside the :meth:`~.ItemPage.to_item`
-      method.
-    * The response from the additional request is of type :class:`~.HttpResponse`.
-
-.. tip::
-
-    Check out the :ref:`http-batch-request-example` tutorial section to see how
-    to execute a group of :class:`~.HttpRequest` in batch.
-
-Fortunately, there are already some quick shortcuts on how to perform single
-additional requests using the :meth:`~.HttpClient.request`, :meth:`~.HttpClient.get`,
-and :meth:`~.HttpClient.post` methods of :class:`~.HttpClient`. These already
-define the :class:`~.HttpRequest` and executes it as well.
-
-.. _httpclient-get-example:
-
-A simple ``GET`` request
-------------------------
-
-Let's use the example from the previous section and use the :meth:`~.HttpClient.get`
-method on it.
-
-.. code-block:: python
-
-    import attrs
-    import web_poet
-    from web_poet import validates_input
-
-
-    @attrs.define
-    class ProductPage(web_poet.WebPage):
-        http: web_poet.HttpClient
-
-        @validates_input
-        async def to_item(self):
-            item = {
-                "url": self.url,
-                "name": self.css("#main h3.name ::text").get(),
-                "product_id": self.css("#product ::attr(product-id)").get(),
-            }
-
-            # Simulates clicking on a button that says "View All Images"
-            response: web_poet.HttpResponse = await self.http.get(
-                f"https://api.example.com/v2/images?id={item['product_id']}"
-            )
-            item["images"] = response.css(".product-images img::attr(src)").getall()
-            return item
-
-There are a few things to take note in this example:
-
-    * A ``GET`` request can be done via :class:`~.HttpClient`'s
-      :meth:`~.HttpClient.get` method.
-    * There is no need create an instance of :class:`~.HttpRequest` when
-      :meth:`~.HttpClient.get` is used.
-
-.. _request-post-example:
-
-A ``POST`` request with `header` and `body`
--------------------------------------------
-
-Let's see another example which needs ``headers`` and ``body`` data to process
-additional requests.
-
-In this example, we'll paginate related items in a carousel. These are
-usually lazily loaded by the website to reduce the amount of information
-rendered in the DOM that might not otherwise be viewed by all users anyway.
-
-Thus, additional requests inside the Page Object are typically needed for it:
+For example, simulating a click on a button that loads product images could
+look like:
 
 .. code-block:: python
 
     import attrs
-    import web_poet
-    from web_poet import validates_input
+    from web_poet import HttpClient, HttpError, WebPage, field
+    from zyte_common_items import Product
 
 
     @attrs.define
-    class ProductPage(web_poet.WebPage):
-        http: web_poet.HttpClient
-
-        @validates_input
-        async def to_item(self):
-            item = {
-                "url": self.url,
-                "name": self.css("#main h3.name ::text").get(),
-                "product_id": self.css("#product ::attr(product-id)").get(),
-                "related_product_ids": self.parse_related_product_ids(self),
-            }
-
-            # Simulates "scrolling" through a carousel that loads related product items
-            response: web_poet.HttpResponse = await self.http.post(
-                url="https://www.api.example.com/related-products/",
-                headers={
-                    "Content-Type": "application/json;charset=UTF-8"
-                },
-                body=json.dumps(
-                    {
-                        "Page": 2,
-                        "ProductID": item["product_id"],
-                    }
-                ).encode("utf-8"),
-            )
-            item["related_product_ids"].extend(self.parse_related_product_ids(response))
-            return item
-
-        @staticmethod
-        def parse_related_product_ids(response_page) -> List[str]:
-            return response_page.css("#main .related-products ::attr(product-id)").getall()
-
-Here's the key takeaway in this example:
-
-    * Similar to :class:`~.HttpClient`'s :meth:`~.HttpClient.get` method,
-      a :meth:`~.HttpClient.post` method is also available. It is
-      often used to submit forms.
-
-Other Single Requests
----------------------
-
-The :meth:`~.HttpClient.get` and :meth:`~.HttpClient.post` methods are merely
-quick shortcuts for :meth:`~.HttpClient.request`:
-
-.. code-block:: python
-
-    client = HttpClient()
-
-    url = "https://api.example.com/v1/data"
-    headers = {"Content-Type": "application/json;charset=UTF-8"}
-    body = b'{"data": "value"}'
-
-    # These are the same:
-    response = await client.get(url)
-    response = await client.request(url, method="GET")
-
-    # The same goes for these:
-    response = await client.post(url, headers=headers, body=body)
-    response = await client.request(url, method="POST", headers=headers, body=body)
-
-Thus, apart from the common ``GET`` and ``POST`` HTTP methods, you can use
-:meth:`~.HttpClient.request` for them (`e.g.` ``HEAD``, ``PUT``, ``DELETE``, etc).
-
-.. _http-batch-request-example:
-
-Batch requests
---------------
-
-We can also choose to process requests by **batch** instead of sequentially or
-one by one (e.g. using :meth:`~.HttpClient.execute`). The :meth:`~.HttpClient.batch_execute`
-method can be used for this which accepts an arbitrary number of :class:`~.HttpRequest`
-instances.
-
-Let's modify the example in the previous section to see how it can be done.
-
-The difference for this code example from the previous section is that we're
-increasing the pagination from only the **2nd page** into the **10th page**.
-Instead of calling a single :meth:`~.HttpClient.post` method, we're creating a
-list of :class:`~.HttpRequest` to be executed in batch using the
-:meth:`~.HttpClient.batch_execute` method.
-
-.. code-block:: python
-
-    from typing import List
+    class ProductPage(WebPage[Product]):
+        http: HttpClient
 
-    import attrs
-    import web_poet
-    from web_poet import validates_input
+        @field
+        def productId(self):
+            return self.css("::attr(product-id)").get()
 
+        @field
+        async def images(self):
+            api_url = f"https://api.example.com/v2/images?id={self.productId}"
+            try:
+                response = await self.http.get(api_url)
+            except HttpError:
+                return []
+            else:
+                return response.css(".product-images img::attr(src)").getall()
 
-    @attrs.define
-    class ProductPage(web_poet.WebPage):
-        http: web_poet.HttpClient
-
-        default_pagination_limit = 10
-
-        @validates_input
-        async def to_item(self):
-            item = {
-                "url": self.url,
-                "name": self.css("#main h3.name ::text").get(),
-                "product_id": self.css("#product ::attr(product-id)").get(),
-                "related_product_ids": self.parse_related_product_ids(self),
-            }
-
-            requests: List[web_poet.HttpRequest] = [
-                self.create_request(item["product_id"], page_num=page_num)
-                for page_num in range(2, self.default_pagination_limit)
-            ]
-            responses: List[web_poet.HttpResponse] = await self.http.batch_execute(*requests)
-            related_product_ids = [
-                id_
-                for response in responses
-                for product_ids in self.parse_related_product_ids(response)
-                for id_ in product_ids
-            ]
+.. warning::
 
-            item["related_product_ids"].extend(related_product_ids)
-            return item
-
-        def create_request(self, product_id, page_num=2):
-            # Simulates "scrolling" through a carousel that loads related product items
-            return web_poet.HttpRequest(
-                url="https://www.api.example.com/product-pagination/",
-                method="POST",
-                headers={
-                    "Content-Type": "application/json;charset=UTF-8"
-                },
-                body=json.dumps(
-                    {
-                        "Page": page_num,
-                        "ProductID": product_id,
-                    }
-                ).encode("utf-8"),
-            )
-
-        @staticmethod
-        def parse_related_product_ids(response_page) -> List[str]:
-            return response_page.css("#main .related-products ::attr(product-id)").getall()
-
-The key takeaways for this example are:
-
-    * An :class:`~.HttpRequest` can be instantiated to represent a Generic HTTP Request.
-      It only contains the HTTP Request information for now and isn't executed yet.
-      This is useful for creating factory methods to help create requests without any
-      download execution at all.
-    * :class:`~.HttpClient` has a :meth:`~.HttpClient.batch_execute` method that
-      can process a list of :class:`~.HttpRequest` instances asynchronously together.
-
-.. tip::
-
-    The :meth:`~.HttpClient.batch_execute` method can execute multiple
-    :class:`~.HttpRequest` instances. For example, it could be a mixture
-    of ``GET`` and ``POST`` requests or even
-    representing requests for various parts of the page altogether.
-
-    Processing the additional requests in batch is useful since it takes advantage
-    of async execution which could be faster in certain cases `(assuming you're
-    allowed to perform HTTP requests in parallel)`.
-
-    Nonetheless, you can still use the :meth:`~.HttpClient.batch_execute` method
-    to execute a single :class:`~.HttpRequest` instance.
+    :class:`~.HttpClient` should only be used to handle the type of scenarios
+    mentioned above. Using :class:`~.HttpClient` for crawling logic would
+    defeat :ref:`the purpose of web-poet <overview>`.
 
 .. note::
 
-    The :meth:`~.HttpClient.batch_execute` method is a simple wrapper over
-    :py:func:`asyncio.gather`. Developers are free to use other functionalities
-    available inside :mod:`asyncio` to handle multiple requests.
-
-    For example, :py:func:`asyncio.as_completed` can be used to process the
-    first response from a group of requests as early as possible. However, the
-    order could be shuffled.
+    :meth:`HttpClient.execute <.HttpClient.execute>` is expected to follow any
+    redirection except when the request method is ``HEAD``. This means that the
+    :class:`~.HttpResponse` that you get is already the end of any redirection
+    trail.
 
-.. _exception-handling:
-
-Handling Exceptions in Page Objects
-===================================
+Concurrent requests
+===================
 
-Let's have a look at how we could handle exceptions when performing additional
-requests inside Page Objects. For this example, let's improve the code snippet
-from the previous subsection named: :ref:`httpclient-get-example`.
+To send multiple requests concurrently, use :meth:`HttpClient.batch_execute
+<.HttpClient.batch_execute>`, which accepts any number of
+:class:`~.HttpRequest` instances as input, and returns :class:`~.HttpResponse`
+instances (and :class:`~.HttpError` instances when using
+``return_exceptions=True``) in the input order. For example:
 
 .. code-block:: python
 
-    import logging
-
     import attrs
-    import web_poet
-    from web_poet import validates_input
-
-    logger = logging.getLogger(__name__)
+    from web_poet import HttpClient, HttpError, HttpRequest, WebPage, field
+    from zyte_common_items import Product, ProductVariant
 
 
     @attrs.define
-    class ProductPage(web_poet.WebPage):
-        http: web_poet.HttpClient
-
-        @validates_input
-        async def to_item(self):
-            item = {
-                "url": self.url,
-                "name": self.css("#main h3.name ::text").get(),
-                "product_id": self.css("#product ::attr(product-id)").get(),
-            }
-
-            try:
-                # Simulates clicking on a button that says "View All Images"
-                response: web_poet.HttpResponse = await self.http.get(
-                    f"https://api.example.com/v2/images?id={item['product_id']}"
-                )
-            except web_poet.exceptions.HttpRequestError as err:
-                logger.warning(
-                    f"Unable to request images for product ID '{item['product_id']}' "
-                    f"using this request: {err.request}"
-                )
-            except web_poet.exceptions.HttpResponseError as err:
-                logger.warning(
-                    f"Received a {err.response.status} response status for product ID "
-                    f"'{item['product_id']}' from this URL: {err.request.url}"
-                )
-            else:
-                item["images"] = response.css(".product-images img::attr(src)").getall()
-
-            return item
-
-In this code example, the code became more resilient on cases where it wasn't
-possible to retrieve more images using the website's public API. It could be
-due to anything like `SSL errors`, `connection errors`, `page not found`, etc.
-
-Using :class:`~.HttpClient` to execute requests raises exceptions with the base
-class of type :class:`web_poet.exceptions.http.HttpError` irregardless of how
-the HTTP Downloader is implemented. From our example above, we could've simply
-used the :class:`web_poet.exceptions.http.HttpError` base error. However, it's
-ambiguous in the sense that the error could originate during the HTTP Request
-execution or when receiving the HTTP Response.
-
-A more specific :class:`web_poet.exceptions.http.HttpRequestError` exception is
-raised when the :class:`~.HttpRequest` was being handled while the
-:class:`web_poet.exceptions.http.HttpResponseError` is raised when receiving
-a response with an HTTP error. Notice from the example that the exceptions have
-the attributes like ``request`` and ``response`` which are respective instance of
-:class:`~.HttpRequest` and :class:`~.HttpResponse`. Accessing them would be useful
-to debug and log the problems.
-
-Note that :class:`web_poet.exceptions.http.HttpResponseError` only occurs when
-receiving responses with status codes in the ``400-5xx`` range. However, this
-behavior could be altered by using the ``allow_status`` param in the methods of
-:class:`~.HttpClient`.
-
-.. note::
-
-    In the future, more specific exceptions which inherits from the base
-    :class:`web_poet.exceptions.http.HttpError` exception would be available.
-    This should allow developers writing Page Objects to properly identify what
-    went wrong and act specifically based on the problem.
-
-Let's take another example when executing requests in batch as opposed to using
-single requests via these methods of the :class:`~.HttpClient`:
-:meth:`~.HttpClient.request`, :meth:`~.HttpClient.get`, and :meth:`~.HttpClient.post`.
-
-For this example, let's improve the code snippet from the previous subsection named:
-:ref:`http-batch-request-example`.
-
-.. code-block:: python
-
-    import logging
-    from typing import List, Union
+    class ProductPage(WebPage[Product]):
+        http: HttpClient
 
-    import attrs
-    import web_poet
-    from web_poet import validates_input
+        max_variants = 10
 
+        @field
+        def productId(self):
+            return self.css("::attr(product-id)").get()
 
-    @attrs.define
-    class ProductPage(web_poet.WebPage):
-        http: web_poet.HttpClient
-
-        default_pagination_limit = 10
-
-        @validates_input
-        async def to_item(self):
-            item = {
-                "url": self.url,
-                "name": self.css("#main h3.name ::text").get(),
-                "product_id": self.css("#product ::attr(product-id)").get(),
-                "related_product_ids": self.parse_related_product_ids(self),
-            }
-
-            requests: List[web_poet.HttpRequest] = [
-                self.create_request(item["product_id"], page_num=page_num)
-                for page_num in range(2, self.default_pagination_limit)
+        @field
+        async def variants(self):
+            requests = [
+                HttpRequest(f"https://example.com/api/variant/{self.productId}/{index}")
+                for index in range(self.max_variants)
             ]
-
-            try:
-                responses: List[web_poet.HttpResponse] = await self.http.batch_execute(*requests)
-            except web_poet.exceptions.HttpError:
-                logger.warning(
-                    f"Unable to request for more related products for product ID: {item['product_id']}"
-                )
-            else:
-                related_product_ids = []
-                for response in responses:
-                    related_product_ids.extend(
-                        [
-                            id_
-                            for product_ids in self.parse_related_product_ids(response)
-                            for id_ in product_ids
-                        ]
-                    )
-                item["related_product_ids"].extend(related_product_ids)
-
-            return item
-
-        def create_request(self, product_id, page_num=2):
-            # Simulates "scrolling" through a carousel that loads related product items
-            return web_poet.HttpRequest(
-                url="https://www.api.example.com/product-pagination/",
-                method="POST",
-                headers={
-                    "Content-Type": "application/json;charset=UTF-8"
-                },
-                body=json.dumps(
-                    {
-                        "Page": page_num,
-                        "ProductID": product_id,
-                    }
-                ).encode("utf-8"),
-            )
-
-        @staticmethod
-        def parse_related_product_ids(response_page) -> List[str]:
-            return response_page.css("#main .related-products ::attr(product-id)").getall()
-
-Handling exceptions using :meth:`~.HttpClient.batch_execute` remains largely the same.
-However, the main difference is that you may be wasting perfectly good responses just
-because a single request from the batch ruined it. Notice that we're using the base
-exception class of :class:`web_poet.exceptions.http.HttpError` to account for any
-type of errors, both during the HTTP Request execution and when receiving the
-response.
-
-An alternative approach would be salvaging good responses altogether. For example, you've
-sent out 10 :class:`~.HttpRequest` and only 1 of them had an exception during processing.
-You can still get the data from 9 of the :class:`~.HttpResponse` by passing the parameter
-``return_exceptions=True`` to :meth:`~.HttpClient.batch_execute`.
-
-This means that any exceptions raised during the HTTP execution are returned alongside any
-of the successful responses. The return type of :meth:`~.HttpClient.batch_execute` could
-be a mixture of :class:`~.HttpResponse` and :class:`web_poet.exceptions.http.HttpError`
-(*and its exception subclasses*).
-
-Here's an example:
-
-.. code-block:: python
-
-    # Revised code snippet from the to_item() method
-
-    requests: List[web_poet.HttpRequest] = [
-        self.create_request(item["product_id"], page_num=page_num)
-        for page_num in range(2, self.default_pagination_limit)
-    ]
-
-    responses: List[Union[web_poet.HttpResponse, web_poet.exceptions.HttpError]] = (
-        await self.http.batch_execute(*requests, return_exceptions=True)
-    )
-
-    related_product_ids = []
-    for i, response in enumerate(responses):
-        if isinstance(response, web_poet.exceptions.HttpError):
-            logger.warning(
-                f"Unable to request related products for product ID '{item['product_id']}' "
-                f"using this request: {requests[i]}. Reason: {response}."
-            )
-            continue
-        related_product_ids.extend(
-            [
-                id_
-                for product_ids in self.parse_related_product_ids(response)
-                for id_ in product_ids
+            responses = await self.http.batch_execute(*requests, return_exceptions=True)
+            return [
+                ProductVariant(color=response.css("::attr(color)").get())
+                for response in responses
+                if not isinstance(response, HttpError)
             ]
-        )
-
-    item["related_product_ids"].extend(related_product_ids)
-    return item
 
-From the example above, we're now checking the list of responses to see if any
-exceptions are included in it. If so, we're simply logging it down and ignoring
-it. In this way, perfectly good responses can still be processed through.
+You can alternatively use :mod:`asyncio` together with :class:`~.HttpClient` to
+handle multiple requests. For example, you can use :func:`asyncio.as_completed`
+to process the first response from a group of requests as early as possible.
 
 
 .. _retries-additional-requests:
 
-Retrying Additional Requests
+Retrying additional requests
 ============================
 
-When the bad response data comes from :ref:`additional requests
-<additional-requests>`, you must handle retries on your own.
+:ref:`Input validation <input-validation>` allows retrying all inputs from a
+page object. To retry only additional requests, you must handle retries on your
+own.
 
-The page object code is responsible for retrying additional requests until good
-response data is received, or until some maximum number of retries is exceeded.
+Your code is responsible for retrying additional requests until good response
+data is received, or until some maximum number of retries is exceeded.
 
 It is up to you to decide what the maximum number of retries should be for a
 given additional request, based on your experience with the target website.
@@ -812,26 +127,30 @@ times before giving up:
 
     import attrs
     from tenacity import retry, stop_after_attempt
-    from web_poet import HttpClient, WebPage, validates_input
+    from web_poet import HttpClient, HttpError, WebPage, field
+    from zyte_common_items import Product
+
 
     @attrs.define
-    class MyPage(WebPage):
+    class ProductPage(WebPage[Product]):
         http: HttpClient
 
+        @field
+        def productId(self):
+            return self.css("::attr(product-id)").get()
+
         @retry(stop=stop_after_attempt(3))
-        async def get_data(self):
-            response = await self.http.get("https://toscrape.com/")
-            if not response.css(".expected"):
-                raise ValueError
-            return response.css(".data").get()
-
-        @validates_input
-        async def to_item(self) -> dict:
+        async def get_images(self):
+            return await self.http.get(f"https://api.example.com/v2/images?id={self.productId}")
+
+        @field
+        async def images(self):
             try:
-                data = await self.get_data()
-            except ValueError:
-                return {}
-            return {"data": data}
+                response = await self.get_images()
+            except HttpError:
+                return []
+            else:
+                return response.css(".product-images img::attr(src)").getall()
 
 If the reason your additional request fails is outdated or missing data from
 page object input, do not try to reproduce the request for that input as an
diff --git a/docs/page-objects/input-validation.rst b/docs/page-objects/input-validation.rst
deleted file mode 100644
index dba39651..00000000
--- a/docs/page-objects/input-validation.rst
+++ /dev/null
@@ -1,118 +0,0 @@
-.. _input-validation:
-
-================
-Input validation
-================
-
-Sometimes the data that your page object receives as input may be invalid.
-
-You can define a ``validate_input`` method in a page object class to check its
-input data and determine how to handle invalid input.
-
-``validate_input`` is called on the first execution of ``ItemPage.to_item()``
-or the first access to a :ref:`field <fields>`. In both cases validation
-happens early; in the case of fields, it happens before field evaluation.
-
-``validate_input`` is a synchronous method that expects no parameters, and its
-outcome may be any of the following:
-
--   Return ``None``, indicating that the input is valid.
-
-.. _retries-input:
-
--   Raise :exc:`~web_poet.exceptions.Retry`, indicating that the input
-    looks like the result of a temporary issue, and that trying to fetch
-    similar input again may result in valid input.
-
-    See also :ref:`retries-additional-requests`.
-
--   Raise :exc:`~web_poet.exceptions.UseFallback`, indicating that the
-    page object does not support the input, and that an alternative parsing
-    implementation should be tried instead.
-
-    For example, imagine you have a page object for website commerce.example,
-    and that commerce.example is built with a popular e-commerce web framework.
-    You could have a generic page object for products of websites using that
-    framework, ``FrameworkProductPage``, and a more specific page object for
-    commerce.example, ``EcommerceExampleProductPage``. If
-    ``EcommerceExampleProductPage`` cannot parse a product page, but it looks
-    like it might be a valid product page, you would raise
-    :exc:`~web_poet.exceptions.UseFallback` to try to parse the same product
-    page with ``FrameworkProductPage``, in case it works.
-
-    .. note:: web-poet does not dictate how to define or use an alternative
-              parsing implementation as fallback. It is up to web-poet
-              frameworks to choose how they implement fallback handling.
-
--   Return an item to override the output of the ``to_item`` method and of
-    fields.
-
-    For input not matching the expected type of data, returning an item that
-    indicates so is recommended.
-
-    For example, if your page object parses an e-commerce product, and the
-    input data corresponds to a list of products rather than a single product,
-    you could return a product item that somehow indicates that it is not a
-    valid product item, such as ``Product(is_valid=False)``.
-
-For example:
-
-.. code-block:: python
-
-   def validate_input(self):
-       if self.css('.product-id::text') is not None:
-           return
-       if self.css('.http-503-error'):
-           raise Retry()
-       if self.css('.product'):
-           raise UseFallback()
-       if self.css('.product-list'):
-           return Product(is_valid=False)
-
-You may use fields in your implementation of the ``validate_input`` method, but
-only synchronous fields are supported. For example:
-
-.. code-block:: python
-
-   class Page(WebPage[Item]):
-       def validate_input(self):
-           if not self.name:
-               raise UseFallback()
-
-       @field(cached=True)
-       def name(self):
-           return self.css(".product-name ::text")
-
-.. tip:: :ref:`Cache fields <field-caching>` used in the ``validate_input``
-         method, so that when they are used from ``to_item`` they are not
-         evaluated again.
-
-If you implement a custom ``to_item`` method, as long as you are inheriting
-from :class:`~web_poet.pages.ItemPage`, you can enable input validation
-decorating your custom ``to_item`` method with
-:func:`~web_poet.util.validates_input`:
-
-.. code-block:: python
-
-    from web_poet import validates_input
-
-    class Page(ItemPage[Item]):
-        @validates_input
-        async def to_item(self):
-            ...
-
-:exc:`~web_poet.exceptions.Retry` and :exc:`~web_poet.exceptions.UseFallback`
-may also be raised from the ``to_item`` method. This could come in handy, for
-example, if after you execute some asynchronous code, such as an
-:ref:`additional request <additional-requests>`, you find out that you need to
-retry the original request or use a fallback.
-
-
-Input Validation Exceptions
-===========================
-
-.. autoexception:: web_poet.exceptions.PageObjectAction
-
-.. autoexception:: web_poet.exceptions.Retry
-
-.. autoexception:: web_poet.exceptions.UseFallback
diff --git a/docs/page-objects/inputs.rst b/docs/page-objects/inputs.rst
index cb6ff600..0ab3ce95 100644
--- a/docs/page-objects/inputs.rst
+++ b/docs/page-objects/inputs.rst
@@ -84,3 +84,121 @@ You may define your own input classes if you are using a :ref:`framework
 
 However, note that custom input classes may make your :ref:`page object classes
 <page-object-classes>` less portable across frameworks.
+
+
+.. _input-validation:
+
+Input validation
+================
+
+Sometimes the data that your page object receives as input may be invalid.
+
+You can define a ``validate_input`` method in a page object class to check its
+input data and determine how to handle invalid input.
+
+``validate_input`` is called on the first execution of ``ItemPage.to_item()``
+or the first access to a :ref:`field <fields>`. In both cases validation
+happens early; in the case of fields, it happens before field evaluation.
+
+``validate_input`` is a synchronous method that expects no parameters, and its
+outcome may be any of the following:
+
+-   Return ``None``, indicating that the input is valid.
+
+.. _retries-input:
+
+-   Raise :exc:`~web_poet.exceptions.Retry`, indicating that the input
+    looks like the result of a temporary issue, and that trying to fetch
+    similar input again may result in valid input.
+
+    See also :ref:`retries-additional-requests`.
+
+-   Raise :exc:`~web_poet.exceptions.UseFallback`, indicating that the
+    page object does not support the input, and that an alternative parsing
+    implementation should be tried instead.
+
+    For example, imagine you have a page object for website commerce.example,
+    and that commerce.example is built with a popular e-commerce web framework.
+    You could have a generic page object for products of websites using that
+    framework, ``FrameworkProductPage``, and a more specific page object for
+    commerce.example, ``EcommerceExampleProductPage``. If
+    ``EcommerceExampleProductPage`` cannot parse a product page, but it looks
+    like it might be a valid product page, you would raise
+    :exc:`~web_poet.exceptions.UseFallback` to try to parse the same product
+    page with ``FrameworkProductPage``, in case it works.
+
+    .. note:: web-poet does not dictate how to define or use an alternative
+              parsing implementation as fallback. It is up to web-poet
+              frameworks to choose how they implement fallback handling.
+
+-   Return an item to override the output of the ``to_item`` method and of
+    fields.
+
+    For input not matching the expected type of data, returning an item that
+    indicates so is recommended.
+
+    For example, if your page object parses an e-commerce product, and the
+    input data corresponds to a list of products rather than a single product,
+    you could return a product item that somehow indicates that it is not a
+    valid product item, such as ``Product(is_valid=False)``.
+
+For example:
+
+.. code-block:: python
+
+   def validate_input(self):
+       if self.css('.product-id::text'):
+           return
+       if self.css('.http-503-error'):
+           raise Retry()
+       if self.css('.product'):
+           raise UseFallback()
+       if self.css('.product-list'):
+           return Product(is_valid=False)
+
+You may use fields in your implementation of the ``validate_input`` method, but
+only synchronous fields are supported. For example:
+
+.. code-block:: python
+
+   class Page(WebPage[Item]):
+       def validate_input(self):
+           if not self.name:
+               raise UseFallback()
+
+       @field(cached=True)
+       def name(self):
+           return self.css(".product-name ::text").get()
+
+.. tip:: :ref:`Cache fields <field-caching>` used in the ``validate_input``
+         method, so that when they are used from ``to_item`` they are not
+         evaluated again.
+
+If you implement a custom ``to_item`` method, as long as you are inheriting
+from :class:`~web_poet.pages.ItemPage`, you can enable input validation by
+decorating your custom ``to_item`` method with
+:func:`~web_poet.util.validates_input`:
+
+.. code-block:: python
+
+    from web_poet import validates_input
+
+    class Page(ItemPage[Item]):
+        @validates_input
+        async def to_item(self):
+            ...
+
+:exc:`~web_poet.exceptions.Retry` and :exc:`~web_poet.exceptions.UseFallback`
+may also be raised from the ``to_item`` method. This could come in handy, for
+example, if after you execute some asynchronous code, such as an
+:ref:`additional request <additional-requests>`, you find out that you need to
+retry the original request or use a fallback.
+
+Input validation exceptions
+---------------------------
+
+.. autoexception:: web_poet.exceptions.PageObjectAction
+
+.. autoexception:: web_poet.exceptions.Retry
+
+.. autoexception:: web_poet.exceptions.UseFallback