Commit

Merge f896a7d into 5167ed6
ankitjavalkar authored Jun 2, 2023
2 parents 5167ed6 + f896a7d commit 5eebbf3
Showing 2 changed files with 12 additions and 12 deletions.
12 changes: 6 additions & 6 deletions docs/intro/advanced-tutorial.rst
@@ -62,7 +62,7 @@ Suppose we have the following Page Object:
# Simulates clicking on a button that says "View All Images"
response: web_poet.HttpResponse = await self.http.get(
- f"https://api.example.com/v2/images?id={item['product_id']}"
+ f"https://api.toscrape.com/v2/images?id={item['product_id']}"
)
item["images"] = response.css(".product-images img::attr(src)").getall()
return item
@@ -85,8 +85,8 @@ It can be directly used inside the spider as:
def start_requests(self):
for url in [
- "https://example.com/category/product/item?id=123",
- "https://example.com/category/product/item?id=989",
+ "https://toscrape.com/category/product/item?id=123",
+ "https://toscrape.com/category/product/item?id=989",
]:
yield scrapy.Request(url, callback=self.parse)
@@ -128,7 +128,7 @@ This basically acts as a switch to update the behavior of the Page Object:
# Simulates clicking on a button that says "View All Images"
if self.page_params.get("enable_extracting_all_images"):
response: web_poet.HttpResponse = await self.http.get(
- f"https://api.example.com/v2/images?id={item['product_id']}"
+ f"https://api.toscrape.com/v2/images?id={item['product_id']}"
)
item["images"] = response.css(".product-images img::attr(src)").getall()
@@ -157,8 +157,8 @@ Let's see it in action:
}
start_urls = [
- "https://example.com/category/product/item?id=123",
- "https://example.com/category/product/item?id=989",
+ "https://toscrape.com/category/product/item?id=123",
+ "https://toscrape.com/category/product/item?id=989",
]
def start_requests(self):
12 changes: 6 additions & 6 deletions docs/rules-from-web-poet.rst
@@ -80,7 +80,7 @@ And then override it for a particular domain using ``settings.py``:
.. code-block:: python
SCRAPY_POET_RULES = [
- ApplyRule("example.com", use=ISBNBookPage, instead_of=BookPage)
+ ApplyRule("toscrape.com", use=ISBNBookPage, instead_of=BookPage)
]
This new Page Object gets the original ``BookPage`` as dependency and enrich
@@ -211,7 +211,7 @@ Let's check out an example:
name: str
- @handle_urls("example.com")
+ @handle_urls("toscrape.com")
@attrs.define
class ProductPage(WebPage[Product]):
@@ -225,7 +225,7 @@ Let's check out an example:
def start_requests(self):
yield scrapy.Request(
- "https://example.com/products/some-product", self.parse
+ "https://toscrape.com/products/some-product", self.parse
)
# We can directly ask for the item here instead of the page object.
@@ -236,7 +236,7 @@ From this example, we can see that:

* Spider callbacks can directly ask for items as dependencies.
* The ``Product`` item instance directly comes from ``ProductPage``.
- * This is made possible by the ``ApplyRule("example.com", use=ProductPage,
+ * This is made possible by the ``ApplyRule("toscrape.com", use=ProductPage,
to_return=Product)`` instance created from the ``@handle_urls`` decorator
on ``ProductPage``.

@@ -248,7 +248,7 @@ From this example, we can see that:

.. code-block:: python
- @handle_urls("example.com")
+ @handle_urls("toscrape.com")
@attrs.define
class ProductPage(WebPage[Product]):
product_image_page: ProductImagePage
@@ -267,7 +267,7 @@ From this example, we can see that:
def start_requests(self):
yield scrapy.Request(
- "https://example.com/products/some-product", self.parse
+ "https://toscrape.com/products/some-product", self.parse
)
async def parse(self, response: DummyResponse, product_page: ProductPage):