WS-5350 Updated Link and Pagination Selector Information (#42)
* WS-5350 Updated Link and Pagination Selector Information

* WS-5350 Updated Link Selector Description
karinaMeldere authored Apr 8, 2024
1 parent 4b61c2d commit 1740571
Showing 2 changed files with 19 additions and 15 deletions.
12 changes: 9 additions & 3 deletions docs/Selectors/Link selector.md
@@ -10,12 +10,18 @@ to. If you are selecting multiple links then check *multiple* property.
using AJAX for data loading. Instead of using a link selector, you
should use [Pagination selector] [pagination-selector].

-The *Link selector* can extract links from 4 types of sources:
+The *Link selector* can extract links from 5 types of sources:

1. **Link** - reads the `href` attribute of an element. E.g. `<a href="https://example.com">`;
2. **Text** - reads the text content of an element. E.g. `<span>https://example.com</span>`;
3. **Attribute** - reads the attributes of an element and finds the link. E.g. `<a data-link="https://example.com">`;
-4. **Script** - reads the scripted link in an attribute. E.g. `<a onclick="window.location='https://example.com'">`;
+4. **Scripted link in attribute** - reads the scripted link in an attribute. E.g. `<a onclick="window.location='https://example.com'">`;
+5. **Link from any script** - reads a link from a script, e.g. one that assigns `window.location=` or calls `window.open`.

+All *Link selector* types except 'Link from any script' limit which elements the point-and-click interface can select,
+although other elements can still be selected by manually entering a CSS selector.

+The 'Link from any script' type allows any element to be selected using the point-and-click interface.
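
As a rough illustration of what reading a scripted link involves, the sketch below pulls a URL out of a JavaScript snippet such as an `onclick` value. It is not the extension's actual implementation: the function name `extract_scripted_link` and the two regular expressions are assumptions made for this example, and they only cover plain string literals passed to `window.location=` or `window.open`.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; the real extraction logic
# is not shown in this commit and may handle many more cases.
_SCRIPT_URL_PATTERNS = [
    # window.location='https://...' or window.location.href = "https://..."
    re.compile(r"""window\.location(?:\.href)?\s*=\s*['"]([^'"]+)['"]"""),
    # window.open('https://...', ...)
    re.compile(r"""window\.open\(\s*['"]([^'"]+)['"]"""),
]


def extract_scripted_link(script: str) -> Optional[str]:
    """Return the first URL literal found in the script, or None."""
    for pattern in _SCRIPT_URL_PATTERNS:
        match = pattern.search(script)
        if match:
            return match.group(1)
    return None
```

Called on the `onclick` value from item 4, `extract_scripted_link("window.location='https://example.com'")` returns the quoted URL. A script that builds its URL dynamically would not match these patterns.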

## Configuration options

@@ -29,7 +35,7 @@ The *Link selector* can extract links from 4 types of sources:
**Navigate through multiple levels of navigation**

For example an e-commerce site has multi level navigation -
-`categories -> subcategories`. To scrape data from all categories and
+`categories -> subcategories`. To scrape data from all categories and
subcategories you can create two *Link selectors*. One selector would select
category links and the other selector would select subcategory links that are
available in the category pages. The subcategory *Link selector* should be made
22 changes: 10 additions & 12 deletions docs/Selectors/Pagination selector.md
@@ -14,20 +14,18 @@ selector.
on pagination buttons.

### Pagination type
-* auto - automatically uses one of these pagination types: `link`,
-`scripted link`, `attribute link` or `click multiple times on next/more
-button`. For a better scraper performance, it is better to select a specific
-pagination type instead.
-* link - extracts pagination URL from anchor `href` attribute.
-* scripted link - extracts URL from javascript statement within `href` or
+* **Auto** - automatically uses one of the pagination types listed below. The auto option is
+recommended by default; additional configuration is needed only when the auto
+option does not recognize the element correctly.
+* **Link** - extracts the pagination URL from an anchor `href` attribute.
+* **Scripted link** - extracts the URL from a JavaScript statement within an `href` or
 `onclick` attribute.
-* attribute link - extracts URL from HTML element attribute.
-* text link - extracts URL from plain text.
-* scripted link click simulation - extracts URL by clicking on the element and
-capturing URL that would be loaded by javascript.
-* click multiple times on next/more button - navigates through pagination pages
+* **Attribute link** - extracts the URL from an HTML element attribute.
+* **Text link** - extracts the URL from plain text.
+* **Link from any script** - reads a link from a script (`window.location=`, `window.open`).
+* **Click multiple times on next/more button** - navigates through pagination pages
 by clicking on a button multiple times until no new records are scraped.
-* click once on multiple buttons - navigates through pagination pages by
+* **Click once on multiple buttons** - navigates through pagination pages by
clicking on each unique button once.
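
The stop condition described for "Click multiple times on next/more button" can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the scraper's own code: `click_next` is a hypothetical callback that clicks the button and returns the records currently visible, and `max_clicks` is an assumed safety limit.

```python
from typing import Callable, List


def click_until_no_new_records(
    click_next: Callable[[], List[str]],
    max_clicks: int = 50,
) -> List[str]:
    """Keep clicking until a click yields no records that were not seen before.

    click_next is a hypothetical stand-in for "click the button, then read
    the records on the page"; max_clicks guards against endless pagination.
    """
    seen = set()
    records: List[str] = []
    for _ in range(max_clicks):
        added = 0
        for record in click_next():
            if record not in seen:
                seen.add(record)
                records.append(record)
                added += 1
        if added == 0:
            # The click produced nothing new, so pagination is exhausted.
            break
    return records
```

Deduplicating against `seen` matters because a "more" button typically appends to the records already on the page, so each click returns the earlier records again.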

## Use cases