This Python script crawls and retrieves product listings from Divar, a popular Iranian marketplace website. It fetches product details from Divar's Cloudflare Worker API, extracts key information from each listing, and saves the data to a CSV file with dynamic headers, where each unique product title becomes a header column.
- Crawl Divar Product Listings: Retrieve product URLs based on search queries and city.
- Fetch Detailed Product Information: Extract detailed data (e.g., title, value) for each product.
- Dynamic CSV Export: Save the data into a CSV file with dynamic headers based on unique product titles.
- Pagination Support: Retrieves multiple pages of search results.
- Python 3.x
- `requests` library (you can install it with `pip install requests`)
- Fetching URLs: The script fetches product URLs based on a search query and city using Divar's search API.
- Fetching Product Details: For each URL, the product's unique tag is extracted and used to fetch detailed product information using Divar's product details API.
- Saving Data: The product details are saved in a CSV file, with dynamic headers corresponding to unique product titles. If a title is missing for a product, `null` is placed in the CSV file.
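For illustration, here is a minimal sketch of that dynamic-header CSV step using Python's standard `csv` module. The `products` structure below is an assumed example for the sketch, not the script's actual internal format:

```python
import csv

# Assumed example data: each product maps the titles it has to their values.
products = [
    {"کارکرد": "1", "رنگ": "مشکی"},
    {"مدل (سال تولید)": "1402", "رنگ": "سفید"},
]

# Collect every unique title across all products to form the header row.
headers = sorted({title for product in products for title in product})

with open("divar_product_details.csv", "w", newline="", encoding="utf-8") as f:
    # restval writes "null" wherever a product lacks one of the titles.
    writer = csv.DictWriter(f, fieldnames=headers, restval="null")
    writer.writeheader()
    writer.writerows(products)
```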
Run the following command in a terminal:

```
git clone https://github.com/2077DevWave/divar_scraper.git
```
Set the following parameters in the script (`divar_scraper.py`) to customize its behavior:

- `search_query`: The search term for products (e.g., `"207"` to find Peugeot 207 cars).
- `city_id`: The city ID to search within (e.g., `"1"` for Tehran, `"38"` for Yasouj).
- `url_limit`: The number of URLs to crawl. Each page can contain 20 to 30 URLs, depending on the results.
- `csv_file`: The name of the CSV file where the product details will be saved.
Example:

```python
search_query = "207"  # Change this query as needed
city_id = "38"  # Optional: specify a different city ID if desired (38 = Yasouj, 1 = Tehran)
url_limit = 5  # Set the number of URLs to crawl
csv_file = "divar_product_details.csv"  # Name of the CSV file to save the product information
```
Once the parameters are set, run the script:

```
python divar_scraper.py
```
This will start the crawling process, fetch product details, and save them into the specified CSV file.
For example, to search for "Peugeot 207" in the city of Yasouj (`city_id = "38"`) and limit the crawl to 5 URLs, set the parameters like this:

```python
search_query = "207"  # Searching for Peugeot 207 cars
city_id = "38"  # Yasouj city ID
url_limit = 5  # Crawl 5 product URLs
csv_file = "divar_207.csv"  # Save results to divar_207.csv
```
After running the script, the details of the crawled products will be saved in the specified CSV file, for example `divar_product_details.csv`. The CSV file will contain a dynamic header column for each unique title, e.g., "کارکرد" (mileage), "مدل (سال تولید)" (model year), "رنگ" (color), with the corresponding value for each product.
Example CSV format:

```
کارکرد, مدل (سال تولید), رنگ, برند و تیپ
1, 1403, مشکی, پژو 207i اتوماتیک
2, 1402, سفید, پژو 207 اتوماتیک
null, 1401, نقره‌ای, پژو 206
```
- If a product does not have a particular title (e.g., "کارکرد"), `null` is written in that cell.
- Divar Search API (URL Fetch)
  - Endpoint: `https://divar-search-page-extractor.sideco.ir/`
  - Returns a list of product URLs based on the search query, city ID, and pagination.
  - Parameters:
    - `query`: The search term (e.g., `"207"`).
    - `city_id`: The city ID (e.g., `"1"` for Tehran).
    - `page`: The page number to fetch.
    - `lastPostDate`: The timestamp used for pagination.
  - Example: `https://divar-search-page-extractor.sideco.ir/?city_id=1&query=207&page=1` (see the request sketch after this list)
- Divar Product Details API (Details Fetch)
  - Endpoint: `https://divar-page-info.sideco.ir/?tag={tag}`
  - Retrieves detailed product information for a specific tag.
  - Parameters:
    - `tag`: The product tag, extracted from the product URL.
  - Example: `https://divar-page-info.sideco.ir/?tag=12345`
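For illustration, here is a minimal `requests` sketch of calling both endpoints. The parameter values are placeholders, and the structure of the JSON responses is not documented here, so treat the parsing as an assumption to verify against live output:

```python
import requests

SEARCH_API = "https://divar-search-page-extractor.sideco.ir/"
DETAILS_API = "https://divar-page-info.sideco.ir/"

# Fetch one page of search results for "207" in Tehran (city_id=1).
search_resp = requests.get(
    SEARCH_API,
    params={"city_id": "1", "query": "207", "page": 1},
    timeout=10,
)
search_resp.raise_for_status()
search_data = search_resp.json()  # inspect this to see the actual response fields

# Fetch details for one product tag ("12345" is a placeholder).
details_resp = requests.get(DETAILS_API, params={"tag": "12345"}, timeout=10)
details_resp.raise_for_status()
details_data = details_resp.json()  # likewise, verify the field names before parsing
```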
The script handles connection errors and other exceptions with `try`/`except` blocks. If a request to the Divar API fails, an error message is printed and the script continues with the next URL or product.
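A minimal sketch of that pattern, assuming the tag is the last path segment of the product URL (the real script's extraction logic may differ) and using placeholder URLs:

```python
import requests

urls = ["https://divar.ir/v/example-1", "https://divar.ir/v/example-2"]  # placeholders

for url in urls:
    tag = url.rstrip("/").rsplit("/", 1)[-1]  # assumed tag-extraction rule
    try:
        resp = requests.get(
            "https://divar-page-info.sideco.ir/", params={"tag": tag}, timeout=10
        )
        resp.raise_for_status()
    except requests.RequestException as exc:
        print(f"Failed to fetch {url}: {exc}")
        continue  # skip this product and move on to the next one
    # ... parse resp.json() and collect the product details here ...
```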
- The script assumes that the Divar API returns consistent data structures. If there are any changes in the API response format, the script may need adjustments.
- The number of URLs per page is capped at 50 for each API request. If `url_limit` exceeds the total number of available URLs, the script will crawl as many URLs as it can.
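As a sketch of how that cap might interact with pagination (the `fetch_page` callable stands in for the script's real search-API call, whose name is an assumption here):

```python
from typing import Callable, List

def crawl_urls(fetch_page: Callable[[int], List[str]], url_limit: int) -> List[str]:
    """Collect product URLs page by page until url_limit is reached
    or the search runs out of results."""
    urls: List[str] = []
    page = 1
    while len(urls) < url_limit:
        batch = fetch_page(page)  # one page of search results (typically 20 to 30 URLs)
        if not batch:
            break  # fewer results than url_limit: return what is available
        urls.extend(batch)
        page += 1
    return urls[:url_limit]
```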
This project is licensed under the MIT License - see the LICENSE file for details.
For more information about the APIs used in this script, see the endpoints documented above.