Skip to content

This Python script allows users to crawl and retrieve product listings from Divar, a popular Iranian marketplace website. The script fetches product details from Divar's Cloudflare Worker API, extracts key information from each product listing, and saves the data in a CSV file.

License

Notifications You must be signed in to change notification settings

2077DevWave/divar_scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Divar Product Scraper

This Python script allows users to crawl and retrieve product listings from Divar, a popular Iranian marketplace website. The script fetches product details from Divar's Cloudflare Worker API, extracts key information from each product listing, and saves the data in a CSV file. The CSV file has dynamic headers, where each unique product title becomes a header column.

Features

  • Crawl Divar Product Listings: Retrieve product URLs based on search queries and city.
  • Fetch Detailed Product Information: Extract detailed data (e.g., title, value) for each product.
  • Dynamic CSV Export: Save the data into a CSV file with dynamic headers based on unique product titles.
  • Pagination Support: Supports pagination to retrieve multiple pages of results.

Requirements

  • Python 3.x
  • requests library (You can install it using pip install requests)

How It Works

  1. Fetching URLs: The script fetches product URLs based on a search query and city using Divar's search API.
  2. Fetching Product Details: For each URL, the product's unique tag is extracted and used to fetch detailed product information using Divar's product details API.
  3. Saving Data: The product details are saved in a CSV file, with dynamic headers corresponding to unique product titles. If a title is missing for a product, null is placed in the CSV file.

How to Use

Step 1: Clone Repository

run the following command in terminal

git clone https://github.com/2077DevWave/divar_scraper.git

Step 2: Set Parameters

Set the following parameters in the script (divar_scraper.py) to customize the behavior:

  • search_query: The search term for products (e.g., "207" to find Peugeot 207 cars).
  • city_id: The city ID to search within (e.g., "1" for Tehran, "38" for Yasouj).
  • url_limit: The number of URLs to crawl. Each page can contain 20 to 30 URLs, depending on the results.
  • csv_file: The name of the CSV file where the product details will be saved.

Example:

search_query = "207"                    # Change this query as needed
city_id = "38"                          # Optional: Specify a different city ID if desired (38 = Yasouj, 1 = Tehran)
url_limit = 5                           # Set the number of URLs to crawl
csv_file = "divar_product_details.csv"  # Name of the CSV file to save the product information

Step 3: Run the Script

Once the parameters are set, run the script:

python divar_scraper.py

This will start the crawling process, fetch product details, and save them into the specified CSV file.

Example

For example, if you want to search for "Peugeot 207" in the city of Yasouj (city_id = "38") and limit the crawl to 5 URLs, set the parameters like this:

search_query = "207"    # Searching for Peugeot 207 cars
city_id = "38"          # Yasouj city ID
url_limit = 5           # Crawl 5 product URLs
csv_file = "divar_207.csv"  # Save results to divar_207.csv

Output

After running the script, the details of the crawled products will be saved in a CSV file, for example divar_product_details.csv. The CSV file will contain dynamic headers for each unique title (e.g., "کارکرد", "مدل (سال تولید)", "رنگ"), and the corresponding product values for each product.

Example CSV format:

کارکرد, مدل (سال تولید), رنگ, برند و تیپ
1, 1403, مشکی, پژو 207i اتوماتیک
2, 1402, سفید, پژو 207 اتوماتیک
null, 1401, نقره‌ای, پژو 206
  • If a product does not have a particular title (e.g., "کارکرد"), null will be placed in its place.

API Documentation

  1. Divar Search API (URL Fetch)

    • Endpoint: https://divar-search-page-extractor.sideco.ir/
    • Returns a list of product URLs based on the search query, city ID, and pagination.

    Parameters:

    • query: The search term (e.g., "207").
    • city_id: The city ID (e.g., "1" for Tehran).
    • page: The page number to fetch.
    • lastPostDate: The timestamp for pagination.

    Example:
    https://divar-search-page-extractor.sideco.ir/?city_id=1&query=207&page=1

  2. Divar Product Details API (Details Fetch)

    • Endpoint: https://divar-page-info.sideco.ir/?tag={tag}
    • Retrieves detailed product information for a specific tag.

    Parameters:

    • tag: The product tag, extracted from the URL.

    Example:
    https://divar-page-info.sideco.ir/?tag=12345

Error Handling

The script handles connection errors and exceptions by using try-except blocks. If any request to the Divar API fails, an error message is printed, and the script continues with the next URL or product.

Notes

  • The script assumes that the Divar API returns consistent data structures. If there are any changes in the API response format, the script may need adjustments.
  • The number of URLs per page is capped at 50 for each API request. If the url_limit exceeds the total number of available URLs, the script will crawl as many URLs as possible.

License

This project is licensed under the MIT License - see the LICENSE file for details.


For more information about the APIs used in this script, visit:

About

This Python script allows users to crawl and retrieve product listings from Divar, a popular Iranian marketplace website. The script fetches product details from Divar's Cloudflare Worker API, extracts key information from each product listing, and saves the data in a CSV file.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages