
Selenium robot that goes to www.audible.com and scrapes the data and reviews from the links given by the user.


Audible Dataset Generator v1.0

A web scraper built with Selenium and Python that fetches audiobook details and the reviews the user requests from www.audible.com and converts them to CSV.

Dataset Link (updated on 14-06-2021)

Created in May 2021

Python 3.7.8 · Selenium 3.141.0

Requirements

  • Python 3.7.8
  • Selenium 3.141.0
  • ChromeDriver (matching your installed Chrome version)

Optional Requirements

  • ChroPath, for finding a specific XPath if needed.

Installation

Please go through the installation of everything in the requirements list above, and set the Python path and ChromeDriver path as well. I am not using virtual environments (not a fan of 🐍).
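If it helps, here is a minimal sketch for checking that the paths are wired up. The ChromeDriver location and the test URL are placeholders for your own setup, not values from this repository.

```python
# Install the pinned Selenium version first, e.g.:
#   pip install selenium==3.141.0
from selenium import webdriver

# Point Selenium at your ChromeDriver binary (placeholder path; adjust to your machine).
driver = webdriver.Chrome(executable_path="C:/tools/chromedriver.exe")
driver.get("https://www.audible.com")
print(driver.title)  # should print the Audible page title if everything is set up
driver.quit()
```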

FlowChart

Flowchart (see the image in the repository). Click this badge for the process demo: Workflow Video

Important Notes

  • The robot needs the latest product list link and the audible.com website link.
  • The Reviews_Crawler function takes the number of reviews you want (in multiples of 10).
  • The show_more_open_times variable sets how many times the "Show more" button should be clicked while collecting reviews. Before any click there are 10 reviews, and each click loads 10 more. For example, setting show_more_open_times to 3 in main.py makes the robot click "Show more" 2 times, generating 30 reviews (10 initial plus 20 loaded) and creating 30 separate review columns in the CSV; see the sketch after this list.
  • The CSV file is encoded in UTF-8.
  • Unfortunately, this version (1.0) has no pause feature and no multithreading, so iterating over 1200 or more books takes a significant amount of time, and if the run is stopped it starts again from the beginning, duplicating rows in the CSV.
  • I have not tested iterations over 1200 books, so if you hit any issue, please ping me.
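As a rough illustration of the notes above, here is a minimal sketch of how the "Show more" loop and the UTF-8 CSV export could look. Everything except show_more_open_times (the function name collect_reviews, the XPaths, the URL, and the file name) is an assumption for illustration, not the repository's actual code.

```python
import csv
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

def collect_reviews(driver, show_more_open_times):
    """Sketch only: 10 reviews are visible initially and each 'Show more' click
    loads 10 more, so a value of 3 means 2 clicks for roughly 30 reviews."""
    for _ in range(show_more_open_times - 1):
        try:
            # Placeholder locator; the real button's XPath may differ.
            driver.find_element_by_xpath("//a[contains(text(), 'Show more')]").click()
            time.sleep(1)  # give the page a moment to load the next batch
        except NoSuchElementException:
            break  # no more reviews to load
    reviews = driver.find_elements_by_xpath("//p[contains(@class, 'review')]")
    return [r.text for r in reviews]

driver = webdriver.Chrome(executable_path="path/to/chromedriver")  # placeholder path
driver.get("https://www.audible.com/pd/your-book-link")  # placeholder product link
rows = [collect_reviews(driver, show_more_open_times=3)]
driver.quit()

# Each review goes into its own column, and the file is written as UTF-8.
with open("audible_reviews.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```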