auto-scrape

Auto-scrape is a platform for building, managing, and remotely deploying web scrapers. It provides the "essential infrastructure" for web scraping while letting developers focus on writing Selenium scraping scripts in a simple, familiar way.
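To give a feel for that style, here is a minimal, hypothetical sketch of a Selenium scraping script in plain Python; the target URL, the CSS selector, and the scrape() signature are illustrative assumptions, not auto-scrape's actual API.

# Hypothetical scraper sketch. The URL, selector, and scrape() signature
# are illustrative assumptions, not auto-scrape's actual API.
from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape(driver: webdriver.Chrome) -> list[dict]:
    """Collect headline text from an example page."""
    driver.get("https://example.com/news")
    return [
        {"headline": element.text}
        for element in driver.find_elements(By.CSS_SELECTOR, "h2.headline")
    ]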

It is built with the Flask framework and uses SQLAlchemy to interface with the SQL database of your choice.
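As a rough sketch of what "the SQL database of your choice" means in practice, switching backends under SQLAlchemy is a one-line connection-string change; the config key below follows the standard Flask-SQLAlchemy convention, and whether auto-scrape reads exactly this key is an assumption.

# Sketch of switching database backends via the SQLAlchemy connection URI.
# The config key is the standard Flask-SQLAlchemy one; auto-scrape's actual
# config handling may differ.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///autoscrape.db"
# ...or, for example: "postgresql://user:password@localhost/autoscrape"
db = SQLAlchemy(app)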

Demo

GIF screenshots demonstrating the user interface in action are available here.

Features:

  • live progress logging
  • database for saving scraped data - no database experience required!
  • CSV export (sketched below)
  • multiple simultaneous scrapers
  • basic resource management
  • basic user authentication for remote deployments (see the fea-simple-auth branch)
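For a sense of what the CSV export feature amounts to, here is a minimal sketch using Python's standard csv module; the rows-as-dicts shape is an assumption about how scraped records are held, not auto-scrape's internal format.

# Minimal CSV-export sketch using only the standard library. The
# rows-as-dicts shape is an assumption, not auto-scrape's internal format.
import csv

def export_csv(rows: list[dict], path: str) -> None:
    """Write scraped records to a CSV file with a header row."""
    if not rows:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)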

Initial Project Setup

  1. Download chromedriver (matching your installed Chrome version) and place it in /autoscrape. Rename it to chromedriver.
  2. Install dependencies: pip install -r requirements.txt
  3. Set environment variables:

Windows:

$env:AUTOSCRAPE_ADMIN_USERNAME="your_admin_username"
$env:AUTOSCRAPE_ADMIN_PASSWORD="your_admin_password"

macOS / Linux:

export AUTOSCRAPE_ADMIN_USERNAME="your_admin_username"
export AUTOSCRAPE_ADMIN_PASSWORD="your_admin_password"

You can also store authentication details this way for scrapers that target sites behind a paywall (see the sketch after these steps).

  4. Start scraping:

    • Windows: ./dev.ps1
    • macOS / Linux: source ./dev.sh
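To make steps 1 and 3 concrete, here is a minimal sketch of how a scraper process might pick up the bundled chromedriver and the credentials stored in the environment; the paywall-credential variable name is hypothetical, while the AUTOSCRAPE_ADMIN_* names match the setup above.

# Sketch: read credentials from the environment and launch Chrome using the
# chromedriver placed in autoscrape/ during step 1. PAYWALL_USERNAME is a
# hypothetical name; the AUTOSCRAPE_ADMIN_* names match the setup above.
import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

admin_user = os.environ["AUTOSCRAPE_ADMIN_USERNAME"]
admin_pass = os.environ["AUTOSCRAPE_ADMIN_PASSWORD"]
paywall_user = os.environ.get("PAYWALL_USERNAME", "")  # hypothetical name

driver = webdriver.Chrome(service=Service(executable_path="autoscrape/chromedriver"))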
