Simple PHP scraper/crawler written for PHP command line
Hopefully it will help anyone out there... Enjoy ;)
- PHP command line utility (comes with the PHP download; no need to install a web server)
- Add the `php` binary or `php.exe` path to the PATH environment variable
- string `$url`: Starting URL
- string `$crawl_regex`: Regular expression that will be used for link crawling
- string `$scrape_regex`: Regular expression that will be used for data scraping
- integer `$level`: Used for recursion, use 0 when calling function
- string `$out_file`: Name of CSV file to export to
- integer `$max_level`: Maximum levels or depth to crawl into
- string `$domain`: (Optional) Used for recursion, use "" when calling function
- integer `$max_retries`: (Optional) Number of HTTP retries when timeouts or errors occur (default 3)
- boolean `$use_cache`: (Optional) True to cache web pages for fast extraction after re-running the script
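
To give a feel for what the two regular expression parameters might look like, here is a minimal sketch that applies a link pattern and a data pattern to a small HTML string with PHP's built-in `preg_match_all`. The sample HTML and both patterns are invented for illustration. How `php_scraper.php` applies the patterns internally, and whether it expects the `/` delimiters to be included, is not stated here, so treat this as an assumption and adjust to the script.

```php
<?php
// Illustration only: the sample HTML and both patterns are made up for this
// sketch and are not taken from php_scraper.php.
$html = '<a href="/page/2">Next</a><span class="price">19.99</span>'
      . '<a href="/about">About</a><span class="price">5.00</span>';

// Hypothetical $crawl_regex: capture the href of links that point to /page/N.
$crawl_regex = '/href="(\/page\/\d+)"/';

// Hypothetical $scrape_regex: capture the text inside each price span.
$scrape_regex = '/<span class="price">([^<]+)<\/span>/';

// preg_match_all stores the first capture group of every match in index 1.
preg_match_all($crawl_regex, $html, $links);
preg_match_all($scrape_regex, $html, $data);

print_r($links[1]); // links a crawler could follow next
print_r($data[1]);  // values a scraper could export to the CSV file
```

Testing patterns this way on a saved copy of a page, before starting a deep crawl, can save a lot of requests.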
- Open terminal (cmd, PowerShell or Git Bash for Windows)
- Change directory to script directory
- Run `php` to start scripting mode
- Run the scrape function using your required parameters
<?php include 'php_scraper.php';
scrape("https://www.google.com/", "test", "test", 0, "output.csv", 20); ?>
- Press Ctrl+Z then Enter (Windows) or Ctrl+D (Linux/macOS) to run
To learn more about the regular expressions used, check my tutorial.
I encourage you all to contribute to this simple project to make it better and more usable.