Download any website from the Internet Archive Wayback Machine.
- Clone the repo: git clone https://github.com/pavelnovitsky/wayback-machine-download.git
- Set up write permissions on the "websites" folder
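The permissions step can be sketched as follows. This is a minimal example that assumes the downloader is run from the command line by the current user, and that "websites" sits in the current directory; `prepare_websites_dir` is a hypothetical helper name, not part of the project.

```shell
# Hypothetical helper (not part of the project): make sure the output
# folder exists and that the current user can write into it.
prepare_websites_dir() {
  dir="${1:-websites}"
  # Create the folder if it is missing, then grant the owner
  # read, write, and traverse permissions.
  mkdir -p "$dir" && chmod u+rwx "$dir"
}

prepare_websites_dir websites

# Report the result so a permissions problem is visible before downloading.
if [ -w websites ]; then
  echo "websites is writable"
else
  echo "websites is NOT writable" >&2
fi
```

If PHP runs as a different user on your system, that user needs the write permission instead.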
Run WayBack Downloader with the base URL of the website you want to retrieve as a parameter (e.g., http://example.com):
php downloader.php -h http://example.com
Downloaded files are saved to the websites/{domain}/ directory. For this example, that is websites/example.com/.
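To predict where a download will land, the websites/{domain}/ layout can be approximated in the shell. This sketch assumes {domain} is simply the host part of the base URL; `output_dir_for` is a hypothetical helper, and the downloader's own logic may differ (for instance in how it treats ports or a "www." prefix).

```shell
# Hypothetical helper: map a base URL to the documented
# websites/{domain}/ output directory.
output_dir_for() {
  url="$1"
  host="${url#*://}"   # drop the scheme, e.g. "http://"
  host="${host%%/*}"   # drop any path after the host
  printf 'websites/%s/\n' "$host"
}

output_dir_for http://example.com
# prints: websites/example.com/
```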
- -h, --host: mandatory parameter, the base URL of the website to download
- -t, --timestamp: optional parameter that sets the earliest date of the Web Archive snapshots to download. WayBack Downloader skips files added before the specified date. Timestamp format: yyyyMMddhhmmss
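The timestamp is the 14-digit segment of a Wayback Machine snapshot URL, for example 20060716231334 in:
http://web.archive.org/web/20060716231334/http://example.com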
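If you already have a snapshot URL open in the browser, the timestamp can be extracted from it rather than typed by hand. The sketch below assumes the URL has the form shown above (web.archive.org/web/ followed by 14 digits); `timestamp_from_wayback_url` is a hypothetical helper, not part of the downloader.

```shell
# Hypothetical helper: extract the 14-digit timestamp from a
# Wayback Machine snapshot URL. Returns non-zero if none is found.
timestamp_from_wayback_url() {
  rest="${1#*web.archive.org/web/}"   # text after the archive prefix
  ts="${rest%%/*}"                    # keep the first path segment
  case "$ts" in
    *[!0-9]*|'') return 1 ;;          # reject anything that is not all digits
  esac
  [ "${#ts}" -eq 14 ] || return 1     # yyyyMMddhhmmss is 14 characters
  printf '%s\n' "$ts"
}

timestamp_from_wayback_url http://web.archive.org/web/20060716231334/http://example.com
# prints: 20060716231334
```

The printed value can then be passed to the documented -t or --timestamp option.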
php downloader.php -h http://example.com
php downloader.php --host=http://example.com
php downloader.php -h http://example.com -t 20060716231334
php downloader.php --host=http://example.com --timestamp=20060716231334
- Add full test coverage
- Add separate "from" and "to" timestamp options
- Add an optional URL filter (e.g., a single directory, *.jpg, etc.)
- Add results limiting
- Access Control support
You are welcome to contribute with pull requests.
WayBack Downloader uses GitHub issues. If you have found a bug, please create an issue.
This library is released under the terms of the MIT License.