This simple project automatically generates backups of specific webpages.
The Python script alone does not run repeatedly, so we need to set up a cron job. Of course, the cron job only runs while the computer is on, so setting it up on a server is recommended.
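To give an idea of what the script itself does, here is a minimal sketch of the core idea: download each configured webpage and store a timestamped copy in the project's archive folder. This is only an illustration, not the actual code in `main.py`, and the helper name `fetch_and_archive` is made up:

```python
# Minimal sketch of the core idea (illustration only, not the actual main.py).
from datetime import datetime
from pathlib import Path
from urllib.request import urlopen

def fetch_and_archive(url: str, archive_dir: Path = Path("archive")) -> Path:
    """Download one webpage and save it under a timestamped file name."""
    html = urlopen(url, timeout=30).read()
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    # Derive a rough file name from the URL, e.g. "https://example.com/a" -> "example.com_a".
    name = url.split("//")[-1].rstrip("/").replace("/", "_")
    target = archive_dir / f"{name}_{stamp}.html"
    target.write_bytes(html)
    return target
```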
Run `setup.sh`. You may have to make the file executable first with `chmod +x setup.sh`.

Since Git can't index empty folders and I don't want to work with a `.gitkeep` file, you need to manually create a folder called "archive" in this project with `mkdir archive`.
Open the crontab file with `crontab -e` and create a new cron job by adding a line following the scheme `minute hour day-of-month month day-of-week command-to-execute`.

The time parameters can be chosen freely, but the command must call the web scraper script. Make sure you specify the correct path to the Python file; if `main.py` is not executable (shebang line plus `chmod +x`), prefix the path with your Python interpreter.
The new line could look like one of the following examples:
- Run the web scraper every full hour:
  `0 * * * * /usr/stupid-web-scraper/main.py`
- Run the web scraper every 15 minutes from 8 am to 6 pm, Monday to Friday:
  `*/15 8-18 * * 1-5 /usr/stupid-web-scraper/main.py`
See wiki.ubuntuusers.de/Cron for more information.
Save the file and make sure cron is actually running with `service cron status`. If it is not, start the service with `sudo service cron start`.
Store the links to all webpages to be backed up in the `url_list.csv` file.
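The exact CSV layout that `main.py` expects is not documented here; a plausible minimal format is a single column with one URL per line, for example (the URLs are placeholders):

```
https://example.com
https://example.org/some/page
```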
All other settings can be adjusted in the `config.py` file.
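The concrete option names in `config.py` are not listed here. As a rough illustration, based on the adjustable features mentioned in the list further down (archive path, save limit, random delay), it might look something like this; all names and values are hypothetical:

```python
# Hypothetical sketch of config.py -- the real option names may differ.
ARCHIVE_DIR = "archive"          # where the downloaded pages are stored
URL_LIST_FILE = "url_list.csv"   # file containing the webpages to back up
MAX_SAVED_PAGES = 100            # limit for saved copies per webpage
RANDOM_DELAY_SECONDS = (0, 60)   # random wait range before each request
```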
Start the cron jobs: `sudo service cron start`

Stop the cron jobs: `sudo service cron stop`
- Core functionality to scrape multiple webpages
- Add logging
- Set configs in a separate file
- Set a limit for the number of saved web pages
- Add a random delay when retrieving the web pages
- Make the path to the backup (archive) folder adjustable
- Implementation for different operating systems
  - Linux (using a cron job)
  - Windows (using the Task Scheduler)
  - Docker
- Parallelize the web requests (see the sketch below)
- Send regular summaries as emails
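For the planned parallelization of the web requests, one possible approach is a thread pool from Python's standard `concurrent.futures` module. The sketch below is only an illustration of that idea, not existing project code; `fetch` and `fetch_all` are hypothetical names:

```python
# Sketch of parallel page downloads with a thread pool (illustration only).
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url: str) -> bytes:
    """Download a single page; the real script would also save it to the archive."""
    return urlopen(url, timeout=30).read()

def fetch_all(urls: list[str], max_workers: int = 8) -> dict[str, bytes]:
    """Fetch several pages concurrently and return their contents keyed by URL."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order, so zipping with urls is safe.
        return dict(zip(urls, pool.map(fetch, urls)))
```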