Building an SEO Monitoring System with Python, Celery, and SERP Scraper API (now part of Web Scraper API)
This solution is built on core data engineering principles: file-based data ingestion and processing, combined with remote API calls for data enrichment.
The features are as follows:
- Accepts CSV or XLSX files as input for keyword SERP scraping
- Moves the input file to a different directory after it has been processed
- Cleans the input keywords and prepares them to be submitted to Oxylabs Web Scraper API
- Uses Celery to issue parallel requests to Web Scraper API (refer to `docker-compose` for `--autoscale` parameter usage); see the task sketch after this list
- Aggregates the responses in the exact same order in which they were submitted to the Celery workers as tasks
- Adds retry and timeout handling to the Celery tasks
- Authenticates each request to Web Scraper API
- Produces a new output file (CSV or XLSX) with the results from Web Scraper API
- Continuously watches for a new input file to be added for processing
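For illustration, a minimal sketch of what such a Celery task might look like is shown below. The endpoint URL, payload fields, and task options are assumptions based on the public Oxylabs Web Scraper API documentation, not a copy of this project's code:

```python
import os

import requests
from celery import Celery

# Broker URL is an assumption; the project's docker-compose defines the real one.
app = Celery("seo_monitoring", broker=os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0"))


@app.task(bind=True, max_retries=3, default_retry_delay=5, time_limit=120)
def scrape_keyword(self, keyword: str) -> dict:
    """Submit one keyword to Web Scraper API and return the JSON response."""
    payload = {
        "source": os.environ["SERP_TARGET"],  # e.g. a Google SERP source; see the docs
        "domain": os.environ["SERP_DOMAIN"],
        "query": keyword,
        "parse": os.environ.get("SERP_PARSE_RESULT", "True") == "True",
        "pages": int(os.environ.get("SERP_PAGES", "1")),
    }
    try:
        response = requests.post(
            "https://realtime.oxylabs.io/v1/queries",  # realtime endpoint; verify against the docs
            json=payload,
            auth=(
                os.environ["OXY_SERPS_AUTH_USERNAME"],
                os.environ["OXY_SERPS_AUTH_PASSWORD"],
            ),
            timeout=60,
        )
        response.raise_for_status()
        return response.json()
    except requests.RequestException as exc:
        # Celery re-queues the task up to max_retries times.
        raise self.retry(exc=exc)
```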
This project uses Python 3.10.x and runs in a virtual environment (venv), so make sure a suitable Python installation exists on your local system.
To configure the application, copy the bundled `dist.env` to `.env` and update the parameters as needed (refer to the Oxylabs Web Scraper API docs):
SERP configuration
- SERP_TARGET=xxxxxxx (Refer to the Oxylabs Web Scraper API docs)
- SERP_DOMAIN=xxxxxxx (Refer to the Oxylabs Web Scraper API docs)
- SERP_PARSE_RESULT=True (Should Web Scraper API parse the results?)
- SERP_LANGUAGE=en
- SERP_PAGES=5 (How many result pages to scrape)
Local directories and file watcher polling (in seconds)
- INPUT_KEYWORDS="./input" (Where the keyword input file should be placed)
- INPUT_PROCESSED="./input/processed" (Where processed keyword input files are moved)
- OUTPUT_KEYWORDS="./output" (Where the result output file is written)
- OUTPUT_FILE_TYPE=xlsx (What OUTPUT file type to use [CSV/XLSX])
- OUTPUT_FILE_NAME=keywords_serps (What name to use for OUTPUT file)
- INPUT_POLL_TIME=5 (How many seconds to wait before checking for new input files)
Web Scraper API authentication
- OXY_SERPS_AUTH_USERNAME=XXXXX
- OXY_SERPS_AUTH_PASSWORD=YYYYY
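As a rough illustration, these parameters can be loaded with python-dotenv (an assumption; the project may read its environment differently):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads key=value pairs from .env into the process environment

INPUT_KEYWORDS = os.environ.get("INPUT_KEYWORDS", "./input")
INPUT_PROCESSED = os.environ.get("INPUT_PROCESSED", "./input/processed")
OUTPUT_KEYWORDS = os.environ.get("OUTPUT_KEYWORDS", "./output")
OUTPUT_FILE_TYPE = os.environ.get("OUTPUT_FILE_TYPE", "xlsx").lower()
INPUT_POLL_TIME = int(os.environ.get("INPUT_POLL_TIME", "5"))
```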
Local development setup
- Check out the scraping-experts-seo-monitoring source
- Run: `cd scraping-experts-seo-monitoring`
- Run: `python3.10 -m venv venv`
- Run: `source venv/bin/activate`
- Run: `pip install --upgrade pip wheel setuptools`
- Run: `pip install -r requirements.txt`
Additionally, the NLTK data artefacts for the word tokenizer must be downloaded. After the project has been installed, do the following:
- Run: `cd scraping-experts-seo-monitoring`
- Run: `source venv/bin/activate`
- Run: `python` (you will be dropped into the Python CLI)
- Run: `import nltk; nltk.download('punkt')`
- Run: `import nltk; nltk.download('stopwords')`
- Use CTRL+D to exit the Python CLI
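The punkt tokenizer and stopwords corpus are what the keyword-cleaning step relies on. A minimal sketch of such cleaning, as an illustration only (the project's actual logic may differ):

```python
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

STOPWORDS = set(stopwords.words("english"))  # matches SERP_LANGUAGE=en


def clean_keyword(raw: str) -> str:
    """Lowercase, tokenize, and drop stopwords and non-alphanumeric tokens."""
    tokens = word_tokenize(raw.lower())
    return " ".join(t for t in tokens if t.isalnum() and t not in STOPWORDS)
```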
Now you should be able to develop the project locally in your favourite IDE.
Running with Docker Compose
- Check out the scraping-experts-seo-monitoring source
- Run: `cd scraping-experts-seo-monitoring`
- Run: `docker-compose build`
- Run: `docker-compose up -d --scale worker=5 && docker-compose logs -f`
- To stop the running services, exit the log watch mode with CTRL+C and run `docker-compose down`
The input keywords file must be placed at the root of the /input directory. The Python application scans this directory for new files every INPUT_POLL_TIME seconds and starts processing a file as soon as it is found; a simplified sketch of this loop follows.
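This sketch uses illustrative names only (`process_keywords_file` is a hypothetical entry point, not this project's actual function):

```python
import shutil
import time
from pathlib import Path


def watch_for_input(input_dir: str, processed_dir: str, poll_time: int) -> None:
    """Poll input_dir every poll_time seconds and hand new files off for processing."""
    while True:
        for path in Path(input_dir).glob("*"):
            if path.is_file() and path.suffix.lower() in (".csv", ".xlsx"):
                process_keywords_file(path)  # hypothetical processing entry point
                # Move the processed file out of the watched directory (INPUT_PROCESSED).
                shutil.move(str(path), str(Path(processed_dir) / path.name))
        time.sleep(poll_time)
```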
The application expects the XLSX (or CSV) file to have the following format:
XLSX

| Keyword |
|---------|
| sample1 |
| sample2 |
| other   |

CSV (with header)

```
keyword
sample1
sample2
other
```
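Either format reduces to a single keyword column. For example, it could be read with pandas (assuming pandas and openpyxl are available; the project may use different readers):

```python
import pandas as pd


def read_keywords(path: str) -> list[str]:
    """Load the keyword column from a CSV or XLSX input file."""
    if path.lower().endswith(".xlsx"):
        df = pd.read_excel(path)  # requires openpyxl for .xlsx files
    else:
        df = pd.read_csv(path)
    # The header may be "Keyword" (XLSX) or "keyword" (CSV); normalize it.
    df.columns = [str(c).strip().lower() for c in df.columns]
    return df["keyword"].dropna().astype(str).tolist()
```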