For this take-home exercise I chose Scrapy, a library for building fast, high-concurrency web spiders that extract and organize the data you need while also handling login, authentication, and session management.
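To give a sense of what that looks like, here is a minimal, hypothetical sketch of a Scrapy spider that logs in and then scrapes an authenticated page. The URL, form field names, and selectors are placeholders for illustration and are not the actual targets of this exercise.

```python
import scrapy


class CarrierLoginSpider(scrapy.Spider):
    name = "carrier_login_example"
    start_urls = ["https://example-carrier.test/login"]  # placeholder URL

    def parse(self, response):
        # Scrapy tracks cookies/session state automatically; FormRequest.from_response
        # also carries over hidden form fields such as CSRF tokens from the login page.
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"username": "demo", "password": "demo"},  # placeholder credentials
            callback=self.after_login,
        )

    def after_login(self, response):
        # The authenticated session is reused for every request that follows.
        for row in response.css("table.policies tr"):
            yield {
                "policy_id": row.css("td.id::text").get(),
                "premium": row.css("td.premium::text").get(),
            }
```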
I'm also using the Scrapy-Splash plugin, which not only runs JavaScript so you can wait for dynamic content to load, but also lets you script the emulated browser (running in a Docker container) to do things like scroll down and trigger additional content to load.
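Concretely, that scrolling is expressed as a small Lua script that Splash executes before returning the rendered HTML. The sketch below is illustrative only; the URL and selectors are placeholders, and the spiders in this repo may structure it differently.

```python
import scrapy
from scrapy_splash import SplashRequest

# Lua script executed by the Splash browser running in the Docker container.
SCROLL_SCRIPT = """
function main(splash, args)
  splash:go(args.url)
  splash:wait(1.0)
  -- scroll to the bottom a few times to trigger lazy-loaded content
  for i = 1, 3 do
    splash:runjs("window.scrollTo(0, document.body.scrollHeight)")
    splash:wait(1.0)
  end
  return {html = splash:html()}
end
"""


class ScrollingSpider(scrapy.Spider):
    name = "scrolling_example"

    def start_requests(self):
        # The 'execute' endpoint runs the Lua script above against the Splash HTTP API.
        yield SplashRequest(
            "https://example-carrier.test/claims",  # placeholder URL
            callback=self.parse,
            endpoint="execute",
            args={"lua_source": SCROLL_SCRIPT},
        )

    def parse(self, response):
        # The response body is the fully rendered HTML returned by the Lua script.
        for item in response.css("div.claim::text").getall():
            yield {"claim": item}
```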
First, if you don't already have Docker Desktop, download and install it from the Docker website.
On Mac and Windows, you should then be able to download the Splash image in a terminal with the following:
docker pull scrapinghub/splash
and then you should be able to run the rendering engine with the following:
docker run -it -p 8050:8050 --rm scrapinghub/splash
- Download the code
git clone git@github.com:damzam/AdaptAPI.git
- Change directory into AdaptAPI
cd AdaptAPI
- Create a virtual environment to protect your system python
python3 -m venv .env
- Activate the virtual environment
source .env/bin/activate
- Install dependencies
pip install scrapy scrapy-splash
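The repo's Scrapy settings should already have Splash wired in, so there is nothing to do here; purely for reference, the scrapy-splash documentation calls for settings along these lines, with SPLASH_URL pointing at the Docker container started earlier:

```python
# settings.py (excerpt) -- standard scrapy-splash wiring, shown for reference only.
SPLASH_URL = "http://localhost:8050"  # the Splash container started with docker run above

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```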
- cd into the scraper directory
cd scraper
- Load the seed URLs from input.json (copied from the take-home assignment) and run the respective spiders, which scrape the MOCK_INDEMNITY and PLACEHOLDER_CARRIER content and log it to the console, with the following command (a rough sketch of what the spiders do follows this step):
make
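For a rough idea of what the make target kicks off: each spider can read its seed URLs from input.json and emit a request per URL. The sketch below is an assumption about the layout, not a copy of the repo's code; the key name and file path are illustrative.

```python
import json
from pathlib import Path

import scrapy


class MockIndemnitySpider(scrapy.Spider):
    name = "mock_indemnity"

    def start_requests(self):
        # Hypothetical: read the seed URLs for this carrier from input.json.
        seeds = json.loads(Path("input.json").read_text())
        for url in seeds.get("MOCK_INDEMNITY", []):
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Log the scraped payload to the console, as the make target does.
        self.logger.info("Scraped %s", response.url)
```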
- Remove the local repo
- Terminate the Docker process (Ctrl-C in the terminal running Splash) and remove the image
docker rmi scrapinghub/splash
And you're done!