Web scraping micro service to track new whiskey releases via email

wheeler-dev/whiskey_release_radar

Whiskey_Release_Radar

A web scraping micro service to track new whiskey releases via email

About:

This application uses Puppeteer to regularly check the available products of several online liquor shops based in Germany. Once the available products have been extracted, they are saved to a JSON file in the /storage subdirectory and another run is queued. Each product consists of a title and a price, as well as a link to the product's detail page where it can be bought. On every subsequent run, the new product list is compared to the previously stored one to identify any newly released products. A list of those new releases is then sent to the provided email address.

Prerequisites:

Node and npm

This app needs to be authorized with a Google account to send emails through it.
The Gmail API needs to be enabled for this account in the Google API Console.
A more detailed guide on how to set up an account for use with Google APIs can be found here. After creating credentials, place the generated credentials.json file at the root of this project.
Helpful tip: If you encounter redirect issues during the authentication process, setting "redirect_uris" in your credentials.json to ["urn:ietf:wg:oauth:2.0:oob", "http://localhost"] for your desktop application may help.

How to run:

  • npm install
  • npm run start

How it works:

On startup, you will be asked on the command line for the email address that should receive a message whenever new releases are available.
The application will then try to authenticate with Google and print a sign-in link to the console.

If you encounter the message "Error loading client secret file",
make sure you stored your Google account's credentials.json at the root of the project folder, as described above.

Follow the link, log in to the Google account, grant the requested access, and copy the generated token back into the console to complete the app authentication process.
If a previously stored token is available and still valid, this step is skipped.
From this point on, the application checks for newly available products at a 10-minute interval.
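The 10-minute cycle could be sketched roughly as follows. This is a minimal illustration, not the project's actual scheduler: `startPolling` and `TEN_MINUTES_MS` are hypothetical names, and queuing the next run only after the current one finishes is an assumption based on the description that "another run will be queued".

```javascript
const TEN_MINUTES_MS = 10 * 60 * 1000;

// Runs `task` immediately, then queues the next run only after the current
// one has finished, so slow scrapes never overlap.
// Returns a function that cancels any pending run.
function startPolling(task, intervalMs = TEN_MINUTES_MS) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      await task();
    } catch (err) {
      // A failed run is logged and must not prevent the next one.
      console.error('Run failed:', err.message);
    }
    if (!stopped) {
      timer = setTimeout(tick, intervalMs);
    }
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Calling `startPolling(runOnce)` with a hypothetical `runOnce` function that scrapes, compares, and sends the email would keep the service running until the returned `stop` function is called.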

App Overview

The shop classes contain the Puppeteer scraping logic for each individual shop. Each shop's parseArrivals function returns all the products available on the scraped page.
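A shop class might look roughly like the sketch below. Only the `parseArrivals` name and the title/price/link product shape come from this README; the class name, URL, CSS selectors, and the choice to pass in a Puppeteer `page` are placeholders assumed for illustration.

```javascript
// Hypothetical shop class: the URL and CSS selectors are placeholders,
// not those of any real shop supported by this project.
class ExampleShop {
  constructor() {
    this.name = 'Example Shop';
    this.url = 'https://shop.example/new-arrivals';
  }

  // `page` is expected to be a Puppeteer Page (e.g. from `browser.newPage()`).
  async parseArrivals(page) {
    await page.goto(this.url, { waitUntil: 'networkidle2' });
    // $$eval runs the callback in the browser against all matching elements.
    return page.$$eval('.product', (cards) =>
      cards.map((card) => ({
        title: card.querySelector('.product-title').textContent.trim(),
        price: card.querySelector('.product-price').textContent.trim(),
        link: card.querySelector('a').href,
      }))
    );
  }
}
```

Passing the page in keeps the class free of browser setup, so one browser instance can be shared across all shops.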

The file manager class compares the scraped data to the most recently saved data (if there is any) to determine which products are new arrivals, then saves the scraped data to a JSON file.
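The compare-then-save step could be sketched as below. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: products are matched by their link, and the very first run (no stored file yet) reports no new arrivals.

```javascript
const fs = require('fs');
const path = require('path');

// Products whose link did not appear in the previous run count as new arrivals.
function findNewArrivals(previous, current) {
  const knownLinks = new Set(previous.map((p) => p.link));
  return current.filter((p) => !knownLinks.has(p.link));
}

// Returns the previously stored products, or null if nothing was saved yet.
function loadProducts(file) {
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// Compares the scraped data to the stored data, then overwrites the stored data.
function compareAndSave(file, scraped) {
  const previous = loadProducts(file);
  // Assumption: with nothing to compare against, no product is reported as new.
  const arrivals = previous === null ? [] : findNewArrivals(previous, scraped);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(scraped, null, 2));
  return arrivals;
}
```

For example, `compareAndSave('storage/example-shop.json', products)` would return only the products that were not present in the last saved list for that (hypothetical) shop file.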

New arrivals are passed to the email service and sent through the authenticated account to the provided email address.
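Sending through the Gmail API requires the message as a base64url-encoded RFC 2822 string. The sketch below shows one way this could be done; `buildRawEmail` and `sendArrivals` are hypothetical names, the plain-text layout and subject line are assumptions, and `gmail` is assumed to be an authenticated client created with the googleapis package (`google.gmail({ version: 'v1', auth })`).

```javascript
// Builds the base64url-encoded message that the Gmail API expects in `raw`.
function buildRawEmail(to, subject, arrivals) {
  const body = arrivals
    .map((p) => `${p.title} - ${p.price}\r\n${p.link}`)
    .join('\r\n\r\n');
  const message = [
    `To: ${to}`,
    `Subject: ${subject}`,
    'Content-Type: text/plain; charset=utf-8',
    '', // blank line separates headers from body
    body,
  ].join('\r\n');
  // Convert standard base64 to the URL-safe variant without padding.
  return Buffer.from(message, 'utf8')
    .toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

// Sends the list of new arrivals; does nothing when the list is empty.
async function sendArrivals(gmail, to, arrivals) {
  if (arrivals.length === 0) return;
  await gmail.users.messages.send({
    userId: 'me', // 'me' refers to the authenticated account
    requestBody: { raw: buildRawEmail(to, 'New whiskey releases', arrivals) },
  });
}
```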


Currently supported shops:

Planned for future releases:

Adding additional shops:

Ready app for server deployment / production use:

  • implement more sophisticated error logging and handling; as of right now, parsing errors can crash the whole application
  • implement an alerting mechanism that sends a notification if a shop hasn't had any new arrivals for a while, which may indicate that its scraping logic has stopped working
  • run the scraper as a scheduled cron job
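The first two items above could start from something like the following sketch. It is not part of the current application: `scrapeAllShops`, `findStaleShops`, and the `lastArrivalAt` bookkeeping are hypothetical, and each shop is assumed to expose a `name` and a `parseArrivals(page)` method.

```javascript
// Scrapes shops one after another and isolates failures, so a parsing error
// in one shop cannot crash the whole run.
async function scrapeAllShops(shops, page) {
  const succeeded = {};
  const failed = [];
  for (const shop of shops) {
    try {
      succeeded[shop.name] = await shop.parseArrivals(page);
    } catch (err) {
      console.error(`[${shop.name}] scraping failed: ${err.message}`);
      failed.push(shop.name);
    }
  }
  return { succeeded, failed };
}

// Returns the names of shops whose last new arrival is older than `maxAgeMs`.
// `lastArrivalAt` maps shop names to timestamps in milliseconds.
function findStaleShops(lastArrivalAt, now, maxAgeMs) {
  return Object.keys(lastArrivalAt).filter(
    (name) => now - lastArrivalAt[name] > maxAgeMs
  );
}
```

The list returned by `findStaleShops` could be mailed to the maintainer as a hint that a shop's page layout may have changed.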

Additional functionality / Improvement ideas:

  • let the user define keywords and a price range to filter, by title and price, the arrivals that are sent to them
  • improve the layout of the release email
  • upload scraped JSON data to a Google Sheet or database to gather price data
  • set up an Express server and expose an API to subscribe to the bot
  • build a frontend/website to consume the API
  • set up a Twitter account for the bot to tweet new arrivals
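As an illustration of the keyword and price filter idea, a possible starting point is sketched below. It is not implemented in the project; the function names and option names are hypothetical, and prices are assumed to be scraped as German-formatted strings such as "1.299,90 €".

```javascript
// Converts a German-formatted price string (e.g. "1.299,90 €") to a number.
function parseGermanPrice(text) {
  const normalized = text
    .replace(/[^\d.,]/g, '') // drop currency symbol and whitespace
    .replace(/\./g, '')      // drop thousands separators
    .replace(',', '.');      // decimal comma -> decimal point
  return parseFloat(normalized);
}

// Keeps arrivals whose title contains any keyword (case-insensitive) and
// whose price lies within [minPrice, maxPrice]. An empty keyword list
// matches every title; unparseable prices are filtered out.
function filterArrivals(arrivals, { keywords = [], minPrice = 0, maxPrice = Infinity } = {}) {
  const wanted = keywords.map((k) => k.toLowerCase());
  return arrivals.filter((p) => {
    const title = p.title.toLowerCase();
    const price = parseGermanPrice(p.price);
    const matchesKeyword = wanted.length === 0 || wanted.some((k) => title.includes(k));
    return matchesKeyword && price >= minPrice && price <= maxPrice;
  });
}
```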
