This repository contains an ongoing project that collects data from Ariana Grande's webstore, which is hosted on Shopify. Data was collected with a serverless AWS architecture (AWS Lambda, Amazon S3, and Amazon EventBridge) from 6 March 2024 to 7 April 2024. Data modeling and predictive analytics are the next steps.
- Amazon EventBridge rules trigger AWS Lambda every five minutes, invoking `shopify_scrape.py`.
- `shopify_scrape.py` pulls raw data both from the storefront and from the public-facing `products.json` file, which is used for every Shopify storefront. The code then formats the data before passing it to `lambda_function.py`.
- `lambda_function.py` fetches an existing CSV file out of a connected S3 bucket. The Lambda appends the new data to the CSV file before putting it back in the bucket, replacing the original.
- If the main CSV file becomes larger than 50 MB, the Lambda changes the name of the file and archives it in the same S3 bucket. It then creates a new CSV file for data to be appended to.
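
The append-and-archive flow described above can be sketched in plain Python. This is a minimal illustration, not the code in `src`: the function names, the column list in `FIELDS`, the archive naming scheme, and the assumed shape of the `products.json` payload (a top-level `products` list whose items carry `variants`) are all assumptions made for the example. S3 calls are left out so the logic can run locally.

```python
import csv
import io
from datetime import datetime, timezone

MAX_BYTES = 50 * 1024 * 1024  # 50 MB threshold mentioned above
# Hypothetical column set; the real CSV schema may differ.
FIELDS = ["scraped_at", "product_id", "title", "variant_id", "price", "available"]


def flatten_products(payload: dict, scraped_at: str) -> list[dict]:
    """Turn a products.json-style payload into one row per product variant."""
    rows = []
    for product in payload.get("products", []):
        for variant in product.get("variants", []):
            rows.append({
                "scraped_at": scraped_at,
                "product_id": product.get("id"),
                "title": product.get("title"),
                "variant_id": variant.get("id"),
                "price": variant.get("price"),
                "available": variant.get("available"),
            })
    return rows


def append_rows(existing_csv: str, rows: list[dict]) -> str:
    """Return the CSV text with rows appended; the header is written only for an empty file."""
    buffer = io.StringIO()
    buffer.write(existing_csv)
    if existing_csv and not existing_csv.endswith("\n"):
        buffer.write("\n")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    if not existing_csv.strip():
        writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def needs_rotation(csv_text: str, limit: int = MAX_BYTES) -> bool:
    """True when the encoded CSV exceeds the size limit and should be archived."""
    return len(csv_text.encode("utf-8")) > limit


def archive_key(key: str, now: datetime) -> str:
    """Build a timestamped object key for the archived copy (assumed naming scheme)."""
    stem, dot, ext = key.rpartition(".")
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f"{stem}_{stamp}.{ext}" if dot else f"{key}_{stamp}"


if __name__ == "__main__":
    sample = {"products": [{"id": 1, "title": "Tee",
                            "variants": [{"id": 10, "price": "30.00", "available": True}]}]}
    now = datetime.now(timezone.utc)
    text = append_rows("", flatten_products(sample, now.isoformat()))
    print(text)
    if needs_rotation(text):
        print("archive as", archive_key("main.csv", now))
```

In a Lambda, the existing CSV text would typically come from boto3's `get_object` and be written back with `put_object`. S3 has no rename operation, so "changing the name" of an object is usually done with `copy_object` to the new key followed by `delete_object` on the old one.
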
- data: contains all data in this repository
- raw: contains all raw web-scraping data from AWS
- figures: contains project PNG images, including the architecture diagram (courtesy of Lucidchart)
- src: contains all Python modules and scripts used in the project, including the Lambda function