aws-emr
Here are 58 public repositories matching this topic...
An AWS-based solution that uses AWS CloudWatch and a Python AWS Lambda function to automatically terminate AWS EMR clusters that have been idle for a specified period of time.
Updated Jun 5, 2024 - Python
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can put to use quickly, so you can focus on writing PySpark code.
Updated Jun 13, 2022 - Python
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
Updated May 14, 2022 - Python
Cloud-based AI / ML workflow and data application development framework
Updated Aug 20, 2024 - Python
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver
Updated Jun 12, 2024 - Python
A collection of Airflow sample workflows for data processing on AWS
Updated Dec 1, 2017 - Python
Creates a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
Updated Oct 10, 2019 - Python
A cookiecutter template for working with PySpark on AWS EMR
Updated Aug 30, 2020 - Python
My AWS Playground
Updated Jun 18, 2024 - Python
EMR + Hadoop to Redshift ELT workflow using the Spark steps API and orchestrated by Apache Airflow, which ingests disparate datasets centered on 7 GB of I94 arrivals information to produce a simple star schema in Redshift
Updated Feb 25, 2021 - Python
Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed using Apache Airflow in a Docker container.
Updated Mar 17, 2023 - Python
An ETL pipeline built with Airflow that downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and then uploads the cleaned-up dataset back to the same S3 bucket in a folder primed for higher-level analytics
Updated Feb 25, 2023 - Python
Generic Python library for provisioning EMR clusters from YAML config files (Configuration as Code)
Updated Dec 8, 2022 - Python
Lambda to start EMR and run a MapReduce job
Updated Aug 16, 2019 - Python
This project analyzes the correlation between COVID-19 and the US aviation industry. By studying data on passenger/freight traffic and delays alongside COVID-19 trends, it provides insights into airline and passenger responses. The findings help airlines adapt to the pandemic's impact.
Updated Jan 9, 2022 - Python