an ETL pipeline that aims to one day aggregate all measurements in the UFC

chamley/UFC

Code Status:

Build Status Github last-commit License: MIT

Hi. This is my personal project, where I hope to aggregate all the available data in the world about MMA in the UFC.

Ingestion System and Transformation Layer: Orchestrated Lambdas.
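As a rough illustration of what one ingestion Lambda in such a chain could look like, here is a minimal sketch. It is not the code in this repo: the event fields (`url`, `source`), the `raw/...` key layout, and the injected `fetch` and `store` callables are all assumptions made so the example runs without network or AWS access.

```python
import json
from datetime import datetime, timezone


def lake_key(source: str, fetched_at: datetime) -> str:
    """Build a date-partitioned object key for the raw zone (hypothetical layout)."""
    return (
        f"raw/{source}/"
        f"year={fetched_at:%Y}/month={fetched_at:%m}/day={fetched_at:%d}/"
        f"{fetched_at:%H%M%S}.json"
    )


def handler(event, context=None, *, fetch=None, store=None, now=None):
    """Lambda-style entry point: fetch one page, store it raw, report the key.

    `fetch` and `store` are injected stand-ins (assumptions). In a deployed
    function they might wrap an HTTP client and an S3 upload via boto3.
    """
    fetched_at = now or datetime.now(timezone.utc)
    body = fetch(event["url"])
    key = lake_key(event["source"], fetched_at)
    store(key, json.dumps({"url": event["url"], "body": body}))
    # The return value is what an orchestrator could hand to the next step.
    return {"status": "ok", "key": key}
```

Returning the object key keeps each function small: a downstream transformation step only needs to know where the raw payload landed, not how it was fetched.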

Scalability: Lambdas scale up as well as down with the workload.

Data Architecture: Two-Tier architecture leveraging a data lake and a data warehouse.
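To make the lake-to-warehouse handoff concrete, the sketch below builds a Redshift `COPY` statement that loads a JSON object from S3 into a table. The table name, bucket, key, and IAM role ARN are placeholders, not values from this project, and the actual loading path here may differ (for example, awswrangler can also load into Redshift).

```python
def copy_statement(table: str, bucket: str, key: str, iam_role: str) -> str:
    """Return a Redshift COPY statement for one JSON object in the data lake.

    All arguments are illustrative placeholders. Plain string formatting is
    only acceptable here because the values are trusted configuration,
    never user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM 's3://{bucket}/{key}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS JSON 'auto';"
    )


if __name__ == "__main__":
    print(
        copy_statement(
            table="staging.events",            # hypothetical table
            bucket="example-ufc-lake",         # hypothetical bucket
            key="raw/events/2023/01/02.json",  # hypothetical key
            iam_role="arn:aws:iam::123456789012:role/example-redshift-copy",
        )
    )
```

The resulting string could be executed over a psycopg2 connection to Redshift, with dbt models then building on the staged table.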

Orchestration: Airflow, with certain handoffs to dbt Cloud.

Stack

💻 Data Engineering Tools: Airflow, Docker, dbt

☁️ Cloud: AWS (Lambda, S3, Redshift, ECR, MWAA)

✅ CI/Testing: dbt tests, Travis CI, pytest

📊 Visualization: Tableau

📚 Libraries: pandas, boto3, awswrangler, beautifulsoup, psycopg2

🌈 Languages: Python, SQL (Redshift's PostgreSQL-based dialect)

🧰 Workflow Tools: Black (formatter), VS Code, DataGrip

Figures: data architecture diagram, Example Dashboard 1, Example Dashboard 2

ToDo:

  • Dashboards are not deployed because Tableau Server has no free tier and my Redshift/MWAA costs were lightly bankrupting me.
    • Moved them to a self-hosted Metabase on Elastic Beanstalk
  • Finish the Terraform migration (Lambda, MWAA, IAM, and Terraform Cloud)
  • Integrate CI/CD beyond local bash scripts (e.g. move to GitHub Actions)
