Hi. This is my personal project, where I hope to aggregate all the available data in the world about MMA in the UFC.
Ingestion System and Transformation Layer: Orchestrated Lambdas.
Scalability: Lambdas scale up and down automatically with the workload.
Data Architecture: Two-tier architecture combining a data lake and a data warehouse.
Orchestration: Airflow, with certain handoffs to dbt Cloud.
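To make the "orchestrated Lambdas feeding a data lake" idea concrete, here is a minimal, stdlib-only sketch of what one transformation step could look like. It is not the project's actual code: the event shape, the field names (`fighter_a`, `method`, `round`, etc.), the `raw/fights/` prefix and the `handler` name are all hypothetical. A deployed Lambda would write the result to S3 (e.g. with boto3 or awswrangler) instead of returning it.

```python
import json
from datetime import date


def normalize_fight(raw: dict, event_date: str) -> dict:
    """Clean one scraped fight record (hypothetical schema)."""
    method = (raw.get("method") or "").strip().upper()
    return {
        "event_date": date.fromisoformat(event_date).isoformat(),
        # Collapse stray whitespace left over from HTML scraping
        "fighter_a": " ".join(raw["fighter_a"].split()),
        "fighter_b": " ".join(raw["fighter_b"].split()),
        "method": method or None,
        "round": int(raw["round"]) if raw.get("round") else None,
    }


def lake_key(event_date: str, event_id: str) -> str:
    """Build a Hive-style partitioned S3 key (hypothetical layout)."""
    d = date.fromisoformat(event_date)
    return f"raw/fights/year={d.year}/month={d.month:02d}/{event_id}.json"


def handler(event: dict, context=None) -> dict:
    """Lambda-style entry point that transforms one batch of scraped rows.

    Assumes the orchestrator passes {"event_id": ..., "event_date": ..., "fights": [...]}.
    """
    rows = [normalize_fight(r, event["event_date"]) for r in event["fights"]]
    key = lake_key(event["event_date"], event["event_id"])
    # A real Lambda would upload `body` to S3 under `key`; returning it keeps
    # the transformation testable locally.
    return {"key": key, "body": json.dumps(rows), "count": len(rows)}
```

Keeping the parsing logic in plain functions separate from the handler means the same code can be unit-tested with Pytest and invoked from Airflow without any AWS access.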
💻 Data Engineering Tools: Airflow, Docker, dbt
☁️ Cloud: AWS (Lambda, S3, Redshift, ECR, MWAA)
✅ CI/Testing: dbt tests, Travis CI, Pytest
📊 Visualization: Tableau
📚 Libraries: pandas, boto3, awswrangler, BeautifulSoup, psycopg2
🌈 Languages: Python/SQL (Redshift's version of Postgres)
🧰 Workflow Tools: Black (formatter), VS Code, DataGrip
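One common way to move data from the lake tier (S3) into the warehouse tier (Redshift) is Redshift's `COPY` command. The sketch below only builds such a statement; it assumes Parquet files under an S3 prefix and an IAM role attached to the cluster. The schema, table, bucket and role ARN are placeholders, and the project may load data differently (for example through awswrangler).

```python
import re

# Plain unquoted identifiers only; anything else is rejected
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_copy_sql(schema: str, table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads Parquet files from an S3 prefix.

    Identifiers are validated instead of interpolated blindly, because COPY
    targets cannot be passed as query parameters.
    """
    for name in (schema, table):
        if not _IDENT.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with s3://")
    # Escape single quotes inside SQL string literals by doubling them
    prefix = s3_prefix.replace("'", "''")
    role = iam_role_arn.replace("'", "''")
    return (
        f"COPY {schema}.{table} "
        f"FROM '{prefix}' "
        f"IAM_ROLE '{role}' "
        "FORMAT AS PARQUET;"
    )


# Running it with psycopg2 might look roughly like this (connection details are placeholders):
# with psycopg2.connect(host=..., port=5439, dbname=..., user=..., password=...) as conn:
#     with conn.cursor() as cur:
#         cur.execute(build_copy_sql("raw", "fights", "s3://my-bucket/raw/fights/", role_arn))
```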
To-do:
- Dashboards are not deployed, since Tableau Server has no free tier and my Redshift/MWAA costs were lightly bankrupting me. Moved them to a self-hosted Metabase on Elastic Beanstalk.
- Finish the Terraform migration (Lambda, MWAA, IAM, and Terraform Cloud)
- Integrate CI/CD beyond local bash scripts (e.g., move to GitHub Actions)