A capstone project that covers several aspects of data engineering (data exploration, cleaning, modeling, pipelining, and processing).
Updated Dec 25, 2022 - Jupyter Notebook
Implemented a parallelized version of the k-means clustering algorithm in Spark and assessed its efficiency on a real-world dataset.
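A common way to parallelize k-means (Lloyd's algorithm) in Spark is to share the current centroids with every partition. Each partition computes per-cluster coordinate sums and point counts (map), the partial results are added together (reduce), and each sum is divided by its count to get the new centroids. The following is a minimal sketch of that structure in plain Python, assuming small in-memory tuples. The round-robin partitions and thread pool are stand-ins for Spark partitions and executors. The farthest-point initialization, the function names, and the parameters are hypothetical and not taken from the repository.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def closest(point, centroids):
    """Return the index of the centroid nearest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def partial_sums(partition, centroids):
    """Map step for one partition: per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        i = closest(point, centroids)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += point[d]
    return sums, counts


def init_farthest(points, k):
    """Hypothetical deterministic init: start at points[0], then add the farthest point."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    return centroids


def kmeans(points, k, num_partitions=4, max_iter=20, tol=1e-9):
    """Partition-parallel k-means on a list of equal-length numeric tuples."""
    centroids = init_farthest(points, k)
    dim = len(centroids[0])
    # Stand-in for Spark partitions: round-robin slices of the input list.
    partitions = [points[i::num_partitions] for i in range(num_partitions)]

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for _ in range(max_iter):
            # Map: every partition sees the same (broadcast) centroids.
            results = list(pool.map(partial(partial_sums, centroids=centroids), partitions))

            # Reduce: add up the per-partition sums and counts for each cluster.
            total_sums = [[0.0] * dim for _ in range(k)]
            total_counts = [0] * k
            for sums, counts in results:
                for i in range(k):
                    total_counts[i] += counts[i]
                    for d in range(dim):
                        total_sums[i][d] += sums[i][d]

            # Update: new centroid = mean of its points; empty clusters keep the old one.
            new_centroids = [
                tuple(s / total_counts[i] for s in total_sums[i]) if total_counts[i] else centroids[i]
                for i in range(k)
            ]
            shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
            centroids = new_centroids
            if shift <= tol:  # stop once no centroid moves more than `tol`
                break
    return centroids


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(sorted(kmeans(data, k=2)))  # [(0.5, 0.5), (10.5, 10.5)]
```

Summing within each partition before combining means that only k sums and counts per partition have to be merged, not the points themselves. This is the property that makes the map/reduce split scale. Because k-means only finds a local optimum, results can depend on initialization.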