K-Means is a clustering algorithm that partitions unlabelled data into a fixed number (that’s the “K”) of distinct groups. Implementing K-Means on MapReduce can speed up execution on a single machine as well as on a distributed cluster. MapReduce offers several advantages for speeding up the K-Means clustering algorithm:
- Parallelism: it makes good use of multicore machines.
- Simplicity: only the Map and Reduce concepts need to be considered.
- Distribution: it allows the algorithm to run in distributed mode.
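To make the Map and Reduce roles concrete, here is a minimal in-memory sketch of K-Means written in that style. It is an illustration only, not the implementation in the repository referenced below: the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`, `kmeans`) are hypothetical, squared Euclidean distance is assumed, and the grouping step that a MapReduce framework would perform during shuffling is simulated with a dictionary.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances)), (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points that share one centroid index."""
    dims = len(values[0][0])
    totals = [0.0] * dims
    count = 0
    for point, n in values:
        for i in range(dims):
            totals[i] += point[i]
        count += n
    return tuple(t / count for t in totals)


def kmeans_iteration(points, centroids):
    """One pass: map every point, group by key, reduce every group."""
    grouped = defaultdict(list)  # stands in for the framework's shuffle phase
    for point in points:
        key, value = map_point(point, centroids)
        grouped[key].append(value)
    # Assumption: a centroid with no assigned points keeps its old position.
    return [
        reduce_cluster(grouped[k]) if k in grouped else tuple(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations until the centroids stop moving or max_iter is hit."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        shift = max(squared_distance(o, n) for o, n in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

Because each call to `map_point` depends only on one point and the current centroids, the map calls can run independently on separate cores or nodes; only the per-cluster reduce step needs the grouped results.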
- K-means: A Complete Introduction, online available: https://towardsdatascience.com/
- A Very Brief Introduction of MapReduce, online available: https://hci.standford.edu/courses/cs448g
- davide-cocoomini/KMeans-MapReduce, online available: https://github.com/davide-cocoomini/KMeans-MapReduce
- IRIS Flower Dataset, online available: https://www.kaggle.com/arshid/iris-flower-dataset
- Weather Dataset, online available: https://www.kaggle.com/prakharrathi25/weather-data-clustering-using-k-means/notebook
- Hao-Ying Cheng (Masker Tim)
- Email: t109598001@ntut.org.tw
- Affiliation: National Taipei University of Technology
- Yueh Tang
- Email: t109598033@ntut.org.tw
- Affiliation: National Taipei University of Technology