Distributed Algorithms - K-Means

This project implements the K-means|| algorithm (https://arxiv.org/pdf/1203.6402) in the context of distributed algorithms. It was developed as part of the course Physics of Data - Management and Analysis of Physics Datasets mod. B, academic year 2023-2024.

Authors

Alessio Tuscano
Chiara Tramarin

Project Description

The goal of this project is to implement and benchmark the K-means|| algorithm on a distributed system using Dask. We develop an alternative version of K-means|| and look for the cluster configuration that makes the most efficient use of the available resources. Performance is evaluated on several metrics, including execution time, memory usage, and task distribution.
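To make the algorithm concrete, here is a minimal, dependency-free sketch of the K-means|| initialization described in the linked paper. It is not the code from the project notebooks. The data is passed as a list of partitions, which stand in for the chunks a Dask collection would hold, and the per-partition loops play the role of the map and reduce steps. The function names, the default oversampling factor of `2 * k`, and the fixed `rounds=5` are assumptions chosen for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Return (index, squared distance) of the center closest to p."""
    dists = [sq_dist(p, c) for c in centers]
    idx = min(range(len(centers)), key=dists.__getitem__)
    return idx, dists[idx]


def weighted_kmeanspp(candidates, weights, k, rng):
    """Pick up to k of the weighted candidates with k-means++ style seeding."""
    first = rng.choices(range(len(candidates)), weights=weights)[0]
    chosen = [candidates[first]]
    while len(chosen) < k:
        # Score = weight * squared distance to the closest already chosen point.
        scores = [w * nearest(c, chosen)[1] for c, w in zip(candidates, weights)]
        if sum(scores) == 0:  # fewer than k distinct candidates are available
            break
        pick = rng.choices(range(len(candidates)), weights=scores)[0]
        chosen.append(candidates[pick])
    return chosen


def kmeans_parallel_init(partitions, k, oversampling=None, rounds=5, seed=0):
    """Sketch of K-means|| seeding over a list of partitions (lists of tuples)."""
    rng = random.Random(seed)
    ell = 2 * k if oversampling is None else oversampling  # assumed default

    # Step 1: draw one point uniformly at random (partition chosen by its size).
    sizes = [len(part) for part in partitions]
    start_part = rng.choices(partitions, weights=sizes)[0]
    centers = [rng.choice(start_part)]

    for _ in range(rounds):
        # "Map": each partition computes squared distances to the current centers.
        dists = [[nearest(p, centers)[1] for p in part] for part in partitions]
        # "Reduce": the global cost is the sum of the per-partition costs.
        cost = sum(sum(d) for d in dists)
        if cost == 0:
            break
        # Step 2: sample each point independently with probability ell * d^2 / cost.
        for part, part_dists in zip(partitions, dists):
            for p, d in zip(part, part_dists):
                if rng.random() < ell * d / cost:
                    centers.append(p)

    # Step 3: weight each candidate by the number of points closest to it.
    weights = [0] * len(centers)
    for part in partitions:
        for p in part:
            weights[nearest(p, centers)[0]] += 1

    # Step 4: recluster the small weighted candidate set down to k centers.
    return weighted_kmeanspp(centers, weights, k, rng)
```

The returned centers are meant to seed the usual Lloyd iterations. Only the distance and sampling passes touch the full dataset, so those are the parts that would run on the workers, while the final reclustering works on the small candidate set and can stay on a single machine.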

For benchmarking, we use the KDD Cup 1999 dataset. It lets us assess how well the algorithm handles large-scale data, with a focus on scalability and resource optimization.
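As an illustration of how execution time and memory usage could be recorded at increasing input sizes, the sketch below uses only the Python standard library. It is not the harness from `benchmark.ipynb`, and the helper names `measure` and `scaling_table` are hypothetical. Note that `tracemalloc` only sees Python allocations in the local process, so on a Dask cluster the memory and task-distribution figures would have to come from the scheduler's own diagnostics instead.

```python
import time
import tracemalloc


def measure(func, *args, repeats=3, **kwargs):
    """Return best wall time (s), peak traced memory (bytes) and the result."""
    best = float("inf")
    result = None
    # Timing runs are done without tracing, since tracing slows execution down.
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)

    # One extra run with tracing enabled to record the peak allocation.
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {"seconds": best, "peak_bytes": peak, "result": result}


def scaling_table(func, make_input, sizes, repeats=3):
    """Measure func on inputs of growing size; return (n, seconds, peak_bytes) rows."""
    rows = []
    for n in sizes:
        data = make_input(n)
        stats = measure(func, data, repeats=repeats)
        rows.append((n, stats["seconds"], stats["peak_bytes"]))
    return rows
```

Calling `scaling_table` with a clustering function and a loader that returns the first `n` rows of the dataset gives one row per size, which is enough to plot time and memory against input size.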

