KMeans Optimization: Incorporating Weights into Re-Clustering Process #1001

ChiaTrama · 2024-10-03T09:41:47Z

Hello,

As discussed in this topic on Dask's forum, my colleague and I compared the dask-ml implementation of the KMeans class with our own implementation in a distributed environment. During the comparison, we observed that the dask-ml initialization does not appear to use weights during the centroid re-clustering phase.

In the current dask-ml KMeans implementation, the standard KMeans algorithm is used for centroid re-clustering. In contrast, we incorporated weights into two areas:

KMeans++ initialization.
Weighted average during centroid re-clustering.
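
For illustration, the two weighted steps could be sketched roughly as follows with NumPy. This is a minimal single-machine sketch, not the code from the linked repository and not dask-ml's implementation. The function names `weighted_kmeans_pp` and `weighted_lloyd` are hypothetical, and so is the assumption that each candidate centroid carries a weight such as the number of data points closest to it.

```python
import numpy as np


def weighted_kmeans_pp(candidates, weights, k, rng):
    """Pick k seeds, sampling each with probability ~ weight * squared distance.

    Assumption: `weights[i]` is a non-negative importance for candidate i,
    e.g. the number of data points closest to that candidate.
    """
    n = candidates.shape[0]
    # First seed: sampled proportionally to the weights alone.
    first = rng.choice(n, p=weights / weights.sum())
    centers = [candidates[first]]
    # Squared distance from every candidate to its nearest chosen seed.
    d2 = np.sum((candidates - candidates[first]) ** 2, axis=1)
    for _ in range(1, k):
        scores = weights * d2
        total = scores.sum()
        # Fall back to plain weights if every candidate coincides with a seed.
        probs = scores / total if total > 0 else weights / weights.sum()
        idx = rng.choice(n, p=probs)
        centers.append(candidates[idx])
        d2 = np.minimum(d2, np.sum((candidates - candidates[idx]) ** 2, axis=1))
    return np.array(centers)


def weighted_lloyd(candidates, weights, centers, n_iter=10):
    """Re-cluster candidates; each center becomes the weighted mean of its members."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign every candidate to its nearest center.
        dists = ((candidates[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(centers.shape[0]):
            mask = labels == j
            # Leave a center unchanged if it has no members (or zero total weight).
            if weights[mask].sum() > 0:
                centers[j] = np.average(candidates[mask], axis=0, weights=weights[mask])
    return centers
```

With unit weights, `weighted_lloyd` reduces to the standard unweighted centroid update, so the weights are the only difference between the two variants in this sketch.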

Although our implementation is slower than dask-ml in terms of execution time, we achieved better results when clustering a blob dataset. This is likely due to a reduction in the number of clustering iterations rather than to direct code optimizations.

If you're interested, feel free to review our repository for further details on our approach:
GitHub Repository.

Thank you for considering this issue.

Best regards,
Chiara


TomAugspurger · 2024-10-07T11:55:22Z

Thanks for sharing!
