Agglomerative hierarchical clustering algorithm implemented from scratch (i.e. without advanced libraries such as NumPy, pandas, scikit-learn, etc.)
During the clustering process, we iteratively merge the two most similar clusters until only the desired number of clusters remains.
The similarity of two clusters is measured by the distance between them: the smaller the distance, the more similar the two clusters are.
In the equations, $d(p, q)$ is a distance measure between two data points, i.e. the Euclidean distance, defined by:

$$d(p, q) = \sqrt{\sum_{i} (p_i - q_i)^2}$$

where $p_i$ and $q_i$ are the $i$-th dimensions of $p$ and $q$.
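The merge loop and the point-to-point distance described above can be sketched in plain Python. This is only an illustration, not the code in `main.py`: the function names `euclidean`, `single_linkage`, and `agglomerate` are hypothetical, and the cluster distance used here (single linkage, the smallest distance between any two points of the two clusters) is an assumption, since the actual cluster distance measures are not shown in this text.

```python
def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5


def single_linkage(a, b):
    """Assumed cluster distance: smallest point-to-point distance."""
    return min(euclidean(p, q) for p in a for q in b)


def agglomerate(points, k, cluster_distance=single_linkage):
    """Merge the two closest clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical sample data, not taken from sample_input.txt.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(points, 3)
print(result)
```

Passing a different function as `cluster_distance` would swap in another similarity measure without touching the loop. The pairwise search is recomputed on every pass, which keeps the sketch short at the cost of speed on larger inputs.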
python main.py -d sample_input.txt -k 4 -m 0