About

R Program that implement Unsupervised Learning Method - Clustering on "Snails" dataset.

Observations

K-Means Clustering

In order to find out the best value of k, average silhouette width was implemented for k=2 to k=15. Silhouette width provide an estimate of the average distance between clusters with the value of 1 being highest and -1 being lowest. If the value is closer to +1 then the cluster is considered a good cluster. Following is the plot between average shiloutte of observations and the number of clusters (k) :

The location of maximum was considered as the appropriate number of clusters. The number of clusters suitable for the dataset was chosen to be 15 with 80.5% variability explained by this clustering algorithm. Thus, k=15 is the optimal k-value to obtain a least within sum of squares and high explained variability.

Hierarchical Clustering - Euclidean and Manhattan distance

Using Hierarchical clustering full dataset with complete linkage with Euclidian dissimilarity measure, we can see that all animal locations spread to different clusters. On the other hand, using the Manhattan distance as the dissimilarity measure, we get different cluster composition with the same number of clusters. For example: With Euclidian distance, 7 observations are spread for Bermuda location while with Manhattan distance, 9 observations are spread over Bermuda. Moreover, cutting the tree at height at 15 gives only 2 clusters while with the Manhattan distance we get 18 clusters. Thus, the dendrogram obtained using Manhattan dissimilarity measure is longer than with Euclidian dissimilarity measure.

Dendogram with complete linkage method and Euclidean dissimilarity measure

Dendogram with complete linkage method and Manhattan dissimilarity measure

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
Hierarchical-Manhattan.r		Hierarchical-Manhattan.r
Hierarchical.r		Hierarchical.r
K-Means.r		K-Means.r
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Observations

K-Means Clustering

Hierarchical Clustering - Euclidean and Manhattan distance

About

Releases

Packages

Languages

umangU/Clustering-Models

Folders and files

Latest commit

History

Repository files navigation

About

Observations

K-Means Clustering

Hierarchical Clustering - Euclidean and Manhattan distance

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages