K-MEANS-CLUSTERING-CLASSIFIER

K-means Clustering algorithm is used to classify,experimenting with different values of K to find the elbow point in the plot error vs K

Libraries used:
->numpy for using arrays and calculating sum and mean of array
->pandas for reading the excel file

Inputs:
->dataset.xlsx : excel sheet containing the dataset that need to be classified into clusters.

Outputs:
->For K=3 with given intial centroids:Plot displaying the classfied clusters of the dataset

->For K=1 : Plot displaying the classfied clusters of the dataset by taking random initial centroids
->For K=2:Plot displaying the classfied clusters of the dataset by taking random initial centroids
->For K=4:Plot displaying the classfied clusters of the dataset by taking random initial centroids
->For K=5:Plot displaying the classfied clusters of the dataset by taking random initial centroids
->For K=6:Plot displaying the classfied clusters of the dataset by taking random initial centroids
->For K=7:Plot displaying the classfied clusters of the dataset by taking random initial centroids

->For K=3:Plot displaying the classfied clusters of the dataset by taking random initial centroids
->Error vs K plot

User defined functions:
1.euclidean_distance
->Inputs:a,b
->Outputs:distances
->Euclidean distance is a function that takes in a and b and returns the distances array in which d[i][j] is the euclidean distances between a[i] and b[j]

2.kmeans
->Inputs:dataset,k,centroids
->Outputs:centroids
->k-means is an algorithm that takes in dataset and a constant k denoting the number of clusters and the intial centroids and returns the final centroids which define the clusters of data in the dataset which are similar to one another.

3.getLabels
->Inputs:dataset,centroids,k
->Outputs:labels
->Returns a labels array containning the cluster label to which each datapoint in the dataset belong to by evaluating the euclidean distance of the datapoint to the centroids and assigning the nearest centroid label to the datapoint.

4.getCentroids
->Inputs:dataset,labels,k,centroids
->Outputs:centroids
->Returns updated k centroids each of dimension 2.Each centroid is the geometric mean of the points that have that centroid's label.

5.should_stop_iterations
->Inputs:old_centroids,centroids
->Outputs:sum of the distance btwn the old and new centroids
->Returns 0 if k-means is done by checking the termination condition.K-means terminates if the centroids stop changing.Else returns the sum of the euclidean distance btwn the corresponding old and new centroids.

6.sum_of_squared_error
->Inputs:dataset,labels,centroids
->Outputs:error
->Computes the error by taking the sum of the square of the euclidean distance of the dataset point to the corresponding centroid assigned to it.

7.visualise_data
->Inputs:centroids,labels,dataset,fig_num
->Outputs:plot
->Visualise the dataset into the clusters formed as a resultant of appyling k-means clustering to the dataset.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
dataset.xlsx		dataset.xlsx
python_code.py		python_code.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

K-MEANS-CLUSTERING-CLASSIFIER

About

Releases

Packages

Languages

srilakshmi-thota/K-MEANS-CLUSTERING-CLASSIFIER

Folders and files

Latest commit

History

Repository files navigation

K-MEANS-CLUSTERING-CLASSIFIER

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages