Cluster Based Knn Imputation

See all the results and experiments here

Note: All the code is in the main.ipynb notebook. In addition, you can read about the project in the PDF attached to this repository.

Overview of the project

Filling in null values is one of the major steps in a data science pipeline, and it is even more crucial when the dataset is small. In this project we focus on the KNN imputation method. Instead of applying the same user-supplied k to every sample, we use a clustering method to assign each sample that has a NaN value to the cluster it most likely originates from, and we evaluate the density of that cluster. The k used for the sample then grows with the density: the denser the cluster, the larger the k. We ran our method on multiple datasets and compared it with classic KNN imputation. We found that our method improves the results when the percentage of null values is high and the clustering quality is high.
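The idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code from main.ipynb: the cluster `labels` are taken as given (from any clustering method), density is assumed to be the inverse of the mean member-to-centroid distance, k is assumed to scale linearly with a cluster's density relative to the mean density, and neighbours are searched over the whole dataset. All function names are hypothetical.

```python
import numpy as np

def nan_distance(a, b):
    """Euclidean distance over the coordinates observed in both vectors."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    if not shared.any():
        return np.inf
    # Rescale so pairs with few shared coordinates stay comparable.
    scale = a.size / shared.sum()
    return float(np.sqrt(scale * np.sum((a[shared] - b[shared]) ** 2)))

def cluster_densities(X, labels):
    """Assumed density: inverse mean distance of members to their centroid."""
    densities = {}
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = np.nanmean(members, axis=0)
        spread = np.mean([nan_distance(row, centroid) for row in members])
        densities[c] = 1.0 / (spread + 1e-12)
    return densities

def per_cluster_k(densities, base_k, n_rows):
    """Scale the user's k by each cluster's density relative to the mean."""
    mean_density = np.mean(list(densities.values()))
    return {
        c: int(np.clip(round(base_k * d / mean_density), 1, n_rows - 1))
        for c, d in densities.items()
    }

def cluster_knn_impute(X, labels, base_k=5):
    """Fill NaNs with the mean of the k nearest donors, k chosen per cluster.

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    k_for = per_cluster_k(cluster_densities(X, labels), base_k, len(X))
    filled = X.copy()
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        dist = np.array([nan_distance(X[i], row) for row in X])
        dist[i] = np.inf  # a row is never its own neighbour
        order = np.argsort(dist)
        for j in np.flatnonzero(np.isnan(X[i])):
            # Nearest rows that actually observe feature j.
            donors = [r for r in order if not np.isnan(X[r, j])]
            donors = donors[: k_for[labels[i]]]
            filled[i, j] = X[donors, j].mean()
    return filled

# Small demo: one tight cluster and one loose cluster, two hidden values.
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(10.0, 2.0, (20, 3))])
demo_labels = np.array([0] * 20 + [1] * 20)
demo[0, 1] = np.nan
demo[25, 2] = np.nan
demo_filled = cluster_knn_impute(demo, demo_labels, base_k=5)
```

With this scaling rule, a sample in the tight cluster gets a larger k than a sample in the loose one, which is the behaviour described above. The exact density measure and scaling used in the project may differ.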

To gauge the quality of our method, we tested it on several different datasets:

Name	No. Columns	No. Samples	Explanation
Titanic	8	712	Passenger details and survival status
Mobile	21	2000	Mobile phone specifications and prices
Mnist	64	1797	Handwritten digit grayscale images
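One common way to compare imputers at different null percentages is to hide a random fraction of known values, impute them, and score only the hidden entries. The sketch below assumes that protocol and assumes RMSE as the metric; the README does not state the exact procedure or metric used in the experiments. The function names are hypothetical, and `column_mean_impute` is only a placeholder imputer so the example runs on its own.

```python
import numpy as np

def mask_at_random(X, fraction, seed=0):
    """Hide roughly `fraction` of the entries; return the data and the mask."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    hidden = rng.random(X.shape) < fraction
    X_missing = X.copy()
    X_missing[hidden] = np.nan
    return X_missing, hidden

def masked_rmse(X_true, X_imputed, hidden):
    """RMSE computed only on the entries that were hidden."""
    diff = np.asarray(X_true, dtype=float)[hidden] - np.asarray(X_imputed)[hidden]
    return float(np.sqrt(np.mean(diff ** 2)))

def column_mean_impute(X_missing):
    """Placeholder imputer: replace each NaN with its column mean."""
    means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), means, X_missing)

# Synthetic stand-in data; swap in a real dataset and the imputers to compare.
X_full = np.random.default_rng(1).normal(size=(200, 5))
for fraction in (0.1, 0.3, 0.5):
    X_missing, hidden = mask_at_random(X_full, fraction)
    score = masked_rmse(X_full, column_mean_impute(X_missing), hidden)
    print(f"null fraction {fraction:.1f}: RMSE {score:.3f}")
```

Any imputer that takes an array with NaNs and returns a filled array of the same shape can be passed through the same loop, which makes it easy to put classic KNN imputation and the cluster-based variant side by side.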

YD5463/TabularDataProject

Folders and files

Latest commit

History

Repository files navigation

Cluster Based Knn Imputation

Overview on the project

About

Topics

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 3

Languages

Packages