Kmeans with/without PCA. Examining the effect of dimensionality reduction on model

sahil0094/Kmeans-with-PCA

Kmeans-with-PCA on Marine Dataset

K-means with and without PCA: examining the effect of dimensionality reduction on the clustering model.

Data Set Information:

Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem.

From the original data, examples with missing values were removed (the majority having the predicted value missing), and the ranges of the continuous values have been scaled for use with an ANN (by dividing by 200).

Attribute Information:

Each attribute is listed with its name, data type, measurement unit, and a brief description. The number of rings is the value to predict, either as a continuous value or as a class label.

Name / Data Type / Measurement Unit / Description

  • Sex / nominal / -- / M, F, and I (infant)
  • Length / continuous / mm / Longest shell measurement
  • Diameter / continuous / mm / perpendicular to length
  • Height / continuous / mm / with meat in shell
  • Whole weight / continuous / grams / whole abalone
  • Shucked weight / continuous / grams / weight of meat
  • Viscera weight / continuous / grams / gut weight (after bleeding)
  • Shell weight / continuous / grams / after being dried
  • Rings / integer / -- / +1.5 gives the age in years
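As a rough illustration of how the attributes above map onto a usable record, the sketch below parses comma-separated rows, converts the numeric fields, derives the age as rings + 1.5, and one-hot encodes the nominal Sex attribute. It assumes a header-less CSV in the column order listed above; the column names, the `parse_abalone` helper, and the two sample rows are hypothetical and are not taken from the repository.

```python
import csv
import io

# Assumed column order, following the attribute list above.
# These names are illustrative; the repository may use different ones.
COLUMNS = [
    "sex", "length", "diameter", "height", "whole_weight",
    "shucked_weight", "viscera_weight", "shell_weight", "rings",
]


def parse_abalone(text):
    """Parse header-less CSV text into a list of dicts with typed fields."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        row = dict(zip(COLUMNS, record))
        # The seven measurements between sex and rings are continuous.
        for name in COLUMNS[1:-1]:
            row[name] = float(row[name])
        row["rings"] = int(row["rings"])
        # Per the attribute list: rings + 1.5 gives the age in years.
        row["age"] = row["rings"] + 1.5
        # One-hot encode the nominal sex attribute (M, F, I).
        for level in ("M", "F", "I"):
            row["sex_" + level] = 1 if row["sex"] == level else 0
        rows.append(row)
    return rows


# Made-up sample rows for illustration only, not real measurements.
SAMPLE = """M,0.45,0.36,0.10,0.50,0.22,0.10,0.15,15
I,0.33,0.25,0.08,0.20,0.09,0.04,0.06,7
"""

rows = parse_abalone(SAMPLE)
print(rows[0]["age"], rows[1]["sex_I"])
```

One-hot encoding is only one way to handle the Sex column. Dropping it before clustering is another reasonable choice, since K-means relies on Euclidean distance.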

Approach

We first applied K-means directly to the dataset. We found 3 to be the optimal number of clusters, with an inertia of 9922.820. Because the features showed high multicollinearity, we then applied PCA and ran K-means again on a single principal component, which gave an inertia of 4786.761413.
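The workflow described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the repository's actual code: it uses synthetic, strongly correlated data as a stand-in for the abalone features, standardizes the features before clustering, and scans inertia over several values of k (the elbow method). The README does not say how the features were preprocessed or how k = 3 was selected, so those steps are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric abalone features (assumption):
# seven columns driven by one latent "size" factor, hence highly collinear.
rng = np.random.default_rng(0)
size = rng.normal(size=(500, 1))
X = size @ np.ones((1, 7)) + 0.1 * rng.normal(size=(500, 7))

# Standardize so every feature contributes equally to distances (assumption).
X_scaled = StandardScaler().fit_transform(X)

# Elbow scan: inertia (within-cluster sum of squares) for k = 1..6.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled).inertia_
    for k in range(1, 7)
}

# K-means on the full feature set with k = 3.
km_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

# PCA down to a single principal component, then K-means again.
pca = PCA(n_components=1)
X_pc = pca.fit_transform(X_scaled)
km_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_pc)

print("variance explained by PC1:", pca.explained_variance_ratio_[0])
print("inertia without PCA:", km_raw.inertia_)
print("inertia with 1 PC:  ", km_pca.inertia_)
```

The two inertia values are measured in different feature spaces. Projecting onto one component discards the variance along the remaining directions, so some of the drop in inertia comes from the projection itself and not only from tighter clusters. Comparing the cluster assignments (for example `km_raw.labels_` against `km_pca.labels_`) can give a more direct view of how much PCA changes the grouping.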
