About Dataset

Context

This classic dataset contains the prices and other attributes of almost 54,000 diamonds.

Content

  • price: price in US dollars ($326--$18,823)
  • carat: weight of the diamond (0.2--5.01)
  • cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
  • color: diamond colour, from J (worst) to D (best)
  • clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
  • x: length in mm (0--10.74)
  • y: width in mm (0--58.9)
  • z: depth in mm (0--31.8)
  • depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)
  • table: width of top of diamond relative to widest point (43--95)

Importing Libraries

image

Loading Data

image image
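Since the loading step is only shown as screenshots, here is a rough sketch of what it might look like with pandas. The file name `diamonds.csv`, the column order, and the two sample rows are assumptions for illustration; an in-memory string stands in for the file so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the CSV file. With the real file one would call something like
# pd.read_csv("diamonds.csv") (file name assumed).
sample_csv = io.StringIO(
    ",carat,cut,color,clarity,depth,table,price,x,y,z\n"
    "1,0.23,Ideal,E,SI2,61.5,55,326,3.95,3.98,2.43\n"
    "2,0.21,Premium,E,SI1,59.8,61,326,3.89,3.84,2.31\n"
)

df = pd.read_csv(sample_csv)

print(df.shape)    # (rows, columns)
print(df.head())   # first few rows
print(df.columns)  # the unnamed first column is read as "Unnamed: 0"
```

An empty first header cell is how a saved row index usually shows up in a CSV, and pandas names such a column "Unnamed: 0" when reading it.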

Data Preprocessing

Steps involved in data preprocessing:

  • Data Cleaning
  • Identifying and removing outliers
  • Encoding categorical variables
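For the encoding step, one possible approach is an ordinal mapping that follows the worst-to-best orderings listed under Content. This is only a sketch under that assumption; the linked notebook may use a different encoder, and the toy rows below are made up.

```python
import pandas as pd

# Orderings taken from the column descriptions, worst to best.
cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
color_order = ["J", "I", "H", "G", "F", "E", "D"]
clarity_order = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Toy rows for illustration only.
df = pd.DataFrame(
    {
        "cut": ["Ideal", "Fair"],
        "color": ["E", "J"],
        "clarity": ["SI2", "IF"],
    }
)

for column, order in [
    ("cut", cut_order),
    ("color", color_order),
    ("clarity", clarity_order),
]:
    # Map each level to its rank: 0 is the worst level, len(order) - 1 the best.
    mapping = {level: rank for rank, level in enumerate(order)}
    df[column] = df[column].map(mapping)

print(df)
```

An ordinal mapping keeps the quality ranking visible to a model, whereas one-hot encoding would discard it; which one fits better depends on the model used later.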

image

The first column ("Unnamed: 0") is just a row index, so we remove it.

image
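Dropping the index column could look roughly like this in pandas. The toy frame is made up for illustration; only the column name "Unnamed: 0" comes from the text above.

```python
import pandas as pd

# Toy frame with the leftover index column.
df = pd.DataFrame(
    {
        "Unnamed: 0": [1, 2, 3],
        "carat": [0.23, 0.21, 0.29],
        "price": [326, 326, 334],
    }
)

# Drop the redundant index column by name.
df = df.drop(columns=["Unnamed: 0"])

print(df.columns.tolist())  # ['carat', 'price']
```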

The minimum values of "x", "y" and "z" are zero. A diamond with a zero dimension would be dimensionless or two-dimensional, so these rows are clearly faulty data points and we filter them out.

image
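One way to express this filter in pandas is a boolean mask that keeps only rows where all three dimensions are non-zero. The toy values below are invented, so the number of dropped rows differs from the real dataset.

```python
import pandas as pd

# Toy frame: three of the four rows have at least one zero dimension.
df = pd.DataFrame(
    {
        "x": [3.95, 0.0, 4.05, 6.55],
        "y": [3.98, 4.10, 0.0, 6.48],
        "z": [2.43, 2.50, 2.39, 0.0],
    }
)

rows_before = len(df)

# Keep a row only if x, y and z are all non-zero.
nonzero = (df[["x", "y", "z"]] != 0).all(axis=1)
df = df[nonzero].reset_index(drop=True)

print(rows_before - len(df))  # number of rows dropped: 3
```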

We lost 20 data points by deleting the dimensionless (1-D or 2-D) diamonds.

Checking for null values

image
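A null check of this kind is commonly done with `isnull().sum()`, which counts missing values per column. The sketch below uses a small made-up frame without missing values, so every count is zero.

```python
import pandas as pd

# Toy frame with no missing values.
df = pd.DataFrame(
    {
        "carat": [0.23, 0.21, 0.29],
        "price": [326, 326, 334],
    }
)

# Count missing values in each column.
null_counts = df.isnull().sum()

print(null_counts)
```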

We can see that the data is now clean.

The rest of this project is in the linked Jupyter notebook file.