
tuanduong3001/Dinamond-Price-Prediction


About Dataset

Context

This classic dataset contains the prices and other attributes of almost 54,000 diamonds.

Content

  • price: price in US dollars ($326–$18,823)
  • carat: weight of the diamond (0.2–5.01)
  • cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
  • color: diamond colour, from J (worst) to D (best)
  • clarity: how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
  • x: length in mm (0–10.74)
  • y: width in mm (0–58.9)
  • z: depth in mm (0–31.8)
  • depth: total depth percentage = 100 * z / mean(x, y) = 200 * z / (x + y) (43–79)
  • table: width of the top of the diamond relative to its widest point, as a percentage (43–95)
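As a quick check on the depth definition, the helper below computes it from x, y and z. Because the column is stored as a percentage (43 to 79), the ratio z / mean(x, y) is multiplied by 100. The function name `depth_percentage` is hypothetical; this is a sketch, not code taken from the project.

```python
def depth_percentage(x: float, y: float, z: float) -> float:
    """Total depth percentage: z divided by the mean of x and y, times 100."""
    mean_xy = (x + y) / 2
    if mean_xy == 0:
        # A zero mean width/length would make the ratio undefined.
        raise ValueError("x and y cannot both be zero")
    return 100 * z / mean_xy


# Hypothetical dimensions in mm, chosen so the result is easy to verify.
print(depth_percentage(5.0, 5.0, 2.5))  # 50.0
```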

Importing Libraries


Loading Data
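The loading step is only shown as a screenshot. A minimal sketch, assuming pandas and a local file named `diamonds.csv` (both assumptions, as is the helper name `load_diamonds`). So that the snippet runs on its own, a small inline CSV with the same column layout stands in for the real file; its two rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for diamonds.csv, with the same column layout
# (including the leading unnamed index column). Values are illustrative only.
SAMPLE_CSV = """\
,carat,cut,color,clarity,depth,table,price,x,y,z
1,0.30,Ideal,E,VS1,61.5,55.0,500,4.30,4.32,2.65
2,0.50,Premium,G,SI1,62.0,58.0,1200,5.10,5.05,3.15
"""


def load_diamonds(source) -> pd.DataFrame:
    """Load the diamonds data; `source` is a file path or a file-like object."""
    return pd.read_csv(source)


df = load_diamonds(io.StringIO(SAMPLE_CSV))
print(df.shape)       # (2, 11)
print(df.columns[0])  # Unnamed: 0
```

On the real data, `load_diamonds("diamonds.csv")` would return the full table. The empty header in the first position is why pandas names that column "Unnamed: 0".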


Data Preprocessing

Steps involved in Data Preprocessing:

  • Data Cleaning
  • Identifying and removing outliers
  • Encoding categorical variables
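The README does not show how the categorical variables are encoded; that happens in the notebook. One reasonable option, sketched here as an assumption rather than the author's method, is ordinal encoding that follows the worst-to-best orderings given in the dataset description for cut, color and clarity. The names `encode_ordinals` and `ORDERS` are hypothetical.

```python
import pandas as pd

# Quality orderings from the dataset description, worst to best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

ORDERS = {"cut": CUT_ORDER, "color": COLOR_ORDER, "clarity": CLARITY_ORDER}


def encode_ordinals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with cut/color/clarity mapped to integers (worst = 0)."""
    out = df.copy()
    for col, order in ORDERS.items():
        mapping = {label: rank for rank, label in enumerate(order)}
        out[col] = out[col].map(mapping)
        # Any label missing from the ordering becomes NaN; treat that as an error.
        if out[col].isna().any():
            raise ValueError(f"unexpected category in column {col!r}")
    return out


sample = pd.DataFrame(
    {"cut": ["Ideal", "Fair"], "color": ["D", "J"], "clarity": ["IF", "I1"]}
)
print(encode_ordinals(sample))
```

Ordinal codes keep the quality ranking that one-hot encoding would discard, which suits tree-based and linear price models alike.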


The first column ("Unnamed: 0") is only an index, so we remove it.
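A minimal pandas sketch of this step; the helper name `drop_index_column` and the tiny example frame are hypothetical. `errors="ignore"` makes the call safe to run even if the column has already been dropped.

```python
import pandas as pd


def drop_index_column(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the leftover CSV index column if it is present."""
    return df.drop(columns=["Unnamed: 0"], errors="ignore")


# Tiny illustrative frame with the same leading index column.
sample = pd.DataFrame(
    {"Unnamed: 0": [1, 2], "carat": [0.3, 0.5], "price": [500, 1200]}
)
print(drop_index_column(sample).columns.tolist())  # ['carat', 'price']
```

An alternative is to read the file with `pd.read_csv(path, index_col=0)`, which uses that column as the index instead of creating it as data.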

The minimum values of "x", "y" and "z" are zero. A real diamond cannot have zero length, width or depth, so these rows are faulty records describing dimensionless (1-D or 2-D) diamonds, and we filter them out.
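The filtering can be sketched as a boolean mask that keeps only rows where all three dimensions are strictly positive. This is an assumed implementation (the original shows only a screenshot), and `drop_zero_dimensions` plus the three-row sample are hypothetical.

```python
import pandas as pd


def drop_zero_dimensions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows where x, y and z are all strictly positive."""
    mask = (df[["x", "y", "z"]] > 0).all(axis=1)
    # .copy() avoids chained-assignment warnings if the result is modified later.
    return df.loc[mask].copy()


# Illustrative rows: the second is fully zero, the third has z == 0.
sample = pd.DataFrame(
    {
        "x": [4.30, 0.00, 5.10],
        "y": [4.32, 0.00, 5.05],
        "z": [2.65, 0.00, 0.00],
    }
)
cleaned = drop_zero_dimensions(sample)
print(len(sample) - len(cleaned), "rows removed")  # 2 rows removed
```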


We lost 20 data points by removing these dimensionless (1-D or 2-D) diamonds.

Checking for null values
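A common way to run this check in pandas is `isnull().sum()`, which counts missing values per column. The sketch below assumes that approach; the helper name `null_counts` is hypothetical, and the example frame deliberately contains one missing value so the check has something to find.

```python
import pandas as pd


def null_counts(df: pd.DataFrame) -> pd.Series:
    """Number of missing values in each column."""
    return df.isnull().sum()


sample = pd.DataFrame({"carat": [0.3, 0.5], "price": [500.0, float("nan")]})
counts = null_counts(sample)
print(counts)
print("clean" if counts.sum() == 0 else "has missing values")  # has missing values
```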


No null values remain, so the data is clean.

The rest of the project can be found in the linked Jupyter notebook.
