About Dataset

Context

This classic dataset contains the prices and other attributes of almost 54,000 diamonds.

Content

price: price in US dollars ($326--$18,823)
carat: weight of the diamond (0.2--5.01)
cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color: diamond colour, from J (worst) to D (best)
clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x: length in mm (0--10.74)
y: width in mm (0--58.9)
z: depth in mm (0--31.8)
depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)
table: width of top of diamond relative to widest point (43--95)
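
As a quick illustration of the depth formula above, here is a minimal sketch that recomputes the depth percentage from x, y and z. The sample values are made up for illustration, the helper name `depth_percentage` is hypothetical, and the factor of 100 is an assumption made so that the result lands in the stated 43--79 range.

```python
import pandas as pd

# Tiny hypothetical sample; these values are illustrative, not rows from the dataset.
sample = pd.DataFrame({"x": [4.0, 5.0], "y": [4.0, 5.0], "z": [2.4, 3.1]})


def depth_percentage(df):
    """Total depth percentage: 100 * z / mean(x, y) = 100 * 2 * z / (x + y).

    The factor of 100 is an assumption so the result is expressed in percent.
    """
    return 100 * 2 * df["z"] / (df["x"] + df["y"])


sample["depth_calc"] = depth_percentage(sample)
print(sample)
```

Comparing a recomputed column like `depth_calc` with the stored `depth` column is one way to spot rows whose dimensions look inconsistent.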

Importing Libraries

Loading Data

Data Preprocessing

Steps involved in Data Preprocessing

Data Cleaning
Identifying and removing outliers
Encoding categorical variables
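
For the encoding step, one possible approach is ordinal encoding that follows the worst-to-best orderings listed in the Content section. This is only a sketch on a made-up two-row frame; the notebook may use a different encoder, and the `_code` column names are hypothetical.

```python
import pandas as pd

# Orderings taken from the attribute descriptions, worst to best.
orders = {
    "cut": ["Fair", "Good", "Very Good", "Premium", "Ideal"],
    "color": ["J", "I", "H", "G", "F", "E", "D"],
    "clarity": ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"],
}

# Hypothetical sample rows, for illustration only.
df = pd.DataFrame(
    {"cut": ["Ideal", "Fair"], "color": ["D", "J"], "clarity": ["IF", "I1"]}
)

for col, order in orders.items():
    # An ordered Categorical maps each label to its position in `order`.
    df[col + "_code"] = pd.Categorical(df[col], categories=order, ordered=True).codes

print(df)
```

With this mapping, a higher code always means a better grade, which keeps the encoded columns interpretable.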

The first column ("Unnamed: 0") is just an index, so we remove it.
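
A minimal sketch of dropping the index column with pandas, using a small made-up frame in place of the real data:

```python
import pandas as pd

# Hypothetical frame mimicking a CSV that was saved together with its index.
df = pd.DataFrame(
    {
        "Unnamed: 0": [1, 2, 3],
        "carat": [0.23, 0.21, 0.23],
        "price": [326, 326, 327],
    }
)

# Drop the redundant index column by name.
df = df.drop(columns=["Unnamed: 0"])
print(df.columns.tolist())
```

An alternative is to pass `index_col=0` to `pd.read_csv` when loading, so the column never appears as data.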

The minimum values of "x", "y" and "z" are zero. This indicates faulty values in the data, since a zero dimension would describe a dimensionless or 2-dimensional diamond. We need to filter out these rows, as they are clearly faulty data points.

We lost 20 data points by deleting the dimensionless (1-D or 2-D) diamonds.
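
The zero-dimension filter can be sketched as follows. The frame below is a made-up example, and the names `has_zero`, `cleaned` and `n_removed` are hypothetical.

```python
import pandas as pd

# Hypothetical sample: three of the four rows have a zero in x, y or z.
df = pd.DataFrame(
    {
        "x": [3.95, 0.0, 4.05, 6.5],
        "y": [3.98, 4.1, 4.07, 0.0],
        "z": [2.43, 2.5, 0.0, 4.0],
        "price": [326, 400, 327, 5000],
    }
)

dims = ["x", "y", "z"]

# Flag rows where any dimension is exactly zero.
has_zero = (df[dims] == 0).any(axis=1)

# Keep only rows with three non-zero dimensions.
cleaned = df[~has_zero].reset_index(drop=True)
n_removed = int(has_zero.sum())
print(f"Removed {n_removed} rows, {len(cleaned)} remain")
```

Counting the flagged rows before dropping them makes it easy to report how many data points the filter costs.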

Checking for null values

We can see that the data is clean.
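
A null check of this kind can be sketched with `isnull().sum()`, which counts missing values per column. The frame here is a small made-up example with no missing values.

```python
import pandas as pd

# Hypothetical sample with no missing values.
df = pd.DataFrame(
    {
        "carat": [0.23, 0.21, 0.29],
        "cut": ["Ideal", "Premium", "Good"],
        "price": [326, 326, 334],
    }
)

# Number of missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# True only if at least one column contains a missing value.
has_nulls = bool(null_counts.any())
```

If every count is zero, no imputation or row removal is needed for missing values.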

The rest of this project is in the linked Jupyter notebook file.
