
This repository contains Assignment 2, completed as part of "FIT5196 Data Wrangling", taught at Monash University in Semester 2, 2020.


gaaniruddha/FIT5196-A2


Data Wrangling A2: Data Cleansing

  • Data_Cleansing_Specifications.pdf: Assignment specifications.
  • Data_Cleansing.ipynb/pdf: Jupyter notebook (and its PDF export) containing the Python code that analyses the dataset and detects and fixes the data problems.
  • Input data: transactional retail data from an online electronics store.
  • 30945305_dirty_data.csv, 30945305_missing_data.csv, 30945305_outlier_data.csv: Input files (unclean data)
  • 30945305_dirty_data_solution.csv, 30945305_missing_data_solution.csv, 30945305_outlier_data_solution.csv: Output files (cleaned data)

Tasks completed:

  • Use graphical and/or non-graphical exploratory data analysis (EDA) methods to understand the data first, then find and fix the data problems.
  • Detect and fix errors in 30945305_dirty_data.csv
  • Detect and remove outlier rows in 30945305_outlier_data.csv (outliers are detected with respect to the delivery_charges attribute only)
  • Impute the missing values in 30945305_missing_data.csv
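The outlier task looks only at delivery_charges. Below is a minimal sketch of univariate outlier removal on that column. It assumes the common interquartile-range (IQR) rule with the conventional fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR. The README does not say which detection rule the notebook uses. The helper name remove_iqr_outliers and the toy order data are hypothetical.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, column: str = "delivery_charges",
                        k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose `column` lies within the IQR fences."""
    q1 = df[column].quantile(0.25)  # pandas uses linear interpolation by default
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Series.between is inclusive on both ends by default.
    mask = df[column].between(lower, upper)
    return df[mask].reset_index(drop=True)


# Hypothetical toy data; the real input would come from
# pd.read_csv("30945305_outlier_data.csv").
orders = pd.DataFrame({
    "order_id": [f"ORD{i}" for i in range(8)],
    "delivery_charges": [70.0, 72.5, 75.0, 71.0, 74.0, 73.0, 150.0, 10.0],
})
cleaned = remove_iqr_outliers(orders)
print(cleaned)
```

This rule examines one column in isolation. It can therefore flag charges that are legitimately high because of other attributes. A common variant is to fit a model for delivery_charges first, then apply the same fences to the model's residuals.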

Libraries used: pandas, numpy, matplotlib, nltk, nltk.sentiment.vader, sklearn.linear_model, scipy
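The libraries list includes sklearn.linear_model, which points to regression-based imputation. The sketch below is an assumption, not the notebook's code. It fills missing values in one numeric column with an ordinary least-squares fit trained on the rows where that column is known. The fit uses numpy's lstsq, which computes the same model that sklearn.linear_model.LinearRegression fits. The helper impute_with_linear_model and the feature columns distance_to_nearest_warehouse and is_expedited_delivery are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_with_linear_model(df: pd.DataFrame, target: str, features: list) -> pd.DataFrame:
    """Hypothetical helper: fill NaNs in `target` via an OLS fit on `features`."""
    known = df[target].notna()
    X = df.loc[known, features].to_numpy(dtype=float)
    y = df.loc[known, target].to_numpy(dtype=float)
    # Prepend a column of ones so the fitted model includes an intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

    missing = ~known
    Xm = df.loc[missing, features].to_numpy(dtype=float)
    preds = np.column_stack([np.ones(len(Xm)), Xm]) @ coef

    out = df.copy()
    out.loc[missing, target] = preds  # only the missing rows are overwritten
    return out


# Hypothetical toy data: charges follow 50 + 10*distance + 20*expedited exactly.
orders = pd.DataFrame({
    "distance_to_nearest_warehouse": [0.5, 1.0, 1.5, 2.0, 1.2, 0.8],
    "is_expedited_delivery": [0, 1, 0, 1, 1, 0],
    "delivery_charges": [55.0, 80.0, 65.0, 90.0, np.nan, np.nan],
})
result = impute_with_linear_model(
    orders, "delivery_charges",
    ["distance_to_nearest_warehouse", "is_expedited_delivery"],
)
print(result)
```

The model is trained only on rows with a known target, so the imputed values cannot influence the fit. If the target behaves differently across groups of orders, one option is to fit a separate model per group.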
