L04_Cleaning_Data.md

Cleaning Data

Introduction

Now that you know how to gather and assess data, you're going to master cleaning it. Cleaning your data is the third and final step in the data wrangling process. This is where you remedy the quality and tidiness issues you identified in the Assess step. Cleaning can be done manually in spreadsheet programs or text editors, but it is often best done using code, in three steps. First you define in words how you're going to clean each issue, then you convert those words to code, and finally you test your data to make sure that code worked. Watching your data transform from dirty to clean feels like magic sometimes. In this lesson you'll take your assessments from the last lesson and define, code, and test cleaning operations for each. You'll also become intimately familiar with the cleaning and testing functions in Python and the pandas library. At the end of the lesson we'll flash forward to analysis and visualization, and you'll see why cleaning was absolutely necessary. You're going to leave with the tools required to clean pretty much anything you come across in the future. And since it's the last lesson of the course, you're also going to leave an expert data wrangler.
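To make the define, code, and test steps concrete, here is a minimal sketch for a single quality issue. The DataFrame, its column names, and the issue itself (zip codes stored as floats, which drops leading zeros) are hypothetical and chosen only for illustration; they are not taken from the lesson's dataset.

```python
import pandas as pd

# Hypothetical data with one quality issue: zip codes stored as floats,
# so the leading zero of "02134" has been lost.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "zip_code": [92390.0, 2134.0, 61812.0],
})

# Work on a copy so the original data stays available for comparison.
df_clean = df.copy()

# Define
# Convert zip_code from float to string and pad it with leading zeros
# so that every value is exactly five characters long.

# Code
df_clean["zip_code"] = (
    df_clean["zip_code"]
    .astype(int)    # drop the ".0" fractional part
    .astype(str)    # store as text, not a number
    .str.zfill(5)   # restore any missing leading zeros
)

# Test
# Every zip code should now be a five-character string.
assert df_clean["zip_code"].str.len().eq(5).all()
print(df_clean["zip_code"].tolist())
```

Cleaning a copy rather than the original is a design choice, not a requirement: it lets you rerun the cleaning code from a known starting point and compare the result against the data you assessed.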

Lesson Outline

Cleaning your data is the third step in data wrangling. It is where you fix the quality and tidiness issues that you identified in the Assess step. In this lesson, you'll clean all of the issues you identified in Lesson 3 using Python and pandas.