Cleaning data with the janitor package

Summary

This repository provides materials for a session that is part of the I2DS Tools for Data Science workshop run at the Hertie School, Berlin in November 2021. The student-run workshop is part of the course Introduction to Data Science taught by Simon Munzert at the Hertie School, Berlin, in Fall 2021.

Session contents

This session introduces the features of the janitor package for cleaning and manipulating raw data. Raw data is often messy, poorly structured, and inconsistent with the conventional data frame formats used in R. The janitor package provides a set of functions that help reorganize unstructured data and prepare it for interpretation.
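As a rough illustration of the kind of cleaning described above, here is a minimal sketch using `clean_names()`, `remove_empty()`, and `get_dupes()` from janitor. The data frame `raw` and its column names are invented for this example and are not part of the session's practice material.

```r
library(janitor)

# Hypothetical messy data, made up for illustration only
raw <- data.frame(
  "First Name"  = c("Ada", "Ben", "Ben", NA),
  "Age (years)" = c(36, 41, 41, NA),
  "Empty Col"   = c(NA, NA, NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Standardise column names to snake_case
named <- clean_names(raw)

# Drop rows and columns that contain only missing values
clean <- remove_empty(named, which = c("rows", "cols"))

names(clean)
# "first_name" "age_years"

# List rows that appear more than once (all columns are compared)
dupes <- get_dupes(clean)
dupes
```

The steps are kept in separate variables so that each intermediate result can be inspected; in practice they are often chained with a pipe.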

Main learning objectives

The goals of this session are to (1) introduce you to the janitor package, (2) explore the functions the package offers for cleaning dirty data, (3) give you tips on data manipulation and cross-tabulation, and (4) practice these skills on real-life data and point you to further resources. Please make sure to download the attached Excel data sheets or clone the whole repo.
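For the cross-tabulation objective, a small sketch with `tabyl()` and the `adorn_*()` helpers may be useful. The `survey` data frame below is hypothetical and only stands in for the Excel sheets used in the session.

```r
library(janitor)

# Hypothetical survey responses, made up for illustration only
survey <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  answer = c("yes", "no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Two-way frequency table: one row per region, one column per answer
counts <- tabyl(survey, region, answer)

# Same table with a totals row appended
with_totals <- adorn_totals(counts, where = "row")

# Row-wise shares, formatted as percentages
shares <- adorn_pct_formatting(
  adorn_percentages(counts, denominator = "row"),
  digits = 1
)

counts
with_totals
shares
```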

Instructors

  • Gulce Tuncer
  • Nassim Zoueini

Further resources

License

The material in this repository is made available under the MIT license.

Statement of contributions

Gulce Tuncer prepared the presentation material and recording.

Nassim Zoueini prepared the practice material and will walk you through the real-life example step by step. And from behind the scenes, Nassim's sister helped with the post-production :D
