This repository provides materials for a session that is part of the I2DS Tools for Data Science workshop run at the Hertie School, Berlin in November 2021. The student-run workshop is part of the course Introduction to Data Science taught by Simon Munzert at the Hertie School, Berlin, in Fall 2021.
This session introduces the features of the janitor package for cleaning and manipulating raw data. Raw data is often dirty and poorly structured, and it rarely matches the data frame conventions expected in R: column names may contain spaces or symbols, rows or columns may be empty, and records may be duplicated. The janitor package provides a set of functions that help reorganize such data and prepare it for analysis.
The goals of this session are to (1) introduce you to the janitor package, (2) explore the functions the package offers for cleaning dirty data, (3) provide you with tips on data manipulation and cross-tabulation, and (4) practice these skills on real-life data and point you to further resources (please make sure to download the attached Excel data sheets or clone the whole repo).
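As a rough preview of what the session covers, the sketch below runs a few janitor functions on a small data frame. The data frame `raw` and its column names are hypothetical and made up for illustration only; they are not part of the workshop data sets. The functions used (`clean_names()`, `remove_empty()`, `get_dupes()`, `tabyl()`) are exported by janitor, but this is a minimal sketch rather than the workflow followed in the session.

```r
library(janitor)

# Hypothetical raw data (illustration only): messy column names,
# one completely empty row, one completely empty column, one duplicated record.
raw <- data.frame(
  "First Name"  = c("Ana", "Ben", NA, "Ana"),
  "AGE (years)" = c(31, 45, NA, 31),
  "Empty Col"   = c(NA, NA, NA, NA),
  check.names = FALSE
)

# Convert column names to snake_case.
clean <- clean_names(raw)

# Drop rows and columns that contain only missing values.
clean <- remove_empty(clean, which = c("rows", "cols"))

names(clean)

# List records that appear more than once (all columns are compared).
dupes <- get_dupes(clean)

# Frequency table of one variable, and a cross-tabulation of two.
tabyl(clean, first_name)
tabyl(clean, first_name, age_years)
```

Calling `get_dupes()` without naming columns compares whole rows and prints a message saying so; pass column names to restrict the comparison.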
- Gulce Tuncer
- Nassim Zoueini
- [Package 'janitor'](https://cran.r-project.org/web/packages/janitor/janitor.pdf)
- [Overview of janitor functions](https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html)
- [Catalog of janitor functions](https://garthtarr.github.io/meatR/janitor.html#catalog_of_janitor_functions)
The material in this repository is made available under the MIT license.
Gulce Tuncer prepared the presentation material and recording.
Nassim Zoueini prepared the practice material and will guide you step by step through the real-life example. And from behind the scenes, Nassim's sister helped with the post-production :D