This repository provides materials for a session of the I2DS Tools for Data Science workshop, held at the Hertie School, Berlin, in October 2024. The student-run workshop is part of the course Introduction to Data Science, taught by Simon Munzert at the Hertie School in Fall 2024.
This session will introduce you to the powerful `tidymodels` framework in R, which streamlines the modeling process in data science. We'll begin by exploring `recipes`, a package that simplifies the pre-processing and feature-engineering of data. Next, we will dive into `rsample`, which provides robust tools for data splitting and offers several distinct sampling strategies (we will discuss v-fold cross-validation, bootstrapping and rolling forecasting origin methods). Moving on to modeling, we'll introduce `parsnip`, which standardizes the interface for creating and tuning models across different algorithms. Finally, we will focus on `yardstick`, which provides a suite of functions for evaluating model performance.
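To show how the four packages fit together, here is a minimal sketch of one pass through the workflow. It is not taken from the session materials: it assumes the built-in `mtcars` data, an arbitrary choice of predictors, and the `workflows` package (attached by `tidymodels` but not covered above) to bundle the recipe and the model. All object names are illustrative.

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, yardstick, workflows, dplyr

set.seed(123)

# rsample: hold out 25% of the rows for testing
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# rsample: two of the resampling schemes mentioned above, built on the training set
car_folds <- vfold_cv(car_train, v = 5)         # 5-fold cross-validation
car_boots <- bootstraps(car_train, times = 10)  # 10 bootstrap resamples

# recipes: declare pre-processing steps; they are estimated when the workflow is fitted
car_recipe <- recipe(mpg ~ wt + hp + disp, data = car_train) %>%
  step_normalize(all_numeric_predictors())

# parsnip: specify a linear regression using the base-R "lm" engine
lm_spec <- linear_reg() %>%
  set_engine("lm")

# workflows (assumption, not covered above): bundle recipe and model, then fit
car_fit <- workflow() %>%
  add_recipe(car_recipe) %>%
  add_model(lm_spec) %>%
  fit(data = car_train)

# yardstick: compare predictions on the test set with the observed values
car_preds <- predict(car_fit, new_data = car_test) %>%
  bind_cols(car_test)

car_metrics <- metrics(car_preds, truth = mpg, estimate = .pred)
car_metrics
```

Because the model is declared through `parsnip`, swapping `linear_reg()` for another model specification leaves the rest of the pipeline unchanged.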
The goals of this session are to (1) equip you with conceptual knowledge about the `tidymodels` package and its usage in the whole regression modeling workflow, (2) show you the key sub-packages like `rsample`, `recipes`, `parsnip` and `yardstick`, and (3) provide you with practice material as well as some further readings.
The session is accompanied by a tutorial, which can be accessed here.
Important
If you are planning to attend the live session of the workshop, please make sure to have the practice material ready before the session. You will need our `.RData` file, and cloning this repository is the easiest way to get it. Ideally, you will also install the `tidymodels` package before the session begins.
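The preparation could look roughly like the sketch below, run from the root of your local clone. The file name `practice_data.RData` is a placeholder, not the actual name of the file in this repository; replace it with the `.RData` file you find there.

```r
# Install tidymodels only if it is not already available
if (!requireNamespace("tidymodels", quietly = TRUE)) {
  install.packages("tidymodels")
}
library(tidymodels)

# Placeholder file name (assumption): substitute the .RData file from this repository
rdata_path <- "practice_data.RData"

if (file.exists(rdata_path)) {
  load(rdata_path)  # restores the saved objects into the global environment
} else {
  message("File not found: ", rdata_path, ". Check the name and your working directory.")
}
```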
- Website of the `tidymodels` package, including documentation, examples and guides. `tidymodels` consists of many more packages than we could cover in this session!
- Tidy Modeling with R
- Get started with tidymodels and classification of penguin data by Julia Silge
- tidymodels: Adventures in Rewriting a Modeling Pipeline - posit::conf(2023)
- A Gentle Introduction to tidymodels
- Tidymodels Ecosystem Tutorial
- Holistic Tidymodels Tutorial
Our code and practice materials are made available under the MIT license.
Franka Tetteroo prepared a brief overview of the whole `tidymodels` package and introduced the `rsample` package along with different sampling techniques.
Linus Hagemann prepared the introduction to the `recipes` and `parsnip` packages. He also edited the recording and provided strong support with maintaining the Git repository.
Sofiya Berdiyeva prepared the description of the `yardstick` package.
All authors also prepared the lab materials, including code examples and exercises, for their respective sub-themes of the presentation.