
Advanced String-Processing with stringi

Summary

This repository provides materials for a session that is part of the I2DS Tools for Data Science workshop, held at the Hertie School, Berlin, in October 2024. The student-run workshop is part of the course Introduction to Data Science taught by Simon Munzert at the Hertie School, Berlin, in Fall 2024.

Session contents

The session provides an overview of the key functions offered by the package stringi, which can be used for advanced string-processing operations. stringi can be a bit harder to learn than stringr, which is built on top of it. However, it is much more versatile and therefore better suited to complex tasks. The session focuses on applications that are especially useful during preprocessing, such as string extraction, pattern matching, text boundary analysis, and working with text data from different locales. The primary learning objectives of the tutorial are as follows:

  • Motivate the importance of string processing in data science, particularly in political science
  • Introduce stringi, its key functions and potential use cases
  • Build a solid foundation in the practical application of advanced string processing in fields such as political science
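The following minimal sketch illustrates the four kinds of operations named above: extraction, pattern matching, boundary analysis and locale-aware processing. It assumes stringi is installed. The example sentences are made up for illustration and are not taken from the workshop materials.

```r
library(stringi)

# Hypothetical example sentences (illustrative only, not workshop data)
speeches <- c(
  "The Bundestag passed 3 bills in 2024.",
  "Die Abgeordneten stimmten am 17. Oktober ab."
)

# String extraction: pull every run of digits out of each element
numbers <- stri_extract_all_regex(speeches, "[0-9]+")
# numbers[[1]] is c("3", "2024"); numbers[[2]] is "17"

# Pattern matching: which elements mention the Bundestag?
mentions <- stri_detect_regex(speeches, "Bundestag")
# c(TRUE, FALSE)

# Text boundary analysis: split into words using ICU word boundaries,
# dropping whitespace and punctuation "words"
words <- stri_split_boundaries(speeches, type = "word", skip_word_none = TRUE)
# words[[1]] is "The" "Bundestag" "passed" "3" "bills" "in" "2024"

# Locale-aware case mapping: in Turkish, "i" uppercases to dotted capital "İ"
upper_tr <- stri_trans_toupper("istanbul", locale = "tr_TR")
upper_en <- stri_trans_toupper("istanbul", locale = "en_US")
```

The regex-based functions (`stri_*_regex`) cover pattern work. `stri_split_boundaries` and the `locale` arguments rely on ICU, which is where stringi goes beyond simple base-R string handling.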

The contents can be accessed here:

License

The material in this repository is made available under the MIT license.

Statement of Contributions

Benedict Anderer prepared part of the slides (Introducing stringi & Friends; Key Features & Use Cases; Further Resources), passionately beautified the slide deck with GIFs and images, made the recording and maintained the GitHub repository. Unfortunately, he was ill at the time of the recording and asks you to excuse his strained voice.

Farhan Shaikh prepared part of the slides (Text as Data; String Processing in the Political Sciences; Match Made in Heaven: stringi & Regex) and contributed to the GitHub repository.

Saurav Jha prepared the practice material as well as the solutions.