Welcome to the amazing world of quanteda. Text analysis, collocations, sentiment analysis and more. Welcome!

intro-to-data-science-22-workshop/10-Text-analysis-with-quanteda-roa-fonseca-kraess

Text analysis with quanteda

Summary

This repository provides materials for a session that is part of the I2DS Tools for Data Science workshop run at the Hertie School, Berlin in November 2022. The student-run workshop is part of the course Introduction to Data Science taught by Simon Munzert at the Hertie School, Berlin, in Fall 2022.

Session contents

🎉 Welcome everyone to text analysis with quanteda 🎉

Have you ever wondered if analyzing what we read, write, say, and think is possible? Have you ever been curious to know if we can quantitatively measure and manage the speeches we hear, the movies we watch, or what we say to someone else? I have excellent news for you: it is possible and easier than you think. Quanteda is the answer to all your questions in this area.

This session will introduce you to the amazing world of quanteda. quanteda is an R package popular among academics, researchers, and anyone who wants to do text mining and analysis with multiple text formats. With it, we can work with .txt files, Word documents, and other readable text formats (files are typically loaded through the companion readtext package). quanteda is a great tool for generating word clouds, computing statistics on words, and even drawing inferences from the texts you analyze.

To teach you the main features of quanteda and its potential, we will analyze a TV series. Yes, that's right: the television series How I Met Your Mother. We will identify the main characters, see who they are connected to, and even search for the most legendary phrases and the most useful words. Plus, we will do some sentiment analysis on the show. quanteda is fantastic, and our material will encourage you to enter the wonderful world of text analysis.

Our repository is divided as follows:

📁 data: R data frames used in our live coding session, presentations, etc.

📁 exercises: R Markdown files with exercises.

  • 📄 Exercise quanteda: Exercise quanteda_cleaned.Rmd

  • 📄 Exercise quanteda with answers: Exercise quanteda.Rmd

📁 live-coding-session: the rendered markdown of the live coding session.

  • 📄 Exercise quanteda with answers: How_we_met_quanteda.html

📁 quarto: our main presentation introducing the topic, built with Quarto.

  • 📄 Quanteda_presentation_final_version.html (The live coding session has two parts: a markdown document and this presentation.)

📁 scripts: raw script of our live coding session, for those who are interested.

  • 📄 01_raw_script.R

📁 texts: How I Met Your Mother TV scripts.

Main learning objectives

The goals of this session are to

1.- Introduce you to the world of quanteda.

2.- Show you the three main object types quanteda uses for analysis: corpus, tokens, and dfm (document-feature matrix) objects.

3.- Show you how to implement the most important analyses for text data.

4.- Show you how to handle text and give you an idea of how you can present your own research.

5.- Provide reference material, presentations, and exercises so that you understand what the text analysis process looks like. We want you to have fun, but also to learn with clear and entertaining examples.
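
To make the corpus, tokens, and dfm steps a bit more concrete, here is a minimal sketch of how the three objects usually follow one another. It assumes quanteda (version 3 or later) is installed. The two toy lines and the object names (`txt`, `corp`, `toks`, `dfmat`) are made up for illustration and are not taken from the workshop data or the live coding script.

```r
# Minimal sketch, assuming quanteda >= 3 is installed.
library(quanteda)

# Hypothetical toy lines, not part of the workshop materials.
txt <- c(
  line1 = "Suit up! It is going to be legendary.",
  line2 = "Have you met Ted? Ted is an architect."
)

# 1. corpus: stores the raw texts, one document per element.
corp <- corpus(txt)

# 2. tokens: splits each document into words; here we also drop
#    punctuation and English stopwords.
toks <- tokens(corp, remove_punct = TRUE)
toks <- tokens_remove(toks, stopwords("en"))

# 3. dfm: document-feature matrix counting each (lowercased) token
#    per document.
dfmat <- dfm(toks)

# Inspect the result: number of documents and most frequent features.
ndoc(corp)
topfeatures(dfmat, 3)
```

In this toy example "ted" is the only feature that appears twice, so it comes first in `topfeatures()`. The same three steps should carry over to a larger corpus such as the TV scripts, with more cleaning choices along the way.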

Instructors

🥋 Jorge Roa GitHub | LinkedIn | Twitter

🥋 Augusto Fonseca GitHub | Email

🥋 Alexander Kraess Email | Twitter

Further resources

🎯 Quanteda Webpage

🎯 A Beginner’s Guide to Text Analysis with quanteda (University of Virginia)

🎯 Amazing document created by Kenneth Benoit (University of Münster)

🎯 An Introduction to Text as Data with quanteda (Penn State and Essex courses)

🎯 Text as Data: quantitative text analysis with R. Data Science Summer School 2022. Hertie School

🎯 Quanteda Cheat Sheet

🎯 Advancing Text Mining with R and quanteda: Methods Bites

🎯 Advancing Text Mining: Cornelius Puschmann

🎯 Text as data: Kenneth Benoit. Director, LSE Data Science Institute

🎯 Analysis of financial texts using R: Kohei Watanabe

🎯 Using quanteda to analyze social media text: Pablo Barberá

🎯 Quanteda initiative

🎯 The 5 Packages You Should Know for Text Analysis with R

License

The material in this repository is made available under the MIT license.

Statement of contributions

Jorge Roa prepared the code, programmed the presentation, and prepared the markdown of the code.

Augusto Fonseca checked the code for bugs in detail, suggested new pieces of code, prepared the practice material for the repo, and preprocessed and edited the video.

Alexander Kraess checked the code for bugs in detail, suggested new pieces of code, prepared the practice material for the repo, and coordinated the team's meetings and logistics.
