Harry Potter books for Text Analysis.
This work was inspired by Bradley Boehmke's R package, which claimed to provide clean and tidy text data. On closer inspection, the text needed further cleaning, including adding paragraphs to the end of a chapter and removing the many special characters.
This repository keeps each book in a csv file so that the data is usable by anyone who wants to do text analysis without dealing with a .rda file. Each csv is a book, and each book has 2 columns: chapter (in uppercase) and text.
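As a rough illustration of working with the two-column layout, here is a minimal Python sketch that groups the text rows of one book by chapter. It assumes each csv has a header row with columns named chapter and text; the function name load_book is hypothetical and not part of this repository.

```python
import csv
from collections import defaultdict


def load_book(path):
    """Read one book csv and return a dict mapping chapter -> list of text rows.

    Assumption: the file has a header row with columns 'chapter' and 'text'.
    """
    chapters = defaultdict(list)
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            chapters[row["chapter"]].append(row["text"])
    # Plain dicts keep insertion order, so chapters stay in file order.
    return dict(chapters)
```

For example, load_book("philosophers_stone.csv") would return the chapters of the first book, provided a file with that name sits in the working directory.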
The book order:
philosophers_stone: Harry Potter and the Philosopher's Stone, published in 1997
chamber_of_secrets: Harry Potter and the Chamber of Secrets, published in 1998
prisoner_of_azkaban: Harry Potter and the Prisoner of Azkaban, published in 1999
goblet_of_fire: Harry Potter and the Goblet of Fire, published in 2000
order_of_the_phoenix: Harry Potter and the Order of the Phoenix, published in 2003
half_blood_prince: Harry Potter and the Half-Blood Prince, published in 2005
deathly_hallows: Harry Potter and the Deathly Hallows, published in 2007
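
To process the series in publication order, the identifiers listed above can be kept in a list and turned into file paths. This is only a sketch: it assumes each file is named after its identifier with a .csv extension, and the names BOOK_ORDER and book_paths are hypothetical.

```python
from pathlib import Path

# Book identifiers in publication order, as listed above.
BOOK_ORDER = [
    "philosophers_stone",
    "chamber_of_secrets",
    "prisoner_of_azkaban",
    "goblet_of_fire",
    "order_of_the_phoenix",
    "half_blood_prince",
    "deathly_hallows",
]


def book_paths(data_dir="."):
    """Return the expected csv path for each book, in publication order.

    Assumption: files are named '<identifier>.csv' inside data_dir.
    """
    return [Path(data_dir) / f"{name}.csv" for name in BOOK_ORDER]
```

Iterating over book_paths() keeps any per-book analysis aligned with the order of the series.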