Text-Classification---R

Classifying documents into categories

The file "Topic Modelling" is based on the blog post Topic Modeling in R.

Before running on Windows I had some setup to do:

Install R from https://cran.r-project.org/bin/windows/base/
Install RStudio from https://www.rstudio.com/products/rstudio/download/
Create a Twitter App to collect the data at https://apps.twitter.com/app/new (more details are in Twitter Analytics Using R Part 1: Extract Tweets)

The values from the Twitter app need to go into these lines in "Topic Modelling":

Consumer_key <- "YOUR_CONSUMER_KEY"
Consumer_secret <- "YOUR_CONSUMER_SECRET"
access_token <- "YOUR_ACCESS_TOKEN"
access_token_secret <- "YOUR_TOKEN_SECRET"
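
If you would rather not type the keys into the script, one option is to read them from environment variables. This is only a sketch: the environment variable names and the helper `read_twitter_credentials` are hypothetical and are not part of "Topic Modelling". The commented call at the end uses `setup_twitter_oauth` from the twitteR package, which takes the four values.

```r
# Hypothetical helper: collect the four Twitter credentials from environment
# variables instead of hard-coding them in the script.
read_twitter_credentials <- function(env_names = c(
    consumer_key        = "TWITTER_CONSUMER_KEY",
    consumer_secret     = "TWITTER_CONSUMER_SECRET",
    access_token        = "TWITTER_ACCESS_TOKEN",
    access_token_secret = "TWITTER_ACCESS_TOKEN_SECRET")) {
  # Unset variables come back as empty strings.
  values <- Sys.getenv(env_names, unset = "")
  names(values) <- names(env_names)

  # Fail early and name whichever credentials are absent.
  missing <- names(values)[values == ""]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  as.list(values)
}

# Usage (needs the twitteR package and real keys, so it is left commented out):
# creds <- read_twitter_credentials()
# twitteR::setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                              creds$access_token, creds$access_token_secret)
```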

Then in the console make sure all the packages required are installed (this is in the script "prereqs.R"):

install.packages('twitteR')
install.packages("tm")
install.packages("wordcloud")
install.packages("slam")
install.packages("topicmodels")
install.packages('base64enc')
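
If some of these packages are already present, you may prefer to install only the missing ones. The following is a minimal sketch of that idea, not the contents of "prereqs.R"; the helper names `missing_packages` and `install_missing` are made up for illustration.

```r
# The packages listed above.
required <- c("twitteR", "tm", "wordcloud", "slam", "topicmodels", "base64enc")

# Return the subset of `required` that is not yet installed.
# `installed` defaults to the names of all packages R can currently find.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Hypothetical wrapper: install only what is missing and return those names.
install_missing <- function(required) {
  to_install <- missing_packages(required)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# Usage (left commented out because it downloads packages):
# install_missing(required)
```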

Now you are ready to run "Topic Modelling" in RStudio! There are two lines near the head of the file that can be used to tweak the number and type of results:

numTweets <- 900

tweetData <- searchTwitter("flight", n=numTweets, lang="en")

numTweets is used in the Twitter search and later to set the SEED for the analysis algorithm.
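
To see why a fixed seed matters, here is a small sketch in base R: with the same seed, random draws repeat exactly, which is what makes a randomised analysis reproducible between runs. The assignment `SEED <- numTweets` is an assumption about how the script reuses the value, and the commented `LDA` call (with hypothetical `dtm` and `k`) only illustrates how a seed can be passed to topicmodels.

```r
numTweets <- 900
SEED <- numTweets  # assumption: the script reuses the tweet count as the seed

# Two runs with the same seed produce the same random draws.
set.seed(SEED)
first_run <- sample(1:100, 5)

set.seed(SEED)
second_run <- sample(1:100, 5)

same_draws <- identical(first_run, second_run)
print(same_draws)

# With topicmodels, a seed can be supplied through the control list, e.g.:
# topicmodels::LDA(dtm, k = 5, control = list(seed = SEED))
```

Changing numTweets therefore changes both how many tweets are fetched and the random starting point of the analysis.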

tweetData holds the data that we want to analyse. Here I have done a simple search for the text "flight" in English. See Getting Data via twitteR for more ideas. Look in "topic-exploration.R" and "Sentiment.R" for more ways to look at the data downloaded in "Topic Modelling.R".
