Tweet Scraper

Uses the twitterscraper tool to scrape the latest 500 tweets of U.S. Senators and other prominent politicians. Their tweets serve as labelled partisan speech for training a supervised NLP classification model, which can then classify the tweets of any Twitter user (given their handle as input) or any other block of text.

Requires the twitterscraper and fasttext Python packages. Run pip install -r requirements.txt to install them.

Run python scraper.py to generate the commands needed to scrape Twitter. Unfortunately, the commands can't be run automatically: you have to scrape each user's account individually by running the commands in dCommands.txt and rCommands.txt one at a time. Data is saved in data/d/ (Democrats) and data/r/ (Republicans). The full dataset will be posted as a .zip file once I have finished scraping.
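The real handle lists and command format live in scraper.py, which isn't reproduced here. As a rough sketch, assuming twitterscraper's --user, --limit and --output command-line options and an output path of data/<party>/<handle>.json, generating the command files could look like this. The handles and the build_commands/write_commands helpers are hypothetical:

```python
from pathlib import Path

# Hypothetical placeholder handles; the real lists live in scraper.py.
DEMOCRATS = ["example_dem_1", "example_dem_2"]
REPUBLICANS = ["example_rep_1", "example_rep_2"]

TWEET_LIMIT = 500  # matches the "latest 500 tweets" target


def build_commands(handles, party):
    """Return one twitterscraper command per handle.

    Assumes the --user/--limit/--output CLI options and saves each
    user's tweets to data/<party>/<handle>.json.
    """
    return [
        f"twitterscraper {handle} --user --limit {TWEET_LIMIT} "
        f"--output=data/{party}/{handle}.json"
        for handle in handles
    ]


def write_commands(path, commands):
    """Write one command per line so each can be copied and run by hand."""
    Path(path).write_text("\n".join(commands) + "\n", encoding="utf-8")
```

For example, write_commands("dCommands.txt", build_commands(DEMOCRATS, "d")) produces a file with one line per handle, which you can then run one at a time as described above.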

How I trained the model: clone fastText's GitHub repo into the data/ folder and follow the instructions in its README to build it. Run textClean.py to create the supervised training and validation sets. Then go to data/fasttext/ and run ./fasttext supervised -input ../fullEdited/supervised.train -output ../../model/model -epoch 60. You can experiment with the epoch count, but anything above about 50 gives reasonably high accuracy. To measure test accuracy, run ./fasttext test ../../model/model.bin ../fullEdited/supervised.valid. The reported precision (P@1) should be just above 99%.
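fastText's supervised mode expects one example per line, with the label prefixed by __label__ (for example, __label__d some tweet text). Below is a minimal sketch of that kind of preprocessing. It assumes the twitterscraper JSON files each contain a list of tweets with a "text" field; the cleaning rules, the 20% validation split and all helper names are hypothetical and not necessarily what textClean.py does:

```python
import json
import random
import re
from pathlib import Path


def clean(text):
    """Lowercase, drop URLs and collapse whitespace (fastText reads one example per line)."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return " ".join(text.split())


def to_fasttext_line(text, label):
    """Format one example as '__label__<label> <text>', or None if nothing is left."""
    cleaned = clean(text)
    return f"__label__{label} {cleaned}" if cleaned else None


def load_party(folder, label):
    """Assumes each *.json file in folder is a list of tweets with a 'text' field."""
    lines = []
    for path in sorted(Path(folder).glob("*.json")):
        for tweet in json.loads(path.read_text(encoding="utf-8")):
            line = to_fasttext_line(tweet["text"], label)
            if line:  # skip tweets that were only links or whitespace
                lines.append(line)
    return lines


def split(lines, valid_frac=0.2, seed=0):
    """Shuffle deterministically and return (train, valid)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    return shuffled[n_valid:], shuffled[:n_valid]


def write_lines(path, lines):
    """Write one fastText example per line."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Usage would be along the lines of train, valid = split(load_party("data/d", "d") + load_party("data/r", "r")), followed by write_lines for supervised.train and supervised.valid. Lowercasing is a design choice: fastText is case-sensitive, so without it "Vote" and "vote" would be treated as different words.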

Run python predict.py TWITTERHANDLE from the main directory to predict the political leaning of any public Twitter user.
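How predict.py turns per-tweet predictions into a single verdict isn't described here. One plausible approach, sketched under that assumption, is to classify each of the user's tweets and take a majority vote. The classify_user helper is hypothetical, and its predict argument is any function mapping a list of strings to fastText-style labels, which keeps the sketch testable without a trained model. Fetching the user's tweets is left out:

```python
from collections import Counter


def classify_user(tweets, predict):
    """Classify each tweet, then return (majority label, share of tweets with that label).

    `predict` maps a list of strings to a list of labels such as "__label__d".
    """
    # Collapse whitespace (including newlines) and drop empty tweets.
    texts = [" ".join(t.split()) for t in tweets if t.strip()]
    if not texts:
        raise ValueError("no tweets to classify")
    counts = Counter(predict(texts))
    label, count = counts.most_common(1)[0]
    return label.replace("__label__", ""), count / len(texts)
```

With the fasttext package, predict could be built from model = fasttext.load_model("model/model.bin") as lambda texts: [labels[0] for labels in model.predict(texts)[0]]. Whitespace is collapsed first because fastText's predict rejects strings that contain newlines.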

benbroks/TweetScraper

Folders and files

Latest commit

History

Repository files navigation

Tweet Scraper

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages