
CSI4900

RumourEval: Determining rumour veracity and support for rumours

Tong Liu and Joseph Roque

Abstract

Taken from our CSI 4900 final report at the University of Ottawa

This paper outlines our attempt at creating a classifier for SemEval 2017, task 8: RumourEval. Social media has become a primary news source for many individuals, but it is becoming increasingly difficult to identify fake news stories. The task is divided into two subtasks, which ask for classifiers that can determine the stance of the tweets in a thread towards a rumour, and then predict the veracity of the source tweet as true, false, or unverified at the time of posting. Our approach for subtask A uses a set of numeric and boolean features to train two SVM classifiers: one for one-class learning, and one general classifier. Our approach for subtask B uses a similar set of features, with the addition of our results from subtask A, to train an SVM classifier. These approaches achieved 78.1% and 50% accuracy, the second and third highest results for subtasks A and B, respectively.
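For illustration, here is a minimal sketch of the two-classifier setup described above, using scikit-learn. The feature names (has_question_mark, num_negation_words, is_reply) and the data are hypothetical stand-ins, not the project's actual feature set or code.

```python
# Minimal sketch of the two-classifier setup described above; the
# features and data here are made up, and the real pipeline extracts
# many more numeric and boolean features from each tweet.
import numpy as np
from sklearn.svm import SVC, OneClassSVM

# Each row: [has_question_mark, num_negation_words, is_reply]
X_train = np.array([
    [1, 0, 1],   # query
    [0, 2, 1],   # deny
    [0, 0, 0],   # support
    [0, 0, 1],   # comment
    [0, 1, 1],   # comment
    [1, 0, 0],   # comment
], dtype=float)
y_train = ['query', 'deny', 'support', 'comment', 'comment', 'comment']

# General multi-class SVM over all four stance labels.
general_clf = SVC(kernel='linear')
general_clf.fit(X_train, y_train)

# One-class SVM fit only on 'comment' tweets, used to flag tweets
# that resemble the majority class.
comment_rows = X_train[np.array(y_train) == 'comment']
one_class_clf = OneClassSVM(kernel='rbf', gamma='auto')
one_class_clf.fit(comment_rows)

X_test = np.array([[1, 0, 1]], dtype=float)
print(general_clf.predict(X_test))    # a stance label such as 'query'
print(one_class_clf.predict(X_test))  # +1 (inlier) or -1 (outlier)
```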

Results

Task A

Below you can find the final confusion matrix of our classifier for task A, which classifies each tweet in a thread as supporting, denying, querying, or commenting on the source tweet. The labels on the left are the true labels; those along the bottom are our classifier's predictions.

Task A Confusion Matrix
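A matrix like this can be computed with scikit-learn's confusion_matrix. A small sketch with made-up predictions (not our actual results):

```python
# Sketch: building a confusion matrix like the one shown, using
# made-up predictions rather than our actual results.
from sklearn.metrics import confusion_matrix

labels = ['support', 'deny', 'query', 'comment']
y_true = ['comment', 'support', 'deny', 'comment', 'query']
y_pred = ['comment', 'comment', 'deny', 'comment', 'query']

# Rows are true labels, columns are predicted labels, matching the
# layout of the figures in this section.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```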

These are the final results for task A in the competition, with our personal results added for comparison.

Task A Results

Task B

Below you can find the final confusion matrix of our classifier for task B, which classifies the rumour in the source tweet as true, false, or unverified. The labels on the left are the true labels; those along the bottom are our classifier's predictions.

Task B Confusion Matrix

These are the final results for task B in the competition, with our personal results added for comparison.

Task B Results

How to Run

Dependencies

  • Python 3 (we recommend installing it with pyenv)
  • python-magic (see its dependencies)
    • On macOS: brew install libmagic
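python-magic wraps the libmagic library for file-type detection. A quick sanity check that both are installed (illustrative only, not part of the project's pipeline):

```python
# Verify python-magic and libmagic are installed correctly.
import magic

print(magic.from_buffer(b'hello world', mime=True))  # e.g. 'text/plain'
```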

Setup

  • pip install -r requirements.txt to install Python library dependencies

Running the code

Recommended

Try running python3 -m rumoureval to see the results of testing on the validation data.

After, you can run python3 -m rumoureval --test to train on the training data and evaluate against the real test data.

Finally, run python3 -m rumoureval --plot to plot the output.

Arguments

python3 -m rumoureval [--verbose] [--test] [--osorted] [--disable-cache] [--plot] [--trump]

  • --verbose to get verbose output
  • --test to train on the training data, then evaluate on the test data; without it, the model is evaluated on the validation data
  • --osorted to output tweets sorted into their classes for tasks A and B
  • --disable-cache to force the task A classifier to retrain on the training data; by default its output is cached to speed up iteration on the task B classifier
  • --plot to plot the confusion matrices for tasks A and B
  • --trump to test classification of Trump tweets that we picked and labelled ourselves
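For reference, here is a minimal argparse sketch of how flags like these might be declared; the project's actual entry point (the rumoureval package's __main__ module) may define them differently.

```python
# Sketch of an argument parser matching the flags above; the real
# entry point may define them differently.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(prog='rumoureval')
    parser.add_argument('--verbose', action='store_true',
                        help='verbose output')
    parser.add_argument('--test', action='store_true',
                        help='train on training data, evaluate on test data')
    parser.add_argument('--osorted', action='store_true',
                        help='output tweets sorted into their classes')
    parser.add_argument('--disable-cache', action='store_true',
                        help='retrain the task A classifier from scratch')
    parser.add_argument('--plot', action='store_true',
                        help='plot the confusion matrices')
    parser.add_argument('--trump', action='store_true',
                        help='classify our hand-labelled Trump tweets')
    return parser.parse_args()

if __name__ == '__main__':
    print(parse_args())
```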

Contributing

Ensure all code passes the pylint and pycodestyle checks, using the following invocations:

  • pylint rumoureval setup.py
  • pycodestyle --max-line-length=100 rumoureval setup.py
