COVID Misinformation Text Classification Modelling

With Joy Zhang (Columbia University)

The outbreak of COVID-19 was accompanied by fake news and misleading posts on social media platforms around the world. This misinformation hinders epidemic prevention and control, foments xenophobia, and exacerbates the damage caused by the virus itself. To address this issue, we first conduct an exploratory analysis of the differences between fake and real news. We then train eight machine learning models using algorithms including Logistic Regression, Naive Bayes, Random Forest, SVM, AdaBoost, and a simple deep learning approach. The best results, 94.1% accuracy and a 93.9% F1 score, came from a Logistic Regression model that uses count vectorization for feature extraction.
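The pairing of count vectorization with logistic regression can be sketched in plain Python. This is a minimal, stdlib-only illustration under stated assumptions, not the code in COVID_Misinfo_NLP.ipynb. The tokenizer, the gradient-descent settings (`lr`, `epochs`), and the toy sentences and labels are all hypothetical, with 1 standing for fake and 0 for real.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(docs):
    # Map every token seen in the training documents to a column index.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def count_vectorize(doc, vocab):
    # Bag-of-words counts; tokens missing from the vocabulary are ignored.
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.1, epochs=200):
    # Batch gradient descent on the mean log-loss (no regularization).
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Label 1 (fake) when the predicted probability is at least 0.5.
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Hypothetical toy data: 1 = fake, 0 = real.
train_docs = [
    "drinking bleach cures covid",
    "5g towers spread the virus",
    "vaccines reduce severe illness",
    "masks lower transmission rates",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(train_docs)
X = [count_vectorize(doc, vocab) for doc in train_docs]
w, b = train_logreg(X, labels)
print(predict(w, b, count_vectorize("bleach cures the virus", vocab)))  # prints 1
```

The vocabulary is fixed at training time, so words never seen in training contribute nothing at prediction time. In practice a library implementation, for example scikit-learn's `CountVectorizer` and `LogisticRegression`, would normally replace these hand-written helpers.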

The second part of the repo contains the code for the winning entry to the 2021 AI Modelshare Competition, in which I used a BERT model built with TensorFlow to classify COVID tweets as fake or real, reaching 97.3% accuracy and a 97.3% F1 score.

accuracy	F1 score	precision	recall	framework	optimizer	learning rate
97.4%	97.4%	97.4%	97.4%	keras-BERT	Adam	1e-5
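The four metrics in the table can be computed from counts of true positives, false positives, and false negatives. The sketch below assumes binary labels with fake = 1 as the positive class. The README does not say how precision, recall, and F1 were averaged for the competition, so this single-class version is an assumption, and the example labels are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    # Count outcomes for the positive class (assumed here to be "fake").
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions, not competition data.
print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))
```

With these made-up inputs there are 2 true positives, 1 false positive, and 1 false negative, so every metric comes out to 2/3.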

Files:
.gitignore
AI_Modelshare_Competition_Code.ipynb
COVID_Misinfo_NLP.ipynb
Constraint_Full.csv
Model_Performance_Final.csv
README.md
bert_weights.index

GitHub repository: ltk2118/covid_misinformation
