Quora Insincere Questions Classification

This repository contains the code and two reports (Data Exploration and Text Classification) from our entry in the Kaggle competition Quora Insincere Questions Classification. The work was group coursework for the COMP6208 Advanced Machine Learning class. We performed exploratory data analysis and visualisation to gain insight into the data, and developed several machine learning models, ranging from logistic regression and SVM to more advanced models such as an LSTM with an attention mechanism and BERT.

Problem Statement

Quora is a platform where people across the world can ask questions on any topic and receive answers from others. However, some questions and comments are malicious rather than genuine attempts at discussion, for example on topics such as gender, faith, and violence. These are called insincere questions and need to be recognised and prohibited. To address this, Quora created a competition on Kaggle that encourages participants to build machine learning models that identify and flag insincere questions.

Libraries

Pandas
scikit-learn
PyTorch
NLTK
Matplotlib

Exploratory Data Analysis

Sample visualisations produced during the EDA; the full analysis is in Data_Exploration.ipynb and Data_Exploration_Report.pdf.

Machine Learning Models

Logistic Regression with bag of words and TF-IDF
SVM with Naive Bayes features
XGBoost
TextCNN
Bidirectional LSTM with Attention Mechanism
Fine-tuned BERT
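
As a rough illustration of the two simplest entries in the list above, the sketch below fits a TF-IDF logistic regression and a linear SVM on Naive Bayes log-count-ratio features using scikit-learn. It is not the code from the notebooks: the toy questions, the labels, the helper names `nb_log_count_ratio` and `predict_nb_svm`, and all hyperparameters are assumptions chosen only to keep the example small and runnable.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data (not from the competition); 1 = insincere, 0 = sincere.
questions = [
    "How do I learn Python quickly?",
    "What is the best way to cook rice?",
    "Why are all people from that group so stupid?",
    "Why do those idiots believe such nonsense?",
    "How does a neural network learn?",
    "Why are they all liars and idiots?",
]
labels = np.array([0, 0, 1, 1, 0, 1])

# Baseline 1: TF-IDF unigrams and bigrams fed to logistic regression.
# class_weight="balanced" is an assumed choice for imbalanced labels.
tfidf_lr = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
tfidf_lr.fit(questions, labels)


def nb_log_count_ratio(X, y, alpha=1.0):
    """Smoothed Naive Bayes log-count ratio, one value per feature."""
    p = alpha + np.asarray(X[y == 1].sum(axis=0)).ravel()
    q = alpha + np.asarray(X[y == 0].sum(axis=0)).ravel()
    return np.log((p / p.sum()) / (q / q.sum()))


# Baseline 2: scale each TF-IDF feature by its NB ratio, then fit a linear SVM.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(questions)
r = nb_log_count_ratio(X, labels)
nb_svm = LinearSVC()
nb_svm.fit(X.multiply(r).tocsr(), labels)


def predict_nb_svm(texts):
    """Apply the same vectorizer and NB scaling before predicting."""
    X_new = vectorizer.transform(texts)
    return nb_svm.predict(X_new.multiply(r).tocsr())
```

Calling `tfidf_lr.predict(["..."])` or `predict_nb_svm(["..."])` returns an array of 0/1 labels. A positive entry in `r` marks a term seen more often in insincere questions, which is what lets the linear SVM weight such terms more heavily.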

markvasin/Kaggle-Quora-Insincere-Questions-Classification

Folders and files

Latest commit

History

Repository files navigation

Quora Insincere Questions Classification

Problem Statement

Libraries

Exploratory Data Analysis

Machine Learning Models

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages