Project Text 2: Sentiment Classification

This repository contains the code for the second project of EPFL's Machine Learning course.

Team Members

Adrien LAYDU
Manon MICHEL
François MICHEL

Goal

The goal of the project is to predict, as accurately as possible on a large dataset, whether a tweet originally contained a positive :) or negative :( smiley, using only the remaining text.

Package requirements

To run our code you need to install the following packages:

Numpy
```
pip install numpy
```
Scikit-Learn
```
pip install scikit-learn
```
NLTK (The Natural Language Processing Toolkit)
```
pip install nltk
```
Wordninja
```
pip install wordninja
```
Pandas
```
pip install pandas
```
PyTorch
```
pip install torch
```
TQDM
```
pip install tqdm
```

The Transformers repository by Hugging Face
```
pip install git+https://github.com/huggingface/transformers.git
```

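
Before running the code, it can help to check that every package listed above is importable. The following is a small sketch using only the standard library; the list of import names is our reading of the packages above, and the helper `missing_modules` is hypothetical.

```python
import importlib.util

# Import names for the packages listed above. The import name can differ
# from the pip name (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = [
    "numpy", "sklearn", "nltk", "wordninja",
    "pandas", "torch", "tqdm", "transformers",
]


def missing_modules(modules):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

This only checks that the modules can be located; it does not verify their versions.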

Reproducing this Project

To run our Machine Learning algorithm with the best parameters we found:

Clone this project
Download the file twitter-datasets.zip, which contains the datasets, from AIcrowd
Unzip it
Put the extracted files in a folder called data/ inside the repository
Install all the required packages
Run the run.py file with the following command:
```
python3 run.py
```
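
The unzip step above can also be scripted. This is a sketch under the assumption that twitter-datasets.zip was downloaded into the repository root; the function name `extract_datasets` is hypothetical, and the exact layout that run.py expects inside data/ should be checked against the scripts.

```python
import zipfile
from pathlib import Path


def extract_datasets(zip_path, data_dir):
    """Extract the archive at `zip_path` into `data_dir` and return the member names."""
    zip_path, data_dir = Path(zip_path), Path(data_dir)
    if not zip_path.is_file():
        raise FileNotFoundError(f"{zip_path} not found; download it from AIcrowd first")
    # Create the target folder if it does not exist yet.
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
        return archive.namelist()


if __name__ == "__main__":
    # Assumption: the archive sits next to this script, in the repository root.
    try:
        for name in extract_datasets("twitter-datasets.zip", "data"):
            print("extracted", name)
    except FileNotFoundError as err:
        print(err)
```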

Files description

Each file can be found in the folder scripts/:

run.py contains the code to predict the sentiment of the tweets in the test dataset.
helper.py contains functions to import the datasets.
preprocess.py contains functions to preprocess tweets.
bert.py contains functions to train the BERT model and evaluate the accuracy of the model.
tweetToVec contains the function to convert the tweets into vectors using GloVe.
data_handler.py contains the function to split the data into a training and a validation set.
data_processing.py contains functions to tokenize the tweets using the BERT technique.
bert_collab.ipynb is a notebook used to train the BERT model using Google Colab.
Simple_models.ipynb is a notebook containing all the other models (excluding BERT) we used to perform sentiment analysis.
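
As an illustration of the training/validation split mentioned above, here is a minimal standard-library sketch. It is not the code from data_handler.py; the function name `split_train_validation`, the default ratio, and the seed are assumptions chosen for the example.

```python
import random


def split_train_validation(samples, validation_ratio=0.1, seed=0):
    """Shuffle `samples` and split them into (training, validation) lists."""
    if not 0.0 < validation_ratio < 1.0:
        raise ValueError("validation_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_ratio)
    return shuffled[n_validation:], shuffled[:n_validation]


if __name__ == "__main__":
    tweets = [f"tweet {i}" for i in range(10)]
    train, validation = split_train_validation(tweets, validation_ratio=0.2)
    print(len(train), "training tweets,", len(validation), "validation tweets")
```

Shuffling before splitting matters when the positive and negative tweets are stored one after the other, since an unshuffled split could leave one class out of the validation set.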
