- This repository contains the code and resources for the "Natural Language Processing with Disaster Tweets" competition. The goal of the competition is to analyze and process tweets related to disasters using natural language processing techniques.
- Our submission ranked 45th on the competition leaderboard.
- In this competition, participants are provided with a dataset of tweets that are labeled as either related to a disaster or not. The task is to build a model that can accurately classify new tweets as disaster-related or not. The competition encourages the use of various natural language processing (NLP) techniques to extract meaningful features from text data and train a predictive model.
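To make the task concrete, here is a minimal sketch of a binary tweet classifier. It is not the model built in the notebooks: it swaps in a bag-of-words multinomial Naive Bayes baseline written with the standard library only, and the toy tweets in `TRAIN` are invented for illustration. The names `tokenize`, `train_naive_bayes`, and `predict` are hypothetical helpers.

```python
import math
import re
from collections import Counter

# Toy examples invented for illustration; the real data lives in datasets/.
TRAIN = [
    ("Forest fire spreading near the village, residents evacuated", 1),
    ("Earthquake shakes the city, buildings collapsed", 1),
    ("Flood waters rising, rescue teams deployed", 1),
    ("I love this new song, it is fire", 0),
    ("Great coffee and a sunny morning", 0),
    ("My team won the game last night", 0),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(examples):
    """Count words per class (1 = disaster, 0 = not disaster)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the higher Laplace-smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
print(predict(model, "Buildings collapsed after the earthquake"))
print(predict(model, "Sunny morning coffee with my team"))
```

A baseline like this is mainly useful as a reference point: any model developed in the notebooks should beat it on held-out data.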
- datasets/: This directory contains the dataset for the competition.
- sources/: This directory contains Jupyter notebooks with exploratory data analysis, feature engineering, and model development:
  - data_collecting.ipynb: Explains the dataset.
  - data_exploring.ipynb: Explores the data using visualization techniques.
  - natural-language-processing-with-disaster-tweets.ipynb: Performs exploratory data analysis (EDA) and data preprocessing, then splits the data into train/val/test sets.
  - natural-language-processing-with-disaster-tweet-v2.ipynb: Builds the model using the processed data from natural-language-processing-with-disaster-tweets.ipynb.
  - data_modeling (2).ipynb: Another model, built by my teammate, that achieves the same result.
- Report_Final.pdf: Report of this project.
- linkyoutube.txt: Link to my team's presentation on YouTube.
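The preprocessing and train/val/test split mentioned for the notebooks above could look roughly like the sketch below. The cleaning rules, the 80/10/10 fractions, the seed, and the helper names `clean_tweet` and `split_rows` are all assumptions made for illustration; the notebooks may do this differently.

```python
import random
import re


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop links
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = re.sub(r"[^a-z0-9#\s]", " ", text)   # keep letters, digits, hashtags
    return " ".join(text.split())               # collapse whitespace


def split_rows(rows, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle rows reproducibly and cut them into train/val/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


print(clean_tweet("Fire near @user!! http://t.co/abc #wildfire"))
train, val, test = split_rows(range(100))
print(len(train), len(val), len(test))
```

Fixing the seed keeps the split reproducible, so results from different modeling notebooks can be compared on the same validation rows.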
To get started with this project, follow these steps:
- Clone this repository.
- Explore the notebooks in the sources/ directory to understand the data and various NLP techniques used.
- Evaluate the model's performance and make necessary improvements.
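For the evaluation step, the F1 score is a common choice for binary classification, and it is the metric the Kaggle leaderboard for this competition reports. Below is a minimal standard-library sketch; `f1_score` here is a hypothetical helper written for illustration, not a function taken from the notebooks, and the example labels are made up.

```python
def f1_score(y_true, y_pred):
    """Compute F1 for the positive class (1 = disaster tweet)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: three disaster tweets, one of which the model misses.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print(round(f1_score(y_true, y_pred), 3))
```

Computing this on the validation split before submitting gives a quick check on whether a change to preprocessing or the model is an actual improvement.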