Given a dataset of tweets related to the 2019 Indonesian presidential election, our objective is to build a machine learning model that accurately predicts the sentiment of each tweet. Each tweet will be classified into a sentiment category such as positive, negative, or neutral. The primary metric for evaluating the model's performance will be accuracy, that is, the fraction of tweets assigned the correct sentiment label. This metric is chosen because it is straightforward to interpret.
To evaluate and optimize our machine learning model, we will focus on the following metrics:
- Accuracy: Our primary evaluation metric, measuring the share of tweets whose sentiment the model classifies correctly.
- Precision and Recall: For each sentiment class (positive, negative, neutral), precision is the share of tweets predicted as that class that truly belong to it, and recall is the share of tweets truly in that class that the model identifies.
- F1 Score: The harmonic mean of precision and recall, giving a balanced measure of the model's performance that is especially useful when classes are imbalanced.

The project is organized into the following stages:
- Data Preparation: Consists of data extraction, data checking, and preprocessing. The preprocessed result is then saved for later use.
- Data Visualization: Creating visualizations and analyses to better understand the tweet text data and identify any patterns or trends.
- Vectorizer: Transforming text data into numerical representations using vectorization techniques. The fitted vectorizer is then saved for later use.
- Models: Defining the model architecture, training it on the vectorized, preprocessed data, and evaluating it with the metrics listed earlier.
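
The metrics evaluation in the final stage can be sketched in plain Python. This is a minimal illustration, not the project's actual evaluation code: the function name `evaluate`, the `LABELS` tuple, and the toy label lists are all hypothetical, and macro-averaging the per-class F1 scores is an assumption, since the text does not say how per-class scores are combined.

```python
from collections import Counter

# Label names are assumed from the sentiment categories named in the text.
LABELS = ("positive", "negative", "neutral")


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy, per-class precision/recall/F1, and macro-averaged F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count (true, predicted) pairs once; per-class counts derive from these.
    pairs = Counter(zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)

    # Accuracy: share of tweets whose predicted label matches the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true) if y_true else 0.0

    per_class = {}
    for label in labels:
        tp = pairs[(label, label)]
        # Precision: of the tweets predicted as `label`, how many were right.
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        # Recall: of the tweets truly `label`, how many were found.
        recall = tp / true_counts[label] if true_counts[label] else 0.0
        # F1: harmonic mean of precision and recall (0 when both are 0).
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    # Macro F1 (an assumption here): unweighted mean of the per-class F1 scores.
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return {"accuracy": accuracy, "per_class": per_class, "macro_f1": macro_f1}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the dataset.
    y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
    y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
    report = evaluate(y_true, y_pred)
    print(f"accuracy: {report['accuracy']:.3f}")
    for label, scores in report["per_class"].items():
        print(label, {name: round(value, 3) for name, value in scores.items()})
    print(f"macro F1: {report['macro_f1']:.3f}")
```

Reporting per-class scores alongside accuracy shows whether a high accuracy hides poor performance on a less frequent class. In practice, a library such as scikit-learn offers equivalent functions (for example `accuracy_score` and `precision_recall_fscore_support` in `sklearn.metrics`), which would likely be preferable to hand-written code.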