- ALBERT-V2 Steam Game-Review Constructiveness Classification Model
- 1.5k Steam Reviews with Binary Constructiveness Labels Dataset
- Jupyter Notebooks for: Data Filtering, Data Preprocessing, Training, Inference, Evaluation
The model contained in this repository is a fine-tuned version of albert-base-v2, designed to classify whether Steam game reviews are constructive or non-constructive. It was trained on the 1.5K Steam Reviews Binary Labeled for Constructiveness dataset, also contained in this repository.
The model can be applied in any scenario where it is important to distinguish between helpful and unhelpful textual feedback, particularly in the context of gaming communities or online reviews. Potential use cases include platforms such as Steam or Discord, or any community-driven feedback system where understanding the quality of feedback is critical.
- Domain Specificity: The model was trained on Steam reviews and may not generalize well outside gaming.
- Dataset Imbalance: The training data has an approximate 63.04%-36.96% split between non-constructive and constructive reviews.
The dataset contained in this repository consists of 1,461 Steam reviews from 10 of the most reviewed games in the base 100 Million+ Steam Reviews dataset. Each game has approximately the same number of reviews. Each review is annotated with a binary label indicating whether the review is constructive or not.
Train/dev/test split CSV files are also available as additional data. They contain the features of the main dataset concatenated into single strings, alongside the binary constructiveness labels, and were used to train the model.
The dataset is designed to support tasks related to text classification, particularly constructiveness detection in the gaming domain. It is particularly useful for training models like BERT and its derivatives or any other NLP models aimed at classifying text for constructiveness.
The dataset contains the following columns:
- id: A unique identifier for each review.
- game: The name of the game being reviewed.
- review: The text of the Steam review.
- author_playtime_at_review: The number of hours the author had played the game at the time of writing the review.
- voted_up: Whether the user marked the review/the game as positive (True) or negative (False).
- votes_up: The number of upvotes the review received from other users.
- votes_funny: The number of "funny" votes the review received from other users.
- constructive: A binary label indicating whether the review was constructive (1) or not (0).
id | game | review | author_playtime_at_review | voted_up | votes_up | votes_funny | constructive |
---|---|---|---|---|---|---|---|
1024 | Team Fortress 2 | shoot enemy | 639 | True | 1 | 0 | 0 |
652 | Grand Theft Auto V | 6 damn years and it's still rocking like its g... | 145 | True | 0 | 0 | 0 |
1244 | Terraria | Great game highly recommend for people who like... | 569 | True | 0 | 0 | 1 |
15 | Among Us | So good. Amazing game of teamwork and betrayal... | 5 | True | 0 | 0 | 1 |
584 | Garry's Mod | Jbmod is trash!!! | 65 | True | 0 | 0 | 0 |
- Constructive (1): Reviews that provide helpful feedback, suggestions for improvement, constructive criticism, or detailed insights into the game.
- Non-constructive (0): Reviews that do not offer useful feedback, lack substance, are vague, off-topic, irrelevant, or trolling.
Please note that the dataset is imbalanced: 63.04% of the reviews were labeled as non-constructive and 36.96% as constructive. Take this into account when utilizing the dataset.
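If you want to compensate for this imbalance during training, one common approach is to weight the loss by inverse class frequency. Below is a minimal sketch; the CSV file name and column name are assumptions for illustration, not fixed names from this repository.

```python
# Sketch: derive balanced class weights from the label distribution.
# The file name "steam_reviews_constructiveness.csv" and the "constructive" column are assumptions.
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

df = pd.read_csv("steam_reviews_constructiveness.csv")
classes = np.unique(df["constructive"])  # [0, 1]
weights = compute_class_weight(class_weight="balanced", classes=classes, y=df["constructive"])
print(dict(zip(classes, weights)))  # for a 63/37 split this is roughly {0: 0.79, 1: 1.35}
```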
The dataset features were combined into a single string per review, formatted as follows:
Review: {review}, Playtime: {author_playtime_at_review}, Voted Up: {voted_up}, Upvotes: {votes_up}, Votes Funny: {votes_funny}
and then fed to the model together with the respective constructiveness labels.
This approach of concatenating the features into a simple string offers a good trade-off between complexity and performance.
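For illustration, the concatenation can be reproduced with a plain f-string; the row below is a made-up example, not an actual dataset entry.

```python
# Build the model input string from one review's features (hypothetical example row).
row = {
    "review": "Great game, but the servers need work.",
    "author_playtime_at_review": 120,
    "voted_up": True,
    "votes_up": 3,
    "votes_funny": 0,
}
text = (
    f"Review: {row['review']}, "
    f"Playtime: {row['author_playtime_at_review']}, "
    f"Voted Up: {row['voted_up']}, "
    f"Upvotes: {row['votes_up']}, "
    f"Votes Funny: {row['votes_funny']}"
)
print(text)
```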
Originally, the model was created as part of a process of evaluating several different models against each other. These models were: BERT, DistilBERT, ALBERT-V2, XLNet and GPT-2.
The repository contains the following five Jupyter notebooks:
- Filtering: Filtering and reduction of the base dataset to a smaller, more usable one.
- Preprocessing: Basic, conservative preprocessing fit for transformer-based LLM fine-tuning, plus a simple statistical analysis of the dataset and annotations.
- Training: Fine-tuning / training of the model (see the sketch after the note below).
- Inference: Simple testing environment.
- Evaluation: Evaluation environment to evaluate the aforementioned classification models against each other.
Note: The Jupyter notebooks mix Google Colab computing resources and local resources, so paths and environment setup need to be adapted to your own working environment before use.
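For orientation, a minimal fine-tuning sketch along the lines of the Training notebook might look like the following. The file names, column names and hyperparameters are assumptions for illustration, not the exact values used in the notebook.

```python
# Sketch: fine-tuning albert-base-v2 on the concatenated-feature CSVs.
# "train.csv"/"dev.csv", the "text"/"constructive" columns and all hyperparameters are assumptions.
import pandas as pd
from datasets import Dataset
from transformers import (AlbertForSequenceClassification, AlbertTokenizerFast,
                          Trainer, TrainingArguments)

tokenizer = AlbertTokenizerFast.from_pretrained("albert-base-v2")
model = AlbertForSequenceClassification.from_pretrained("albert-base-v2", num_labels=2)

def load_split(path):
    # Expects a CSV with a concatenated "text" column and a binary "constructive" column.
    df = pd.read_csv(path).rename(columns={"constructive": "labels"})
    ds = Dataset.from_pandas(df)
    return ds.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
                  batched=True)

train_ds, dev_ds = load_split("train.csv"), load_split("dev.csv")

args = TrainingArguments(
    output_dir="albert-v2-constructiveness",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  eval_dataset=dev_ds, tokenizer=tokenizer)
trainer.train()
print(trainer.evaluate())
```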
The model was trained and evaluated using an 80/10/10 train/dev/test split, achieving the following performance metrics on the test set:
- Accuracy: 0.80
- Precision: 0.80
- Recall: 0.82
- F1-score: 0.79
These results indicate that the model identifies the correct label in roughly 80% of cases.
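To reproduce metrics like these on your own labeled split, a sketch along the following lines can be used; the file name, column names and the positive label string are assumptions (the latter depends on the model's id2label mapping).

```python
# Sketch: scoring the fine-tuned classifier against a labeled test split.
# "test.csv", the "text"/"constructive" columns and the "...1" label suffix are assumptions.
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="abullard1/albert-v2-steam-review-constructiveness-classifier",
                      tokenizer="albert-base-v2", truncation=True, max_length=512)

test_df = pd.read_csv("test.csv")
preds = [1 if p["label"].endswith("1") else 0 for p in classifier(test_df["text"].tolist())]

acc = accuracy_score(test_df["constructive"], preds)
prec, rec, f1, _ = precision_recall_fscore_support(test_df["constructive"], preds, average="binary")
print(f"Accuracy: {acc:.2f}, Precision: {prec:.2f}, Recall: {rec:.2f}, F1: {f1:.2f}")
```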
Explore and test the model interactively on its Hugging Face Space.
To use the model programmatically, use this Python snippet:
```python
from transformers import pipeline
import torch

# Use the GPU (device 0) and float16 if CUDA is available, otherwise CPU and float32.
device = 0 if torch.cuda.is_available() else -1
torch_d_type = torch.float16 if torch.cuda.is_available() else torch.float32

base_model_name = "albert-base-v2"
finetuned_model_name = "abullard1/albert-v2-steam-review-constructiveness-classifier"

# Text-classification pipeline that returns scores for both labels (top_k=None)
# and truncates inputs to the model's 512-token limit.
classifier = pipeline(
    task="text-classification",
    model=finetuned_model_name,
    tokenizer=base_model_name,
    device=device,
    top_k=None,
    truncation=True,
    max_length=512,
    torch_dtype=torch_d_type
)

# Input formatted the same way as the training data (concatenated features).
review = "Review: I think this is a great game but it still has some room for improvement., Playtime: 12, Voted Up: True, Upvotes: 1, Votes Funny: 0"
result = classifier(review)
print(result)
```
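Because top_k=None is set, the classifier returns a score for each of the two labels rather than only the top one. A short follow-up to pick the most likely label could look like the sketch below; the exact output nesting can vary slightly between transformers versions, and the label strings depend on the model's id2label mapping.

```python
# With top_k=None the pipeline returns scores for both labels; the result is typically
# a nested list with one inner list of {"label", "score"} dicts per input.
scores = result[0] if isinstance(result[0], list) else result
best = max(scores, key=lambda d: d["score"])
print(f"Predicted: {best['label']} (score {best['score']:.2f})")
```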
Alternatively, the Jupyter notebooks contained in this repository can be used to test the model or even to replicate the training/fine-tuning process.
The notebooks contain useful code comments throughout, describing what is happening at every step of the way.
An interesting avenue for further modification or improvement would be to augment or otherwise modify the training dataset. Feel free to do so.
The model, dataset and code in this repository are licensed under the MIT License, allowing open and flexible use for both academic and commercial purposes.