- Course: Flatiron Data Science
- Pace: Self Paced
- Instructor: Jeff Herman
- Author: Cody D. Freese
Predict a video game's ESRB rating and find the model that classifies games with the highest confidence, particularly in distinguishing E-rated from M-rated games.
Accompanying CSVs provided by Kaggle
- The data was packaged into separate train and test sets. Ratings range from E for Everyone, through E10+/ET for Everyone 10 and up and T for Teen, to M for Mature. Several content descriptors, and variations of them, weigh on how a game is rated. The Blood category, for example, appears in several variations: Blood, Blood and Gore, Mild Blood and Animated Blood.
I used Python in a Jupyter Notebook to build Decision Tree, KNN and Random Forest models to predict a video game's rating and to measure each model's ability to distinguish between ratings.
I obtained the ESRB Rating dataset from Kaggle. If you want to get started on your own, fork this repo.
After importing all the data, I checked it for null values, duplicates and any datatype errors that might present a problem in Exploratory Data Analysis or Machine Learning Modeling.
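The checks described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper `data_quality_report`, the toy DataFrame and its column names are all assumptions made up for the example.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> dict:
    """Summarize nulls, duplicate rows and non-numeric columns (hypothetical helper)."""
    return {
        # Number of missing values in each column
        "null_counts": df.isnull().sum().to_dict(),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Columns that would need encoding before modeling
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }


# Toy frame standing in for the Kaggle CSV; values are invented for illustration
toy = pd.DataFrame({
    "title": ["Game A", "Game B", "Game B", "Game C"],
    "blood": [0, 1, 1, None],
    "esrb_rating": ["E", "M", "M", "T"],
})

report = data_quality_report(toy)
print(report)
```

On the real data, the same report would be run once on the train set and once on the test set after loading each CSV with `pd.read_csv`.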
To get an idea of what the data comprised, I plotted how many games were exclusive to each console and how many were shared between both.
With the variety of ratings a game can receive, I wanted a visual representation of how many games fell under each rating.
Along the bottom row we can see how the ESRB rating is correlated with the variables; some have a positive influence and others a negative one. A curious note we'll come back to is the strong correlation between the rating and a variable called No_Descriptors.
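One way to read off that bottom row numerically is to encode the rating as an ordered integer and correlate it with each variable. The sketch below assumes the 0 = E through 3 = M encoding used in the results table; the toy data and the column names (`no_descriptors`, `blood_and_gore`, `esrb_rating`) are invented for illustration and may not match the real CSV.

```python
import pandas as pd

# Toy data; values and column names are assumptions for this sketch
toy = pd.DataFrame({
    "no_descriptors": [1, 1, 0, 0, 0, 0],
    "blood_and_gore": [0, 0, 0, 0, 1, 1],
    "esrb_rating": ["E", "E", "ET", "T", "M", "M"],
})

# Encode the rating as an ordered integer so it can be correlated
rating_order = {"E": 0, "ET": 1, "T": 2, "M": 3}
toy["rating_code"] = toy["esrb_rating"].map(rating_order)

# Pearson correlation of every variable with the encoded rating
corr_with_rating = (
    toy.drop(columns="esrb_rating")
    .corr()["rating_code"]
    .drop("rating_code")
    .sort_values()
)
print(corr_with_rating)
```

With this encoding, a variable that mostly appears on E-rated games shows up with a negative coefficient, and one that mostly appears on M-rated games with a positive coefficient.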
After running different types of models, I chose a Random Forest as my final model based on the metrics I was targeting. F1 Score was my most important metric because it captures both how often the model rates games in the correct category and how rarely it puts games in the wrong one.
The table below describes the final model in more detail. F1 Score was my most important metric because I wanted both accurate predictions of a game's rating and high recall, so as not to accidentally rate a game E when it really should be rated M.
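A comparison of the three model types by weighted F1 score might look like the sketch below. It uses synthetic data from scikit-learn's `make_classification` in place of the ESRB CSVs and default hyperparameters, both of which are assumptions; the notebook's own features, split and tuning may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data standing in for the ESRB features (assumption)
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Fit each model and score it with the support-weighted F1
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = f1_score(y_test, predictions, average="weighted")

for name, score in scores.items():
    print(f"{name}: weighted F1 = {score:.3f}")
```

Since the Kaggle data already ships as separate train and test files, the `train_test_split` step would not be needed there.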
Class | Precision | Recall | F1-Score | Support
---|---|---|---|---
0 = E | 93% | 99% | 96% | 138
1 = ET+ | 74% | 85% | 79% | 143
2 = T | 84% | 78% | 80% | 268
3 = M | 89% | 82% | 85% | 163
Weighted Average | 85% | 86% | 85% | 712

Accuracy: 84.41%
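For readers less familiar with the metric, F1 is the harmonic mean of precision and recall, so it is only high when both are high. The short sketch below applies the standard formula to the E row of the table; the function name `f1_from` is a hypothetical helper, not something from the notebook.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when both are 0
    return 2 * precision * recall / (precision + recall)


# E row of the table: precision 93%, recall 99%
e_f1 = f1_from(0.93, 0.99)
print(f"F1 for class E: {e_f1:.2f}")  # prints "F1 for class E: 0.96"
```

Because the table's precision and recall are already rounded, recomputing F1 this way for other rows can differ from the reported value by a point.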
The model is effective at distinguishing between the extremes of the rating system and is reasonably robust at correctly rating T games. Some bleed-through and mislabeling occurs, mostly between adjacent classes: E and ET, ET and T. There are a few rare instances of E games being rated T. In the past, though, some games' actual ratings have been called into question. Legend of Zelda: Twilight Princess and Super Smash Bros. Melee and Brawl were rated T for Teen, yet I can remember playing games at the age of 8 that would have been classified the same way. Cartoon and mischief humor blurs the lines between the lower rating classes. See the supplemental article below about other games that have been mislabeled, for better or for worse.
Exploring the variables further, I investigated the mysterious 'No_Descriptors' variable and its strong influence on E-rated games.
Article from TheGamer about mislabeled games. https://www.thegamer.com/classic-games-esrb-rating-wrong/
See the full analysis in the Jupyter Notebook or review the presentation. For additional info, contact me here: Cody D. Freese
RetroPete. “Pile of Video Games.” 8-Bit Central, RetroPete, 21 May 2013, www.8-bitcentral.com/blog/2013/gameManuals.html.
Sharp, Nathan. “10 Classic Games That Probably Have The Wrong ESRB Rating.” TheGamer, 15 June 2019, www.thegamer.com/classic-games-esrb-rating-wrong/.