

Picture having a buddy who just gets you and suggests the ideal game for your adventures
In this project, we developed a Games Recommender System using the LightFM library on the Steam dataset. The recommendation system employs a hybrid approach, combining collaborative filtering and content-based filtering techniques to provide personalized game suggestions for both multiplayer gaming sessions and solo adventures on the Steam platform. By considering user interactions, game metadata, and individual preferences, our model aims to enhance the gaming experience by offering nuanced and tailored recommendations.
For this project/post, we’ll be using the Steam dataset, which contains 7,793,069 reviews, 2,567,538 users, and 32,135 games. In addition to the review text, the data also includes the users’ play hours in each review.
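The model described below is trained on binary feedback, so the play hours have to be split into positive and negative interactions at some point. Here is a minimal sketch, assuming a hypothetical 1-hour threshold. The README does not say how this project actually binarised its data, so the threshold, the record layout and the names are all illustrative assumptions. The ±1 labels follow the convention LightFM's documentation gives for its `logistic` loss.

```python
# Hypothetical review records: (user, game, hours_played).
# Names and layout are illustrative, not the real Steam schema.
reviews = [
    ("user_1", "game_a", 12.5),
    ("user_1", "game_b", 340.0),
    ("user_2", "game_a", 0.4),
    ("user_3", "game_c", 85.0),
]

# Assumed cut-off: at least 1 hour of play counts as positive feedback.
POSITIVE_HOURS = 1.0


def split_feedback(records, threshold=POSITIVE_HOURS):
    """Split records into positive and negative sets of (user, game) pairs."""
    positive, negative = set(), set()
    for user, game, hours in records:
        target = positive if hours >= threshold else negative
        target.add((user, game))
    return positive, negative


positive, negative = split_feedback(reviews)

# Labels in the convention of LightFM's logistic loss: 1 = positive, -1 = negative.
labels = {**{pair: 1 for pair in positive}, **{pair: -1 for pair in negative}}
```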
We’ll focus on the interactions between users and items as well as on metadata about games, such as publisher, release date, genres and tags.
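As a rough sketch of how these two kinds of data might be prepared, the snippet below maps hypothetical review records to contiguous integer user and item ids. It also flattens each game's metadata into `field:value` string features. The record layout, the names and the feature-string format are assumptions, not the project's actual pipeline. LightFM's `Dataset` class performs a similar id mapping internally, and this pure-Python version only illustrates the idea.

```python
# Hypothetical raw review records: (user, game, hours_played).
reviews = [
    ("user_1", "game_a", 12.5),
    ("user_1", "game_b", 340.0),
    ("user_2", "game_a", 0.4),
    ("user_3", "game_c", 85.0),
]

# Hypothetical per-game metadata: publisher, release year, genres and tags.
metadata = {
    "game_a": {"publisher": "pub_x", "year": 2011, "genres": ["puzzle"], "tags": ["co-op"]},
    "game_b": {"publisher": "pub_x", "year": 2013, "genres": ["strategy"], "tags": ["multiplayer"]},
    "game_c": {"publisher": "pub_y", "year": 2016, "genres": ["simulation", "rpg"], "tags": ["singleplayer"]},
}


def build_index(keys):
    """Assign each distinct key a contiguous integer id in first-seen order."""
    index = {}
    for key in keys:
        index.setdefault(key, len(index))
    return index


def item_feature_names(meta):
    """Flatten one game's metadata into 'field:value' feature strings."""
    names = [f"publisher:{meta['publisher']}", f"year:{meta['year']}"]
    names += [f"genre:{g}" for g in meta["genres"]]
    names += [f"tag:{t}" for t in meta["tags"]]
    return names


user_index = build_index(user for user, _, _ in reviews)
item_index = build_index(game for _, game, _ in reviews)

# Interactions as (user_id, item_id, hours) triples, plus feature names per item id.
interactions = [(user_index[u], item_index[g], h) for u, g, h in reviews]
item_features = {item_index[g]: item_feature_names(metadata[g]) for g in item_index}
```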
LightFM is a Python implementation of a hybrid recommendation algorithm for both implicit and explicit feedback.
It is a hybrid content-collaborative model which represents users and items as linear combinations of their content features’ latent factors. The model learns embeddings or latent representations of the users and items in such a way that it encodes user preferences over items. These representations produce scores for every item for a given user; items scored highly are more likely to be interesting to the user.
Embeddings are estimated for every user and item feature. The embeddings of a user's (or item's) features are then added together to form that user's (or item's) final representation.
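Let $U$ be the set of users, $I$ the set of items, $F^U$ the set of user features, and $F^I$ the set of item features. Each user $u$ is described by a set of features $f_u \subset F^U$, and each item $i$ by a set of features $f_i \subset F^I$.
The LightFM model operates on binary feedback, so the ratings are normalised into two groups: positive and negative interactions. The set of all user-item interaction pairs $(u, i) \in U \times I$ is the union of the positive interactions $S^+$ and the negative interactions $S^-$.
For each user and item feature $f$, the model learns $d$-dimensional embeddings $e^U_f$ and $e^I_f$. Similarly, each user feature and each item feature has a scalar bias, $b^U_f$ and $b^I_f$ respectively.
In LightFM, the representation of each user or item is a linear combination of its feature embeddings, and its bias is the sum of its feature biases:
$$q_u = \sum_{j \in f_u} e^U_j, \qquad p_i = \sum_{j \in f_i} e^I_j$$
$$b_u = \sum_{j \in f_u} b^U_j, \qquad b_i = \sum_{j \in f_i} b^I_j$$
In the implementation, each term is also scaled by that feature's value in the feature matrix, so these are weighted sums. The score for a user-item pair is
$$\hat{r}_{ui} = q_u \cdot p_i + b_u + b_i$$
and for the logistic loss this score is passed through the sigmoid function.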
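The representation and scoring steps can be illustrated with a small pure-Python sketch. The feature names, the 2-dimensional embeddings and the bias values are made up, standing in for already-learned parameters. The per-feature weights play the role of the values in LightFM's feature matrices. Each representation is a weighted sum of feature embeddings plus a weighted sum of feature biases. A user-item score is the dot product of the two representations plus both biases, and items are ranked by that score.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def represent(weights: Dict[str, float], embeddings: Dict[str, Vector],
              biases: Dict[str, float]) -> Tuple[Vector, float]:
    """Weighted sum of feature embeddings and feature biases for one user or item."""
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    bias = 0.0
    for feature, weight in weights.items():
        for k in range(dim):
            vector[k] += weight * embeddings[feature][k]
        bias += weight * biases[feature]
    return vector, bias


def score(user: Tuple[Vector, float], item: Tuple[Vector, float]) -> float:
    """Dot product of the two representations plus both biases."""
    (q_u, b_u), (p_i, b_i) = user, item
    return dot(q_u, p_i) + b_u + b_i


def top_k(user, items: Dict[str, Tuple[Vector, float]], k: int) -> List[str]:
    """Rank item names for one user by descending score."""
    return sorted(items, key=lambda name: score(user, items[name]), reverse=True)[:k]


# Hypothetical 2-d embeddings and biases, as if already learned.
embeddings = {
    "user:user_1": [1.0, 0.5],
    "item:game_a": [0.4, 0.0],
    "tag:co-op": [0.8, 0.2],
    "item:game_b": [-0.2, 1.0],
}
biases = {"user:user_1": 0.1, "item:game_a": 0.0, "tag:co-op": 0.2, "item:game_b": 0.0}

user = represent({"user:user_1": 1.0}, embeddings, biases)
items = {
    # game_a combines its identity feature with a tag feature, each weighted 0.5.
    "game_a": represent({"item:game_a": 0.5, "tag:co-op": 0.5}, embeddings, biases),
    "game_b": represent({"item:game_b": 1.0}, embeddings, biases),
}
ranking = top_k(user, items, k=2)
```

With these made-up numbers `game_a` scores about 0.85 and `game_b` about 0.4, so `game_a` is recommended first.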
We trained two variants of the model: Hybrid Filtering and Collaborative Filtering. Here are the metrics calculated on the train (blue) and test (orange) data:
The metrics show that Collaborative Filtering performs slightly better than Hybrid Filtering. This may seem counterintuitive, but the cause lies in the model implementation. At first we thought the model was broken, but the explanation turned out to be straightforward: the model essentially averages the embeddings of all the features it receives. Because of this averaging, it cannot identify uninformative features and disregard them.
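The averaging follows from how the feature rows are usually built. LightFM's `Dataset` adds a per-item identity feature by default (`item_identity_features=True` in its constructor). Its `build_item_features` method also normalises each row so that the weights sum to 1 by default (`normalize=True`). Assuming this project used those defaults, the two variants differ only in which features each item row contains, as in this hypothetical sketch. The feature names and the `feature_row` helper are illustrative, not LightFM API.

```python
def feature_row(item_id, metadata_features, hybrid, normalize=True):
    """Feature weights for one item (hypothetical helper).

    Both variants keep a per-item identity feature; the hybrid variant adds
    metadata features. With normalize=True the weights sum to 1, so the item
    representation becomes an average of its feature embeddings.
    """
    names = [f"item:{item_id}"]
    if hybrid:
        names += list(metadata_features)
    weight = 1.0 / len(names) if normalize else 1.0
    return {name: weight for name in names}


tags = ["genre:puzzle", "tag:co-op", "tag:singleplayer", "year:2011"]
collaborative = feature_row("game_a", tags, hybrid=False)
hybrid = feature_row("game_a", tags, hybrid=True)
```

With only the identity feature, the item keeps its own embedding at full weight. With four metadata features added, the identity embedding contributes just one fifth of the representation.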
As a result, including many uninformative features can hurt the model by diluting the contribution of the informative ones. Addressing this may require more advanced models, which LightFM does not implement.
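A toy calculation makes the dilution effect concrete. Consider two items that differ in a single informative feature, and suppose both also carry k shared uninformative features. The sketch makes several simplifying assumptions: the 2-d embeddings are made up, all features get equal weight, and one uninformative embedding is repeated k times to stand in for the k uninformative features. Under those assumptions, the gap between the two items' scores for the same user shrinks as 2 / (k + 1).

```python
def average(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 2-d embeddings, assumed for illustration only.
user = [1.0, 0.0]        # a user who prefers the "puzzle" direction
puzzle = [1.0, 0.0]      # informative feature of item A
strategy = [-1.0, 0.0]   # informative feature of item B
noise = [0.0, 1.0]       # uninformative feature shared by both items


def score_gap(k):
    """Score of item A minus score of item B when each also has k uninformative features."""
    item_a = average([puzzle] + [noise] * k)
    item_b = average([strategy] + [noise] * k)
    return dot(user, item_a) - dot(user, item_b)


gaps = {k: score_gap(k) for k in (0, 1, 4, 9)}  # shrinks as 2 / (k + 1)
```

The user's preference between the two items is unchanged, but the signal that separates them shrinks by a factor of k + 1.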
Additionally, it’s worth noting that metadata features are more likely to enhance performance in situations with very sparse datasets or sparse subsets of your data, such as long-tail or cold-start scenarios.
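One way to test this on the Steam data would be to compare both variants on a sparse slice of the test set. The sketch below keeps only test pairs whose item has few or no training interactions, and the two models could then be evaluated on that subset alone. The splits, the threshold and the `sparse_subset` helper are hypothetical and not part of the original evaluation.

```python
from collections import Counter


def sparse_subset(train_pairs, test_pairs, max_interactions):
    """Keep test (user, item) pairs whose item has at most max_interactions
    training interactions; items never seen in training count as 0 (cold start)."""
    counts = Counter(item for _, item in train_pairs)
    return [(user, item) for user, item in test_pairs if counts[item] <= max_interactions]


# Hypothetical train/test splits of (user_id, item_id) pairs.
train = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 2)]
test = [(2, 1), (0, 2), (1, 3), (2, 0)]

long_tail = sparse_subset(train, test, max_interactions=1)
```

Here item 0 has three training interactions and is excluded. Items 1 and 2 (one interaction each) and item 3 (never seen, a cold-start item) are kept.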
Abdulayev Damir - telegram - d.abdulayev@innopolis.university
Dautov Almaz - telegram - a.dautov@innopolis.university
Project Link: https://github.com/thehir0/steam-recsys
Distributed under the MIT License. See LICENSE for more information.