Music Recommender System Optimisation

Improve the user experience by analysing user listening history data to optimize a recommender system feature for a music streaming service provider.

Music streaming services allow people to listen to many types of music and millions of tracks on their smart devices according to their preferences; these features have made listening to music more accessible than ever (Adiyansjan, Gunawan, & Suhartono, 2019). With growing competition in the music streaming industry, a good user experience is essential for building competitive advantage and strengthening customer stickiness. In this project, we analyzed users' listening history and their experience with Deezer's music recommender system to derive actionable suggestions for improving it.

Author

Carol Hsu

Table of Contents

Background

Deezer is a French music streaming service provider founded in 2006. It provides 73 million tracks and customized features based on subscription type. In addition, Deezer utilizes non-personalized recommendations based on common interests, which filter users' preferences and listening history. In 2016, Deezer introduced an exclusive feature, Flow, an optimized recommendation system based on the user's mood. According to the company, Flow recommends new or previously played tracks based on the user's listening history, context and time. In other words, users can listen to music suited to different moods, contexts or specific events.

Problem

Users today are exposed to vast amounts of information and face the paradox of choice: an abundance of choices can delay decision-making and reduce users' motivation to stay with a service (Maasø & Hagen, 2020). Hence, for businesses that wish to increase their competitive advantage and enhance user stickiness towards digital products, it is essential to develop a recommender system, a mechanism that automatically suggests media meeting users' expectations (Hansen et al., 2021).

The challenge was set on Kaggle: predict whether users in the test dataset listened to the first track Flow proposed to them. Deezer considers a track "listened" if the user has listened to more than 30 seconds of it (is_listened = 1). If the user presses the skip button to change the song before 30 seconds, the track is not considered listened (is_listened = 0).
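The labelling rule above can be sketched in a few lines of Python. This is only an illustration: the dataset already ships the is_listened column, the seconds_played argument is a hypothetical input that is not part of the dataset, and treating exactly 30 seconds as "not listened" is an assumption based on the "more than 30 seconds" wording.

```python
LISTEN_THRESHOLD_SECONDS = 30  # threshold stated in the challenge description


def is_listened(seconds_played: float) -> int:
    """Return 1 if more than 30 seconds were played, otherwise 0.

    `seconds_played` is a hypothetical input used for illustration only.
    Exactly 30 seconds is treated as not listened (assumption).
    """
    return 1 if seconds_played > LISTEN_THRESHOLD_SECONDS else 0


# A skip after 5 s, a stop at exactly 30 s, and two longer plays.
labels = [is_listened(s) for s in (5, 30, 31, 180)]
```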

Solution

The original goal of this Kaggle challenge is to improve the recommender system so that it can accurately predict and suggest a track the user will listen to for more than 30 seconds. However, a positive user experience is also crucial when measuring product performance. So, besides predicting whether a user would skip a song, in this project I conducted a user preference analysis, drawing insights from user age, user activity, music preference and listening patterns to optimize Deezer's recommendation system.

Data Source

The data originates from a Kaggle challenge.

Description

The target variable of this dataset is is_listened. There are 7,558,834 observations with 14 predictors.

genre_id: ID of the genre of the song
media_id: ID of the song listened to by the user
album_id: ID of the album of the song
media_duration: duration of the song
user_gender: gender of the user
user_id: user ID
context_type: type of content where the song was listened to: playlist, album ...
release_date: release date of the song with the format YYYYMMDD
ts_listen: timestamp of the listening in UNIX time
platform_name: type of os
platform_family: type of device
user_age: age of the user
listen_type: whether the song was listened to in FLOW or not
artist_id: ID of the artist of the song
is_listened: 1 refers to a track that has been listened to, 0 otherwise

Methods

Data Preprocessing
Data Exploration
Feature Engineering and Data Analysis

Preprocessing

After the data exploration, we found three main issues in the train dataset:

17 entries have a release_date of 30000101, which cannot be recognized as a valid date
29,779 entries have a ts_listen earlier than release_date, meaning the track was played before its stated release
2 records have a ts_listen earlier than the time when Deezer was founded (in 2006)
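The three checks can be sketched as below. This is a minimal plain-Python illustration on hand-made rows, not the notebook's actual code; the function names and issue labels are hypothetical, and timestamps are interpreted as UTC by assumption. One likely reason the value 30000101 is not recognized is that pandas nanosecond timestamps cannot represent dates beyond the year 2262, so here it is treated as a placeholder.

```python
from datetime import datetime, timezone

DEEZER_FOUNDED_YEAR = 2006      # founding year as stated in this README
PLACEHOLDER_RELEASE = 30000101  # out-of-range value observed in release_date


def parse_release_date(value: int):
    """Parse a YYYYMMDD integer; return None for the 30000101 placeholder."""
    if value == PLACEHOLDER_RELEASE:
        return None
    return datetime.strptime(str(value), "%Y%m%d").replace(tzinfo=timezone.utc)


def flag_row(row: dict) -> list:
    """Return the names of the data issues that apply to one listening record."""
    issues = []
    # ts_listen is a UNIX timestamp; UTC is an assumption.
    listened_at = datetime.fromtimestamp(row["ts_listen"], tz=timezone.utc)
    released_at = parse_release_date(row["release_date"])
    if released_at is None:
        issues.append("placeholder_release_date")
    elif listened_at < released_at:
        issues.append("listened_before_release")
    if listened_at.year < DEEZER_FOUNDED_YEAR:
        issues.append("listened_before_deezer_founded")
    return issues


# Hand-made example rows (not taken from the dataset).
sample_rows = [
    {"ts_listen": 1480000000, "release_date": 20160101},  # clean record
    {"ts_listen": 1480000000, "release_date": 30000101},  # placeholder date
    {"ts_listen": 1480000000, "release_date": 20170601},  # played before release
    {"ts_listen": 0, "release_date": 19600101},           # played in 1970
]
flags = [flag_row(row) for row in sample_rows]
```

How flagged rows are then handled (dropped, corrected or kept) is a separate decision that this sketch does not make.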

Feature engineering

To better understand user preferences, behaviors and listening patterns, a series of feature engineering steps was conducted.

Time-related features: year, month, day, weekday, is_weekend, hour, minutes and seconds were derived from ts_listen, which indicates the time a user starts listening to a track. After that, season and session were derived from month and hour, and labelled with four seasons and six time sessions.
User-related features: user behaviour and listening patterns were created by aggregating user_id, ts_listen, user_age, media_duration and media_id.
listen_diff: how long a user listened to music
listen_percent: the percentage of a song that was listened to
time_gap: the gap before the next listening session
listen_start: the time a user starts listening to music
listen_end: the time a user stops listening to music
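As an illustration of the time-related features, the sketch below derives them from a single ts_listen value in plain Python. The month-to-season mapping and the hour boundaries of the six sessions are assumptions, since this README names the labels but not their cut-off points; interpreting the timestamp as UTC is also an assumption.

```python
from datetime import datetime, timezone

# Assumed mapping (northern-hemisphere seasons); not specified in this README.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

# Assumed start hour of each of the six sessions (four-hour blocks).
SESSION_STARTS = [
    (0, "midnight"), (4, "early morning"), (8, "morning"),
    (12, "afternoon"), (16, "evening"), (20, "night"),
]


def session_for_hour(hour: int) -> str:
    """Return the label of the last session whose start hour is <= hour."""
    label = SESSION_STARTS[0][1]
    for start, name in SESSION_STARTS:
        if hour >= start:
            label = name
    return label


def time_features(ts_listen: int) -> dict:
    """Derive calendar and session features from a UNIX timestamp."""
    dt = datetime.fromtimestamp(ts_listen, tz=timezone.utc)
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "weekday": dt.weekday(),               # Monday = 0, Sunday = 6
        "is_weekend": int(dt.weekday() >= 5),  # Saturday or Sunday
        "hour": dt.hour,
        "minute": dt.minute,
        "second": dt.second,
        "season": SEASON_BY_MONTH[dt.month],
        "session": session_for_hour(dt.hour),
    }


features = time_features(1480000000)  # 2016-11-24 15:06:40 UTC
```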

More detail can be seen in Deezer data analysis result.

Data Analysis

Feature FLOW

Firstly, we take a quick look at the FLOW feature, which is the column listen_type. The listen_type indicates whether a user listened to music using FLOW (listen_type = 1) or not (listen_type = 0). The attributes user_id, user_age and media_id (songs) were aggregated to calculate the average number of songs listened to per user and the percentage of songs listened to across each user age group.

Table 1 shows the average listening duration and the percentage of each song listened to, with and without the FLOW function. It clearly shows that users who did not use FLOW listened about three times as long as users in FLOW. More specifically, users who did not use FLOW listened to nearly 60% of a song, while users who used FLOW listened to less than 20% of a song recommended by the system.
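The aggregation behind these tables can be sketched as a grouped mean. The snippet below is a plain-Python illustration with made-up rows, so the numbers it produces are not the project's results; the helper name is hypothetical, and listen_percent is expressed as a fraction between 0 and 1 by assumption.

```python
from collections import defaultdict


def mean_listen_percent(rows, keys):
    """Average listen_percent for every combination of the given key columns."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group][0] += row["listen_percent"]
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up rows for illustration only.
rows = [
    {"listen_type": 0, "user_age": 19, "listen_percent": 0.5},
    {"listen_type": 0, "user_age": 30, "listen_percent": 0.75},
    {"listen_type": 1, "user_age": 19, "listen_percent": 0.125},
    {"listen_type": 1, "user_age": 30, "listen_percent": 0.25},
]

by_flow = mean_listen_percent(rows, ["listen_type"])
by_flow_and_age = mean_listen_percent(rows, ["listen_type", "user_age"])
```

Grouping by listen_type alone corresponds to the comparison in Table 1, while adding user_age to the keys gives the per-age breakdown.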

Table 1. Average media listening percentage with and without FLOW function

User age is added to Table 2 to compare user listening behaviour across ten age groups.

Table 2. Media listening duration

listen_type is added to Table 3 to compare user listening behaviour across ten age groups with and without the FLOW function.

Table 3. Media listening percentage with and without FLOW function based on age group

User behaviour & preference analysis

Time: Time is an essential factor that shifts user preferences. We divided the 24 hours into six sessions: midnight, early morning, morning, afternoon, evening and night. The session graph below shows that users started listening to music in the morning, reached a peak in the afternoon and then dropped off in the evening, and the hour graph shows user activity across the 24 hours.
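The hourly activity curve can be approximated by counting distinct users per hour of the day. The sketch below uses plain Python and made-up rows; the function name is hypothetical, and reading ts_listen as UTC is an assumption, since the dataset does not state a time zone.

```python
from collections import defaultdict
from datetime import datetime, timezone


def active_users_per_hour(rows):
    """Count distinct user_id values for each hour of the day (0-23)."""
    users_by_hour = defaultdict(set)
    for row in rows:
        hour = datetime.fromtimestamp(row["ts_listen"], tz=timezone.utc).hour
        users_by_hour[hour].add(row["user_id"])
    return {hour: len(users) for hour, users in sorted(users_by_hour.items())}


# Made-up rows: user 1 plays twice at 15h and once at 18h, user 2 once at 15h.
rows = [
    {"user_id": 1, "ts_listen": 1480000000},
    {"user_id": 1, "ts_listen": 1480000100},
    {"user_id": 2, "ts_listen": 1480000200},
    {"user_id": 1, "ts_listen": 1480010800},
]
hourly_users = active_users_per_hour(rows)
```

Counting distinct users rather than rows avoids inflating an hour in which a few users play many tracks.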

Medium: The features platform_family and platform_name refer to the device and operating system a user uses to access the Deezer app. As the data was encoded with numeric values, we cannot tell which devices or operating systems users use; nonetheless, platform_family 0 and platform_name 0 are the most preferred mediums among users.

Genre Analysis

When it comes to content analysis, genre is a feature whose popularity can differ over time and be influenced by the user's surroundings. We found that six main genres (genre_id 0, 7, 10, 25, 27 and 14) were very popular across all other attributes, such as hour, session, context, platform, listen type and user_age. In other words, no matter the time, the user age or the context, these six genres were favored by users. Key findings are listed below, and the graphical analysis can be seen in deezer_eda_result.
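One way to check whether the same genres stay popular across contexts is to rank genres inside each group and intersect the rankings. The sketch below does this in plain Python on made-up rows; the helper name and the session labels used as the grouping column are illustrative assumptions.

```python
from collections import Counter, defaultdict


def top_genres_by(rows, key, n=3):
    """Return the n most frequent genre_id values within each group of `key`."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[key]][row["genre_id"]] += 1
    return {
        group: [genre for genre, _ in counter.most_common(n)]
        for group, counter in counts.items()
    }


# Made-up rows for illustration only.
rows = (
    [{"session": "morning", "genre_id": 0}] * 3
    + [{"session": "morning", "genre_id": 7}] * 2
    + [{"session": "morning", "genre_id": 10}]
    + [{"session": "night", "genre_id": 0}] * 2
    + [{"session": "night", "genre_id": 50}]
)

top_by_session = top_genres_by(rows, "session", n=2)

# Genres that appear in the top list of every session.
always_popular = set.intersection(*(set(g) for g in top_by_session.values()))
```

The same helper can be reused with hour, context_type, platform_name, listen_type or user_age as the grouping column.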

Key Findings

Feature Flow
1. The number of songs, the length of songs and the percentage of each song listened to increased gradually with user age.
2. Young users are more likely to skip songs than users in the 30-year-old age group.
3. Users aged 30 are more likely to finish songs recommended by the system.
4. Users aged 30 listened to nearly twice as many songs as users aged above 20.
5. The majority of users listening in the flow skipped more songs than users who were not in the flow, except users aged 19 and 30.
User Behaviour
1. The number of active users increased dramatically between 5am and 6am.
2. The highest number of listeners appeared between 4pm and 6pm, with figures above 500,000.
3. The number of users decreased steadily in the evening and dropped to 200,000 at 11pm.
Genre Preference
1. Genre_id 0 was the most popular genre in the top 10 ranking.
2. Genre_id 0, 7, 10, 25, 27, 14, 734, 297 and 2744 were the most popular.
3. Popular genres are favored across most sessions, with a few exceptions: genre_id 2744 was not popular during night and midnight, genre_id 50 was preferred during the night, and genre_id 3645 at midnight.
4. The number of users listening to genre 0 was four times higher outside the flow, whereas a greater variety of genres appeared when users were listening in the flow.
5. Genre preference differed between user age groups. Genre_id 0 dominated genre preference across all user age groups, with users aged 19 as the main audience of this genre.

Conclusion

To sum up, we found that time is one of the most critical factors affecting the type of songs users listen to. Music preference also varied across user age groups, platforms and listening environments. To improve the FLOW feature and reduce the user bounce rate, we suggest a context-based recommendation system; nevertheless, personalized features need to be considered when building such a model.

Project Reflection

It was a great experience to work on a dataset which contains millions of entries. Ideally, it would be good to start data processing in a database due to the simplicity of programming. In addition, running data queries can give us a quick overview of the data and a better understanding when performing statistical calculations.

On the other hand, there are many categorical attributes which are replaced with numeric labels in the given dataset; it would be helpful to have the original labels of each categorical variable, which can help analysts form problem statements or hypotheses as well as provide better interpretation when analyzing data.

Notebook

Python Notebook

Reference

Adiyansjan, Gunawan, A. A., & Suhartono, D. (2019). Music Recommender System Based on Genre using Convolutional Recurrent Neural Networks. Procedia Computer Science, 157, 99-109.
Hansen, C., Mehrotra, R., Hansen, C., Brost, B., Maystre, L., & Lalmas, M. (2021). Shifting Consumption towards Diverse Content on Music Streaming Platforms. Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 238-246.
Maasø, A., & Hagen, A. N. (2020). Metrics and Decision-Making in Music Streaming. Popular Communication, 18(1), 18-31.
