clustering soccer player archetypes by using FIFA 2019 player stats data.

emredogan7/soccer-player-archetype-clustering

Soccer Player Archetype Clustering

A new approach to soccer scouting: clustering soccer players using the FIFA 2019 player attributes dataset.

Motivation

  • Replacing an important player in a soccer team is a critical task.
  • Clubs often struggle to replace these players after they retire or leave.
Figures: news coverage of the departures of Andres Iniesta and Cristiano Ronaldo.
  • Finding players with a similar playing archetype might help clubs replace them more easily.

Dataset

  • FIFA 19 complete player dataset.

  • 88 different informative attributes for each player.

  • In this study, position attributes (e.g. LS, ST, RS) are not used. Including them might be good future work.

Approach

First, by using the attributes of each player, some new features are generated in the following way:

| Generated Feature | Attributes Used |
|---|---|
| Pace | Acceleration, Sprint Speed |
| Shooting | Finishing, LongShots, Penalties, Positioning, ShotPower, Volleys |
| Passing | Crossing, Curve, FKAccuracy, LongPassing, ShortPassing, Vision |
| Dribbling | Agility, Balance, BallControl, Composure, Dribbling, Reactions |
| Defending | HeadingAccuracy, Interceptions, Marking, StandingTackle, SlidingTackle |
| Physical | Aggression, Jumping, Stamina, Strength |

Each generated feature is computed by averaging the corresponding attribute stats. The height and weight columns are also used, after being cleaned (removing punctuation and unit abbreviations).
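The averaging and cleaning steps described above can be sketched with pandas. This is a minimal illustration, not the repository's actual code: the column name `SprintSpeed`, the height format `5'7`, and the weight format `159lbs` are assumptions about the dataset, and the helper names are hypothetical.

```python
import pandas as pd

# Generated feature -> raw attributes it averages (from the table above).
# "SprintSpeed" as a column name is an assumption about the dataset.
FEATURE_GROUPS = {
    "Pace": ["Acceleration", "SprintSpeed"],
    "Shooting": ["Finishing", "LongShots", "Penalties", "Positioning", "ShotPower", "Volleys"],
    "Passing": ["Crossing", "Curve", "FKAccuracy", "LongPassing", "ShortPassing", "Vision"],
    "Dribbling": ["Agility", "Balance", "BallControl", "Composure", "Dribbling", "Reactions"],
    "Defending": ["HeadingAccuracy", "Interceptions", "Marking", "StandingTackle", "SlidingTackle"],
    "Physical": ["Aggression", "Jumping", "Stamina", "Strength"],
}


def height_to_inches(value: str) -> int:
    """Convert a height string such as 5'7 to inches (assumed format)."""
    feet, inches = value.replace('"', "").split("'")
    return int(feet) * 12 + int(inches)


def weight_to_lbs(value: str) -> float:
    """Strip the unit abbreviation from a string such as 159lbs (assumed format)."""
    return float(value.replace("lbs", "").strip())


def build_features(players: pd.DataFrame) -> pd.DataFrame:
    """Average each attribute group and append cleaned height and weight."""
    features = pd.DataFrame(index=players.index)
    for name, columns in FEATURE_GROUPS.items():
        features[name] = players[columns].mean(axis=1)
    features["Height"] = players["Height"].map(height_to_inches)
    features["Weight"] = players["Weight"].map(weight_to_lbs)
    return features


# Tiny made-up sample, only to show the shape of the input and output.
all_attrs = [c for cols in FEATURE_GROUPS.values() for c in cols]
sample = pd.DataFrame({attr: [60, 80] for attr in all_attrs})
sample["Acceleration"] = [90, 70]
sample["Height"] = ["5'7", "6'2"]
sample["Weight"] = ["159lbs", "183lbs"]

features = build_features(sample)
print(features)
```

The result has eight columns per player: the six averaged features plus height and weight.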

The idea of averaging these attributes comes from the FIFA game series itself. On FIFA player cards, the 6 main player stats (pace, shooting, passing, dribbling, defending and physical) are generated from these attributes.

Figure Reference: here.

Then, using all of these features, the k-means clustering algorithm is applied to maximize intra-cluster similarity and minimize inter-cluster similarity between players.
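A minimal sketch of this clustering step with scikit-learn is shown below. The feature matrix here is synthetic random data standing in for the eight generated features, and k = 7 is the value this README eventually settles on. Standardizing the features first is an assumption on my part (height and weight are on a different scale from the 0-100 stats); the README does not say whether the original code scales them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the player feature matrix
# (rows = players, columns = the 8 generated features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

# Assumption: put all features on a comparable scale before clustering.
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means and get one cluster label per player.
kmeans = KMeans(n_clusters=7, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print(labels[:10])                     # cluster index of the first 10 players
print(kmeans.cluster_centers_.shape)   # one centroid per cluster
```

Fixing `random_state` makes the cluster assignment reproducible between runs, which matters when clusters are later named by hand.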

How to decide the optimal number of clusters (k value)?
Deciding on the k value (number of clusters) is a critical step in a clustering task. There are many different approaches to choosing k; the most popular one is the elbow method.

The elbow method is a heuristic for choosing the number of clusters: pick the value of k at the “elbow”, i.e. the point after which the distortion (inertia) decreases only roughly linearly. To find it, k-means is run for every k in the range [2, 15] and the resulting distortion is recorded.
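The sweep over k can be sketched as follows, assuming scikit-learn's `KMeans` and using its `inertia_` attribute (the within-cluster sum of squared distances) as the distortion measure. The data is again synthetic, so the printed curve only illustrates the procedure and says nothing about the FIFA dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the player feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))

# Fit k-means for every k in [2, 15] and record the inertia.
k_values = list(range(2, 16))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# Inspect (or plot) inertia against k and look for the bend in the curve.
for k, inertia in zip(k_values, inertias):
    print(f"k={k:2d}  inertia={inertia:.1f}")
```

Plotting `inertias` against `k_values` (for example with matplotlib) makes the elbow easier to spot than reading the raw numbers.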

The elbow method suggests choosing k = 3 or 4. However, there are clearly many more soccer player archetypes than that. For this reason, I picked k = 7.

Results

After obtaining 7 clusters, I analyzed each cluster and named it according to its characteristics. General information on each cluster is given below, along with 5 representative players:

Cluster 0 (classy creator):

  • These players are very important for the playmaking process.
  • They represent a transition between defence and attack.
  • Characteristics:
    • very good at passing.
  • Examples: Kevin De Bruyne, Luka Modric, etc.

Figure: Top 5 players of the classy creator cluster (cluster 0)

Cluster 1 (genuine defender):

  • This cluster consists of solid defenders (most of the time centre-backs).
  • They are mostly not responsible for playmaking. Their main duty is to stop opposing attackers, for example by tackling.
  • Characteristics:
    • very good defending and physical stats.
    • most of the time, tall players.
  • Examples: Diego Godin, Chiellini, Koulibaly, etc.

Figure: Top 5 players of the genuine defender cluster (cluster 1)

Cluster 2 (zig-zag dribbler):

  • These players are quite effective at dribbling inside the penalty area.
  • They are not known for very fast sprints; they are more comfortable with zig-zag dribbling.
  • Characteristics:
    • low physical power.
    • very good at dribbling and shooting.
  • Examples: Lionel Messi, Neymar, Eden Hazard, etc.

Figure: Top 5 players of the zig-zag dribbler cluster (cluster 2)

Cluster 3 (solid scorer):

  • These players are considered the main goal chance for their teams.
  • They are not known for very fast sprints; they are more comfortable with zig-zag dribbling.
  • Characteristics:
    • very good shooting skills.
    • good physical stats.
  • Examples: C. Ronaldo, Luis Suarez, etc.

Figure: Top 5 players of the solid scorer cluster (cluster 3)

Cluster 4 (cheetah scorer):

  • These players are quite effective in open space.
  • They are known for their very fast sprints.
  • Characteristics:
    • low physical power.
    • quite fast players.
    • very good at dribbling and shooting.
  • Examples: Leroy Sane, Anthony Martial, etc.

Figure: Top 5 players of the cheetah scorer cluster (cluster 4)

Cluster 5 (playmaker defender):

  • This cluster consists of defenders (mostly centre-backs and, rarely, defensive midfielders).
  • They are good at passing and shooting. This makes them special, as players of this kind are responsible for playmaking (i.e. they are the starting point of attacking moves).
  • Characteristics:
    • good defending skills (not surprising).
    • good at passing and shooting.
  • Examples: Sergio Ramos, Sergio Busquets, etc.

Figure: Top 5 players of the playmaker defender cluster (cluster 5)

Cluster 6 (fast defender):

  • Defenders with high speed.
  • Not a very well separated cluster. It is a combination of left/right backs and speedy defensive midfielders.
  • Characteristics: high speed, good physical stats.
  • Examples: N. Kante, Carvajal, etc.

Figure: Top 5 players of the fast defender cluster (cluster 6)
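One way lists like the "top 5 players" figures above could be produced is to rank players within each cluster by an overall rating. The README does not state the ranking criterion, so using an `Overall` column is an assumption, and the data and the `top_players` helper below are made up for illustration.

```python
import pandas as pd

# Made-up players with an assumed "Overall" rating and a cluster label.
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Overall": [91, 85, 88, 79, 90, 83],
    "cluster": [0, 0, 0, 1, 1, 1],
})


def top_players(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n highest-rated players of every cluster (hypothetical helper)."""
    ranked = df.sort_values("Overall", ascending=False)
    top = ranked.groupby("cluster").head(n)
    return top.sort_values(["cluster", "Overall"], ascending=[True, False])


top2 = top_players(players, n=2)
print(top2)
```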

Average Statistics for Each Cluster

Figure: Characteristics of all archetypes
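Per-cluster averages like the ones in this figure can be computed with a pandas `groupby`. The sketch below uses a few made-up rows and only two of the generated features; the column name `cluster` is an assumption.

```python
import pandas as pd

# Made-up feature rows with the cluster label assigned by k-means.
features = pd.DataFrame({
    "Pace": [90, 88, 60, 62],
    "Defending": [40, 44, 85, 83],
    "cluster": [4, 4, 1, 1],
})

# Mean of every feature within each cluster.
cluster_means = features.groupby("cluster").mean().round(1)
print(cluster_means)
```

Each row of `cluster_means` is one archetype's average profile, which is what a characteristics chart would plot.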

Conclusion

  • Young defenders appear in the fast defender and genuine defender clusters. As they get older, they tend to improve their playmaking skills (playmaker defender).

  • Cheetah scorers are more likely to be young wing-forwards, while solid scorers are more experienced players.

  • Zig-zag dribblers have low physical stats (aggression, jumping, stamina, strength). Although this looks like a disadvantage, it probably lets them show their dribbling skills more effectively.

  • Players with the best physical stats are in the cluster genuine defender.

Future Work

  • There is still a lot to improve in this project. Some ideas for the future:
    • FIFA provides position ratings for each player (e.g. LS, ST, RS). Using them might give better results when looking for players who can play in the same or similar positions.
    • FIFA provides a Potential attribute for players, as a measure of how promising a young player is. Using this attribute might help to find young and cheap players to replace retiring or leaving players.
    • Better feature engineering: it is challenging to build features that discriminate well between data instances. Better features would likely improve the clustering results.

Emre Dogan
January 20, 2020
