Flatiron Module 2 Linear Regression Project
Working with the FIFA 19 video game data, I analyze the players' in-game attributes to build a linear regression model that predicts the market value of players in the top five leagues in Europe: the English, German, French, Italian, and Spanish leagues.
My process had six parts:
- Getting my data
- Cleaning and analyzing the data using Pandas
- Running statistical tests on my data using SciPy
- Visualizing my insights using Seaborn and Matplotlib
- Engineering features for my model
- Fitting my model
From Kaggle, I got the FIFA 19 game data, complete with all the different attributes for each player. Each row represents one player in the game, and the columns contain different information about that player.
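A minimal sketch of loading the data, assuming the Kaggle download is a CSV file. The file path is not given here, so an in-memory string stands in for it, and the column names and rows are hypothetical toy values rather than real FIFA 19 records. The point is only the layout: one row per player, one column per attribute.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV; the real file and its full column
# list are not reproduced here, so these columns and rows are toy values.
RAW_CSV = """Name,Age,Club,Position,Value
Player A,27,Club X,ST,€110.5M
Player B,31,Club Y,GK,€750K
Player C,22,Club Z,CM,
"""

# With the real download this would be pd.read_csv(<path to the CSV>).
df = pd.read_csv(io.StringIO(RAW_CSV))

# One row per player, one column per attribute; the empty Value becomes NaN.
print(df.shape)  # (3, 5)
print(df.head())
```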
I spent several hours cleaning the data with Pandas DataFrames and Series to prepare it for analysis. I removed or filled in null values, grouped related information together, and created new columns when necessary. I then selected a DataFrame containing only the players in the top five leagues in Europe. I also changed the data types of many columns so I could make calculations and comparisons across columns.
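The cleaning steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: money values are assumed to be stored as strings such as `€110.5M` or `€750K`, the `League` column and its labels are hypothetical (it may have to be derived from each player's club), and `Pace` stands in for any numeric attribute with missing values, filled here with the column median.

```python
import numpy as np
import pandas as pd

# Hypothetical league labels; the raw data may need a club -> league mapping.
TOP_FIVE = {"Premier League", "Bundesliga", "Ligue 1", "Serie A", "La Liga"}


def parse_money(text):
    """Convert strings such as '€110.5M' or '€750K' to a float amount in euros."""
    if pd.isna(text) or text == "":
        return np.nan
    text = str(text).replace("€", "").strip()
    multiplier = 1.0
    if text.endswith("M"):
        multiplier, text = 1_000_000.0, text[:-1]
    elif text.endswith("K"):
        multiplier, text = 1_000.0, text[:-1]
    return float(text) * multiplier


def clean(df):
    out = df.copy()
    out["Value"] = out["Value"].map(parse_money)            # text -> float64
    out["Pace"] = out["Pace"].fillna(out["Pace"].median())  # fill numeric nulls
    out["Age"] = out["Age"].astype(int)                     # text -> int
    out = out.dropna(subset=["Value"])                      # no target, no row
    return out[out["League"].isin(TOP_FIVE)].reset_index(drop=True)


# Toy rows, not real players.
raw = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Age": ["27", "31", "22", "25"],
    "League": ["La Liga", "Serie A", "Eredivisie", "Premier League"],
    "Value": ["€110.5M", "€750K", "€2M", None],
    "Pace": [90.0, None, 70.0, 80.0],
})
top5 = clean(raw)
print(top5[["Name", "League", "Value", "Pace"]])
```

Player C is dropped for being outside the five leagues and Player D for having no value, which leaves a frame ready for numeric comparisons.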
I ran several statistical tests on my data using SciPy. For example, I ran an ANOVA test to see whether the average player age was the same across the different leagues.
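A sketch of that kind of ANOVA using `scipy.stats.f_oneway`. The null hypothesis is that the mean age is the same in all five leagues. The ages here are randomly generated toy data, so the printed result says nothing about the real leagues, and the 0.05 significance level is an assumed choice.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
leagues = ["Premier League", "Bundesliga", "Ligue 1", "Serie A", "La Liga"]

# Toy data: 60 randomly generated ages per league, not real FIFA 19 ages.
players = pd.DataFrame({
    "League": np.repeat(leagues, 60),
    "Age": rng.normal(loc=26, scale=4, size=len(leagues) * 60).round(),
})

# H0: the mean age is the same in every league.
groups = [group["Age"].to_numpy() for _, group in players.groupby("League")]
f_stat, p_value = stats.f_oneway(*groups)

alpha = 0.05  # assumed significance level
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"F = {f_stat:.2f}, p = {p_value:.3f}: {decision}")
```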
I explored my data and created various visualizations, both for the statistical tests I carried out and to gain more insight into the factors I considered to have an effect on how much a player is valued.
I created new features from the information in my data, using what I learned during the EDA about the factors that affect player value. I checked for correlation and dropped features that were correlated with value, which is my target variable. I also transformed features that had non-linear relationships with the target variable so those relationships became closer to linear and the model could fit them better.
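A sketch of the two steps described above, on toy data. Following the text, numeric features whose absolute Pearson correlation with the target exceeds a threshold are dropped; the 0.9 cutoff and the `ReleaseClause` column (built here as an exact multiple of `Value`) are hypothetical. A log transform is used as one example of linearizing a relationship: when value grows exponentially with a rating, `log(Value)` is linear in that rating.

```python
import numpy as np
import pandas as pd


def high_target_correlation(df, target, threshold=0.9):
    """Return numeric columns whose |Pearson r| with `target` exceeds `threshold`."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr[corr.abs() > threshold].index.tolist()


# Toy data, not FIFA 19 values: Value grows exponentially with Overall, and the
# hypothetical ReleaseClause column is just a scaled copy of the target.
overall = np.arange(60, 91, dtype=float)
players = pd.DataFrame({"Overall": overall, "Value": np.exp(0.25 * overall)})
players["ReleaseClause"] = 1.8 * players["Value"]

to_drop = high_target_correlation(players, "Value")
players = players.drop(columns=to_drop)
print(to_drop)  # ['ReleaseClause']

# A log transform turns the exponential relationship into a linear one.
players["LogValue"] = np.log(players["Value"])
r_raw = players["Overall"].corr(players["Value"])
r_log = players["Overall"].corr(players["LogValue"])
print(f"r(Overall, Value) = {r_raw:.2f}, r(Overall, log Value) = {r_log:.2f}")
```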
After feature selection, I split my data so I would have a held-out test sample to check the model's accuracy after fitting. I carried out several fits using OLS from statsmodels and the Lasso and ordinary linear regression models from sklearn. I then separated my data set into outfield players and goalkeepers and fit two different models to test whether this improved accuracy.
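A sketch of the split-and-fit loop on synthetic data, assuming a log-scale target and two hypothetical feature columns. `train_test_split` holds out 20% of the rows as the test sample, `LinearRegression` and `Lasso` are fit on the rest, and each model's R² on the held-out rows is recorded. The same function is run separately on outfield players and goalkeepers; the `IsGK` flag is an assumption (in practice it might come from a position column), and `alpha=0.01` for the Lasso is an arbitrary illustrative value.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

FEATURES = ["Overall", "Potential"]  # hypothetical feature columns

# Synthetic toy data with a log-scale target; not FIFA 19 values.
rng = np.random.default_rng(42)
n = 400
players = pd.DataFrame({
    "Overall": rng.uniform(60, 90, n),
    "Potential": rng.uniform(60, 95, n),
    "IsGK": rng.random(n) < 0.1,  # assumed goalkeeper flag
})
players["LogValue"] = (
    0.2 * players["Overall"] + 0.05 * players["Potential"] + rng.normal(0, 0.3, n)
)


def fit_models(df, seed=0):
    """Hold out 20% of rows, fit each model on the rest, return test R^2 scores."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df["LogValue"], test_size=0.2, random_state=seed
    )
    models = {"linear": LinearRegression(), "lasso": Lasso(alpha=0.01)}
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


outfield_scores = fit_models(players[~players["IsGK"]])
gk_scores = fit_models(players[players["IsGK"]])
print("outfield:", outfield_scores)
print("goalkeepers:", gk_scores)
```

For the statsmodels fit, `sm.OLS(y_train, sm.add_constant(X_train)).fit()` (with `import statsmodels.api as sm`) estimates the same coefficients as `LinearRegression`, and its `.summary()` adds standard errors and p-values.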
I was able to fit my model with an R squared greater than 0.90 and a root mean squared error within 0.4 of the standard deviation of the test sample, which I consider a very good model.
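A sketch of the two metrics reported above, computed directly with NumPy. On a test sample, R² = 1 - SS_res/SS_tot, and with SS_tot taken around the test mean this equals 1 - (RMSE/σ)², where σ is the population standard deviation (ddof=0) of the test targets. Comparing RMSE with σ is therefore another view of R²: for example, RMSE/σ = 0.3 corresponds to R² = 0.91. The five values below are made-up toy numbers, not project results.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return test-set R^2, RMSE, and the population std of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))
    std = y_true.std()  # ddof=0, so r2 == 1 - (rmse / std) ** 2
    return r2, rmse, std


# Toy test targets and predictions, not results from the project.
r2, rmse, std = evaluate([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
print(round(r2, 4), round(rmse, 4), round(std, 4))  # 0.99 0.1414 1.4142
```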
FIFA does a lot of work to translate a player's real-world physical and mental attributes into the game. Moving forward, I hope to take the insight I gained about the important attributes FIFA uses to determine player values and apply it to real-world player values. I believe that if one is able to capture certain attributes and information about a player in the real world, in conjunction with real-world market indicators, one can predict a player's actual market value.
https://github.com/chibz3/fifa-project/blob/master/fifa-project_presentation%20.pdf