The European Soccer Database contains data on more than 25,000 national football matches from the top European leagues. The aim of this exercise is to present interesting relationships in R using exploratory data analysis and visualization.
First, you need to access some tables in the database. Note: You can use the RSQLite::dbConnect() function to do this. To access a particular database table and convert it to a data.frame, you can use tbl_df(dbGetQuery(connection, 'SELECT * FROM table_xyz')).
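The access pattern described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the required solution: it uses an in-memory database and a made-up stand-in table so that it runs without the real file, and it uses tibble::as_tibble() in place of tbl_df(), which is deprecated in recent dplyr releases. The column names are hypothetical.

```r
library(DBI)

# Assumption: an in-memory database stands in for the real SQLite file.
# For the exercise, pass the path to the database file instead of ":memory:".
connection <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical stand-in table with invented rows; the real database
# supplies its own tables and columns.
dbWriteTable(connection, "table_xyz",
             data.frame(id = 1:3, home_team_goal = c(2L, 0L, 1L)))

# Inspect which tables are available before querying.
dbListTables(connection)

# Read the whole table and convert the result to a tibble.
table_xyz <- tibble::as_tibble(
  dbGetQuery(connection, "SELECT * FROM table_xyz")
)

# Close the connection once the data is in memory.
dbDisconnect(connection)
```

Calling dbListTables() first is a convenient way to discover the actual table names before writing the SELECT statements.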
The top divisions of Spain, England, Germany and Italy are considered the four most attractive football leagues in Europe.
In which of the four leagues are the most and the fewest goals scored per game on average?
Compare the mean, median, standard deviation, variance, range and interquartile range of goals scored per match between the four most attractive European leagues and the remaining leagues.
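One possible way to compute these summary statistics is a grouped summarise() in dplyr. The sketch below runs on a small invented data frame; the names matches, league, home_team_goal and away_team_goal are assumptions and may differ from the columns in the actual database.

```r
library(dplyr)

# Toy data, invented for illustration only (not taken from the database).
matches <- data.frame(
  league = c("Spain", "Spain", "England", "Germany", "Italy",
             "France", "France", "Belgium"),
  home_team_goal = c(2, 0, 3, 1, 4, 1, 2, 0),
  away_team_goal = c(1, 0, 2, 1, 0, 0, 2, 1)
)

top4 <- c("Spain", "England", "Germany", "Italy")

league_summary <- matches %>%
  mutate(
    total_goals = home_team_goal + away_team_goal,        # goals per match
    group = if_else(league %in% top4, "top 4", "other")   # the two groups to compare
  ) %>%
  group_by(group) %>%
  summarise(
    mean_goals   = mean(total_goals),
    median_goals = median(total_goals),
    sd_goals     = sd(total_goals),
    var_goals    = var(total_goals),
    range_goals  = max(total_goals) - min(total_goals),
    iqr_goals    = IQR(total_goals),
    .groups = "drop"
  )

print(league_summary)
```

Grouping by league instead of group gives the per-league averages needed to rank the four leagues against each other.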
Is there really a home advantage? Use a box plot to show the number of goals scored by home and away teams.
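A box plot of this kind could be drawn with ggplot2 after reshaping the data so that home and away goals share one column. This is a sketch on invented values; the column names home_team_goal and away_team_goal are assumptions about the match table.

```r
library(ggplot2)
library(tidyr)

# Toy data, invented for illustration only.
matches <- data.frame(
  home_team_goal = c(2, 0, 3, 1, 4, 1, 2, 0),
  away_team_goal = c(1, 0, 2, 1, 0, 0, 2, 1)
)

# Long format: one row per team and match, with a label for home/away.
goals_long <- pivot_longer(
  matches,
  cols = c(home_team_goal, away_team_goal),
  names_to = "side",
  values_to = "goals"
)

p <- ggplot(goals_long, aes(x = side, y = goals)) +
  geom_boxplot() +
  labs(x = NULL, y = "Goals per match")

print(p)
```

Because goal counts are small integers, the two boxes can look very similar; comparing the group means alongside the plot may make a difference easier to see.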
“All soccer players are fair-weather players!” Check this assertion with a line chart: Are more goals scored per game on average in the summer months than in the rest of the year?
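To check the claim, the matches can be grouped by calendar month and the monthly averages drawn as a line. The following is a minimal sketch on invented data, assuming a date column that can be converted with as.Date() and the hypothetical goal columns home_team_goal and away_team_goal.

```r
library(dplyr)
library(ggplot2)

# Toy data, invented for illustration only.
matches <- data.frame(
  date = as.Date(c("2015-01-10", "2015-01-24", "2015-04-11",
                   "2015-07-18", "2015-07-25", "2015-10-03")),
  home_team_goal = c(1, 2, 0, 3, 2, 1),
  away_team_goal = c(0, 1, 0, 2, 1, 1)
)

monthly <- matches %>%
  mutate(
    month = as.integer(format(date, "%m")),        # calendar month 1..12
    total_goals = home_team_goal + away_team_goal  # goals per match
  ) %>%
  group_by(month) %>%
  summarise(avg_goals = mean(total_goals), .groups = "drop")

p <- ggplot(monthly, aes(x = month, y = avg_goals)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Average goals per match")

print(p)
```

It may be worth adding the number of matches per month to the summary, since months with very few games yield unstable averages.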
- Display the average goals scored per game for the top 4 leagues per year from 2008 to 2016.
- Use an estimated density curve AND a Q-Q plot to check whether the home_team_possession variable is (approximately) normally distributed.
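The normality check in the last item could be sketched with ggplot2 as shown here. The possession values are simulated purely so that the code runs on its own; in the exercise, the home_team_possession column from the database takes their place. The data frame name possession is hypothetical.

```r
library(ggplot2)

# Simulated stand-in values, invented for illustration only.
set.seed(1)
possession <- data.frame(
  home_team_possession = rnorm(200, mean = 50, sd = 8)
)

m <- mean(possession$home_team_possession)
s <- sd(possession$home_team_possession)

# Kernel density estimate (solid) against a normal density with the
# sample mean and standard deviation (dashed).
p_density <- ggplot(possession, aes(x = home_team_possession)) +
  geom_density() +
  stat_function(fun = dnorm, args = list(mean = m, sd = s),
                linetype = "dashed") +
  labs(x = "Home team possession", y = "Density")

# Q-Q plot: points close to the reference line suggest approximate normality.
p_qq <- ggplot(possession, aes(sample = home_team_possession)) +
  stat_qq() +
  stat_qq_line() +
  labs(x = "Theoretical quantiles", y = "Sample quantiles")

print(p_density)
print(p_qq)
```

If the real column contains missing values, they would need to be removed (for example with na.rm = TRUE in mean() and sd()) before computing the reference curve.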