The European Soccer Database contains data on more than 25,000 national football matches from the top European leagues. The aim of this exercise is to present interesting relationships in R using exploratory data analysis and visualization.
First you need to access some tables in the database. Note: You can use the RSQLite::dbConnect() function to do this. To access a particular database table and convert it to a data.frame, you can use the tbl_df(dbGetQuery(connection, 'SELECT * FROM table_xyz')) command.
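As a rough illustration of the access pattern described above, here is a minimal sketch. It assumes the DBI, RSQLite and tibble packages are installed. The table name `Match` and the columns `home_team_goal` and `away_team_goal` are assumptions for illustration, and an in-memory database with a toy table stands in for the real SQLite file so that the snippet runs on its own. It uses `tibble::as_tibble()` in place of `tbl_df()`, which recent dplyr versions mark as deprecated.

```r
library(DBI)

# Assumption: with the real data you would pass the path to the SQLite file
# instead of ":memory:". The toy table below only mimics its structure.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "Match", data.frame(
  id             = 1:3,
  home_team_goal = c(2L, 0L, 1L),   # made-up values
  away_team_goal = c(1L, 0L, 3L)    # made-up values
))

# Inspect which tables are available before querying
dbListTables(con)

# Read one table completely and convert the result to a tibble
matches <- tibble::as_tibble(dbGetQuery(con, "SELECT * FROM Match"))

# Release the connection once the data is in memory
dbDisconnect(con)
```

Once the tables are loaded as data frames, the remaining tasks can be worked on without further database access.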
The top divisions of Spain, England, Germany and Italy are considered the four most attractive football leagues in Europe.
- In which of the four leagues are the most and the fewest goals scored per game on average?
- Compare the mean, median, standard deviation, variance, range and interquartile range of goals scored per match between the four most attractive European leagues and the remaining leagues.
- Is there really a home advantage? Use a box plot to show the number of goals scored by home and away teams.
- “All soccer players are fair-weather players!” Check this assertion with a line chart: Are more goals scored per game on average in the summer months than in the rest of the year?
- Display the average goals scored per game for the top 4 leagues per year from 2008 to 2016.
- Use an estimated density curve and a Q-Q plot to check whether the home_team_possession variable is (approximately) normally distributed.
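For the comparison of location and spread measures between the four top leagues and the remaining leagues, a possible starting point is sketched below in base R. The data frame, its column names (`league`, `home_team_goal`, `away_team_goal`) and all values are made up for illustration. With the real data, the league name would first have to be joined to the match table.

```r
# Toy data: hypothetical column names, values invented for illustration only
matches <- data.frame(
  league = c("Spain", "Spain", "England", "England", "Germany", "Italy",
             "France", "France", "Netherlands", "Netherlands"),
  home_team_goal = c(2, 1, 0, 3, 4, 1, 1, 0, 2, 2),
  away_team_goal = c(1, 1, 2, 0, 1, 0, 0, 0, 3, 1)
)

top4 <- c("Spain", "England", "Germany", "Italy")

# Total goals per match and a grouping variable: top 4 vs. all other leagues
matches$total_goals <- matches$home_team_goal + matches$away_team_goal
matches$group <- ifelse(matches$league %in% top4, "top 4", "other")

# Location and spread measures for one numeric vector
spread <- function(x) {
  c(mean   = mean(x),
    median = median(x),
    sd     = sd(x),
    var    = var(x),
    range  = diff(range(x)),   # max minus min
    iqr    = IQR(x))
}

# One row per group, one column per measure
stats <- t(sapply(split(matches$total_goals, matches$group), spread))
stats
```

The same `split()` and `sapply()` pattern works for per-league averages if `matches$league` is used as the grouping variable instead of `matches$group`.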