# FPL Analysis
An analysis of data extracted from Fantasy Premier League, using R and Python.
The results of the project can be viewed here.
The first step is to set up a database:
- Create an empty database named 'fpl'.
- Run sql/fpl_s016_12_20.sql inside the 'fpl' database. This creates the schema in which the extracted data will later be saved.
To extract the Fantasy Premier League data:
- Run `git clone https://github.com/adhorrig/fpl.git` to download the tools for extraction.
- Run `cd fpl` to move into the directory.
- Run `npm install` to download the project dependencies from the Node package manager.
- Go into the files ‘live.js’, ‘players.js’, ‘profiles.js’, ‘teams.js’ and ‘gameweeks.js’. You will need to change the database connection details to match your own.
- Run each Node file using `node filename.js`. Note: profiles.js will try to extract data for 4 million profiles. Reduce this number in the for loop or the run time will be 20+ days.
- The data from the Fantasy Football application will now have been extracted and saved to your database.
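The note about profiles.js gives enough to estimate a run time for a smaller profile count. The sketch below is a rough guide only: it assumes the run time scales linearly with the number of profiles and uses the lower bound of 20 days for 4 million profiles. The function name `estimated_hours` is made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
FULL_RUN_PROFILES = 4_000_000  # profile count quoted in the note above
FULL_RUN_DAYS = 20             # lower bound quoted in the note above ("20+ days")


def estimated_hours(profile_count):
    """Estimate extraction time in hours, assuming linear scaling."""
    seconds_per_profile = FULL_RUN_DAYS * SECONDS_PER_DAY / FULL_RUN_PROFILES
    return profile_count * seconds_per_profile / 3600


# Print estimates for a few reduced loop limits.
for count in (10_000, 100_000, 1_000_000):
    print(f"{count:>9,} profiles: about {estimated_hours(count):.1f} hours")
```

Under these assumptions each profile takes a little under half a second, so a loop limit in the tens of thousands keeps the run to a few hours.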
The second dataset comes from Met Eireann.
- It can be downloaded from: http://www.met.ie/climate-request/
- The weather data is in a CSV file and will need to be imported into the database created in the first step (specifically, the weather table).
- Many MySQL clients take care of this for you: right-click on the table name and choose the option to import from CSV.
- If you are not using a client, refer to: http://stackoverflow.com/questions/6605765/importing-a-csv-into-mysql-via-command-line
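If neither a client nor the command line is convenient, the import can also be scripted. The following is a minimal sketch, not part of the project: `import_csv` is a hypothetical helper, and it assumes the first row of the CSV holds column names that match the columns of the weather table (remove any lines above the header first). It works with any Python DB-API connection.

```python
import csv


def import_csv(connection, csv_path, table="weather", placeholder="%s"):
    """Insert every row of a headed CSV file into `table`; return the row count."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)                  # assumed: first row holds column names
        rows = [row for row in reader if row]  # skip blank lines

    columns = ", ".join(header)
    marks = ", ".join([placeholder] * len(header))
    # Table and column names cannot be bound as parameters,
    # so only pass trusted values for `table` and the CSV header.
    statement = f"INSERT INTO {table} ({columns}) VALUES ({marks})"

    cursor = connection.cursor()
    cursor.executemany(statement, rows)
    connection.commit()
    return len(rows)
```

MySQL drivers such as PyMySQL use `%s` as the parameter placeholder, which is the default here. The standard-library `sqlite3` module uses `?`, which is handy for trying the function out before pointing it at the real database.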
Once both datasets have been imported to the database, the analysis can be carried out.
R.
- Open analysis.r
- Install the libraries as noted at the top of the file.
- On line eight, database connection details will need to be changed to match your own connection.
- After this, the code can be run line by line to generate graphs that visualize the data in the database.
Python.
- Export the profiles table to CSV from the database.
- Execute `python map.py profiles.csv | python reduce.py > results.txt`
- The results of the MapReduce job will show the total number of points gathered for each regional user profile. It takes a few minutes to complete and the results file will be 2.84 GB.
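For readers unfamiliar with the pattern, the sketch below shows what a map step and a reduce step for this kind of job might look like. It is not the code in map.py or reduce.py. The column names `region` and `points` are assumptions, as are the function names, and the sample rows are invented for the demonstration.

```python
import csv
from itertools import groupby


def map_rows(lines, key_column="region", value_column="points"):
    """Map step: yield one (region, points) pair per CSV row."""
    for row in csv.DictReader(lines):
        yield row[key_column], int(row[value_column])


def reduce_pairs(pairs):
    """Reduce step: sum the points for each region."""
    # groupby needs its input sorted by key, as a shuffle/sort phase would provide.
    for key, group in groupby(sorted(pairs), key=lambda pair: pair[0]):
        yield key, sum(points for _, points in group)


# Invented sample data standing in for profiles.csv.
sample = ["region,points", "Ireland,50", "England,70", "Ireland,25"]
for region, total in reduce_pairs(map_rows(sample)):
    print(f"{region}\t{total}")
```

Sorting everything in memory is fine for a small sample. For a multi-gigabyte export, a reducer that keeps a running total per key in a dictionary would avoid holding every pair at once.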