Welcome to pitchProphet. This documentation guides you through the setup, usage, and development of the project.
PitchProphet is a football match forecasting tool that predicts the outcome probabilities for upcoming fixtures in all major European leagues. You can explore the prediction results on the web application here.
- Data Scraping: Collects match-related data for teams and players from FBref.com across all major European leagues.
- Data Processing: Prepares the scraped data for training ML models.
- Predictive Modeling: Trains a machine learning model to predict win probabilities for upcoming matches.
- Web Application: Displays the prediction results in an easy-to-use web interface.
- Python 3.13+
- Poetry (for dependency management)
- Clone the repository:
  git clone git@github.com:6076paras/pitchProphet.git
  cd pitchProphet
- Install dependencies:
  poetry install
- Activate the virtual environment:
  poetry shell
To scrape data from FBref, run the following script:
get-data
The scraping configuration can be set in config.yaml under the "scraper" key. For example:
...
scraper:
  season: 2017-2018
  league: Premier-League
  player_data: false
...
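As a rough sketch of how the "scraper" block could be checked once config.yaml has been parsed into a dictionary, the snippet below validates the three keys shown above. The `ScraperConfig` class and `scraper_config_from` helper are hypothetical names, not part of pitchProphet, and the rule that a season spans two consecutive years is an assumption based on the `2017-2018` example.

```python
import re
from dataclasses import dataclass

# Assumption: seasons are written as two consecutive years, e.g. "2017-2018".
SEASON_PATTERN = re.compile(r"(\d{4})-(\d{4})")


@dataclass(frozen=True)
class ScraperConfig:
    """Hypothetical container for the keys under "scraper" in config.yaml."""

    season: str
    league: str
    player_data: bool = False

    def __post_init__(self):
        match = SEASON_PATTERN.fullmatch(self.season)
        if match is None or int(match[2]) != int(match[1]) + 1:
            raise ValueError(f"season must look like 2017-2018, got {self.season!r}")


def scraper_config_from(config):
    """Build a ScraperConfig from an already-parsed config dictionary."""
    return ScraperConfig(**config["scraper"])


# Stand-in for the dictionary a YAML parser would return for the example above.
parsed = {
    "scraper": {
        "season": "2017-2018",
        "league": "Premier-League",
        "player_data": False,
    }
}
cfg = scraper_config_from(parsed)
```

Failing early on a malformed season keeps a typo in the configuration from turning into a confusing scraping error later.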
To preprocess the scraped data, use the pre_process.py script:
pre-process
The pre-processing step transforms raw match data into features suitable for model training. For each match and team, the system calculates descriptive statistics (aggregation, trend, variance) of team performance metrics (e.g., goals, xG, shots) from their previous N matches. This creates a rich set of features that capture both teams' recent form and performance variability. You can also specify which features to pre-process in the configuration file.
Configuration for feature extraction can be set in config.yaml under the "processing" key:
processing:
  last_n_matches: 5
  x_vars:
    - Gls # goals scored
    - xG # expected goals
    - Sh # shots
    - SoT # shots on target
    # ... other features
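To make the feature calculation concrete, here is a minimal sketch of computing an aggregate, a variance, and a trend for each metric over a team's previous N matches. It is not the project's implementation: the function names are hypothetical, the aggregate is assumed to be the mean, the variance is assumed to be the population variance, and the trend is assumed to be the least-squares slope against match order. Only `last_n_matches` and `x_vars` come from the configuration above.

```python
from statistics import mean, pvariance


def trend(values):
    """Least-squares slope of the values against match index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def form_features(history, x_vars, last_n_matches=5):
    """Summarise one team's previous matches (oldest first) per metric."""
    recent = history[-last_n_matches:]
    features = {}
    for var in x_vars:
        values = [match[var] for match in recent]
        features[f"{var}_mean"] = mean(values)       # assumed aggregate
        features[f"{var}_var"] = pvariance(values)   # assumed variance type
        features[f"{var}_trend"] = trend(values)     # assumed trend measure
    return features


# Made-up numbers, for illustration only.
history = [
    {"Gls": 0, "xG": 0.8},
    {"Gls": 1, "xG": 1.1},
    {"Gls": 2, "xG": 1.9},
    {"Gls": 3, "xG": 2.2},
]
features = form_features(history, ["Gls", "xG"], last_n_matches=3)
```

With `last_n_matches=3` only the last three matches are used, so the first match in `history` does not affect the result. A team with no previous matches would raise an error in `mean`, so a real pipeline needs a rule for that case.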
The model development process is documented in Jupyter notebooks located in the pitchProphet/models/notebook/ directory:
- model_classic_1.ipynb: Initial model development using classical machine learning approaches
  - Data preparation and feature engineering
  - Model training and evaluation
  - Performance metrics and analysis with plots
  - Model serialization (.pkl file)
To run the notebooks:
jupyter notebook pitchProphet/models/notebook/model_classic1.ipynb
The inference pipeline automatically processes upcoming fixtures and generates predictions that are fed to the web application. This involves several steps:
- Matchweek Detection:
  - The system tracks current matchweek dates for each league using a CSV file containing fixture schedules
  - For each league, it checks if the current date falls outside the active matchweek period (between first and last game dates)
  - Only processes leagues that aren't currently in an active matchweek
- Feature Generation:
  - For each upcoming fixture:
    - Retrieves last N matches for both home and away teams
    - Calculates descriptive statistics (aggregation, trend, variance) for each team's performance metrics
    - Combines home and away team features into a format suitable for model inference
- Model Prediction:
  - Processes the prepared features through the trained XGBoost model
  - Generates probability scores for three possible outcomes:
    - Home Win
    - Draw
    - Away Win
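The matchweek check described in the first step can be sketched as follows. The function names and the in-memory schedule are hypothetical (the real pipeline reads the dates from a CSV file), and treating the first and last game dates as part of the active period is an assumption.

```python
from datetime import date


def in_active_matchweek(today, first_game, last_game):
    """True while today lies between the first and last game dates, inclusive."""
    return first_game <= today <= last_game


def leagues_to_process(matchweeks, today):
    """Return the leagues whose current matchweek is not in progress."""
    return [
        league
        for league, (first_game, last_game) in matchweeks.items()
        if not in_active_matchweek(today, first_game, last_game)
    ]


# Made-up schedule, for illustration only.
matchweeks = {
    "Premier-League": (date(2025, 3, 8), date(2025, 3, 10)),
    "La-Liga": (date(2025, 3, 14), date(2025, 3, 16)),
}
ready = leagues_to_process(matchweeks, today=date(2025, 3, 9))
```

Here only La-Liga is returned, because 9 March falls inside the Premier-League matchweek in this made-up schedule.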
This processed data feeds directly into the web application, which displays the predictions for upcoming fixtures in each league.
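One way to picture the hand-off between feature generation, the model, and the web application is the sketch below. It is illustrative only: the `home_`/`away_` prefixes, the helper names, and the class order (Home Win, Draw, Away Win) are assumptions, and the probabilities are made up rather than produced by the trained model.

```python
# Assumed class order; the trained model may encode outcomes differently.
OUTCOMES = ("Home Win", "Draw", "Away Win")


def combine_features(home, away):
    """Merge per-team feature dicts into one row with side prefixes."""
    row = {f"home_{name}": value for name, value in home.items()}
    row.update({f"away_{name}": value for name, value in away.items()})
    return row


def to_vector(row, columns):
    """Order the row by the column list used at training time."""
    return [row[column] for column in columns]


def label_probabilities(probabilities):
    """Attach outcome names to a three-element probability list."""
    if len(probabilities) != len(OUTCOMES):
        raise ValueError("expected one probability per outcome")
    return dict(zip(OUTCOMES, probabilities))


# Made-up values, for illustration only.
row = combine_features({"Gls_mean": 2.0}, {"Gls_mean": 1.0})
vector = to_vector(row, ["home_Gls_mean", "away_Gls_mean"])
prediction = label_probabilities([0.5, 0.3, 0.2])
```

Ordering the row by an explicit column list guards against the model receiving features in a different order from the one it was trained on.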
The web application provides an intuitive interface to view match predictions for all major European leagues. Built with Flask, it displays prediction probabilities for upcoming fixtures in each league. You can explore the prediction results on the web application here.
To start the web application locally:
python web/app.py
Then, open your browser and navigate to http://localhost:5000.
To run the tests:
pytest
Test files are located in the tests/ directory and cover:
- Data scraping functionality
- Pre-processing pipeline
- Model inference
- Web application routes
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a new branch (git checkout -b feature-branch)
- Make your changes
- Install pre-commit hooks: pre-commit install
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature-branch)
- Create a new Pull Request