GitHub - 6076paras/pitchProphet: Using ML to predict football match outcome

Welcome to pitchProphet. This documentation guides you through the setup, usage, and development of the project.

Introduction

PitchProphet is a football match forecasting tool that predicts the outcome probabilities for upcoming fixtures in all major European leagues. You can explore the prediction results on the web application here.

Current Features

Data Scraping: Collects match-related data for teams and players from FBref.com across all major European leagues.
Data Processing: Prepares the scraped data for training ML models.
Predictive Modeling: Trains a machine learning model to predict win probabilities for upcoming matches.
Web Application: Displays the prediction results in an easy-to-use web interface.

Installation

Prerequisites

Python 3.13+
Poetry (for dependency management)

Steps

Clone the repository:

git clone git@github.com:6076paras/pitchProphet.git
cd pitchProphet

Install dependencies:
```
poetry install
```
Activate the virtual environment:
```
poetry shell
```

Usage

Data Scraping

To scrape data from FBref, run the following script:

get-data

The scraping configuration can be set in config.yaml under the "scraper" key. For example:

...
scraper:
  season: 2017-2018
  league: Premier-League
  player_data: false
...
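As a rough illustration of how these settings might be sanity-checked before a scrape starts, the sketch below validates a dictionary shaped like the "scraper" section above. The function name `validate_scraper_config` and the specific checks are assumptions made for this example; they are not taken from the project's code.

```python
import re

# Hypothetical validator: the key names mirror the example config above,
# but the checks themselves are assumptions, not the project's code.
SEASON_PATTERN = re.compile(r"^(\d{4})-(\d{4})$")


def validate_scraper_config(config: dict) -> dict:
    """Return the 'scraper' section after basic sanity checks."""
    scraper = config.get("scraper")
    if scraper is None:
        raise KeyError("config is missing the 'scraper' section")

    # A season is expected to look like '2017-2018' (two consecutive years).
    match = SEASON_PATTERN.match(str(scraper.get("season", "")))
    if match is None:
        raise ValueError("season must look like '2017-2018'")
    start, end = int(match.group(1)), int(match.group(2))
    if end != start + 1:
        raise ValueError("season must span two consecutive years")

    league = scraper.get("league")
    if not isinstance(league, str) or not league:
        raise ValueError("league must be a non-empty string")
    if not isinstance(scraper.get("player_data", False), bool):
        raise ValueError("player_data must be true or false")
    return scraper


# The same values as the YAML example, written as a Python dict.
example = {
    "scraper": {
        "season": "2017-2018",
        "league": "Premier-League",
        "player_data": False,
    }
}
checked = validate_scraper_config(example)
```

Failing early on a malformed season or league name avoids starting a long scrape that would only break partway through.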

Pre-processing

To preprocess the scraped data, use the pre_process.py script:

pre-process

The pre-processing step transforms raw match data into features suitable for model training. For each match and team, the system calculates descriptive statistics (aggregation, trend, variance) of team performance metrics (e.g., goals, xG, shots) from their previous N matches. This creates a rich set of features that capture both teams' recent form and performance variability. You can also specify which features to pre-process in the configuration file.

Configuration for feature extraction can be set in the config.yaml under the "processing" key:

processing:
  last_n_matches: 5
  x_vars:
    - Gls     # goals scored
    - xG      # expected goals
    - Sh      # shots
    - SoT     # shots on target
    # ... other features
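The sketch below shows one way the per-team statistics described above could be computed from the previous N matches. It assumes the aggregation is a mean, the trend is a least-squares slope over match order, and the variance is a population variance; the actual definitions used by the project may differ. The names `form_features` and `slope`, the feature-name suffixes, and the sample numbers are all hypothetical.

```python
from statistics import fmean, pvariance


def slope(values: list[float]) -> float:
    """Least-squares slope of values against match index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def form_features(history: list[dict], x_vars: list[str], last_n: int) -> dict:
    """Summarise one team's previous `last_n` matches (ordered oldest first)."""
    recent = history[-last_n:]
    features = {}
    for var in x_vars:
        values = [float(match[var]) for match in recent]
        features[f"{var}_mean"] = fmean(values)      # aggregation (assumed: mean)
        features[f"{var}_trend"] = slope(values)     # trend (assumed: linear slope)
        features[f"{var}_var"] = pvariance(values)   # variance (assumed: population)
    return features


# Made-up numbers purely for illustration, not real match data.
history = [
    {"Gls": 0, "xG": 0.8},
    {"Gls": 1, "xG": 1.1},
    {"Gls": 2, "xG": 1.9},
    {"Gls": 3, "xG": 2.2},
]
features = form_features(history, ["Gls", "xG"], last_n=3)
```

Here `last_n` plays the role of `last_n_matches` and `x_vars` the role of the configured feature list, so only the three most recent matches contribute to each statistic.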

Model Development

The model development process is documented in Jupyter notebooks located in the pitchProphet/models/notebook/ directory:

model_classic_1.ipynb: Initial model development using classical machine learning approaches
- Data preparation and feature engineering
- Model training and evaluation
- Performance metrics and analysis with plots
- Model serialization (.pkl file)

To run the notebooks:

jupyter notebook pitchProphet/models/notebook/model_classic1.ipynb

Inference Pipeline

The inference pipeline automatically processes upcoming fixtures and generates predictions that are fed to the web application. This involves several steps:

Matchweek Detection:
- The system tracks current matchweek dates for each league using a CSV file containing fixture schedules
- For each league, it checks if the current date falls outside the active matchweek period (between first and last game dates)
- Only processes leagues that aren't currently in an active matchweek
Feature Generation:
- For each upcoming fixture:
  - Retrieves last N matches for both home and away teams
  - Calculates descriptive statistics (aggregation, trend, variance) for each team's performance metrics
  - Combines home and away team features into a format suitable for model inference
Model Prediction:
- Processes the prepared features through the trained XGBoost model
- Generates probability scores for three possible outcomes:
  - Home Win
  - Draw
  - Away Win
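The matchweek check and the home/away feature combination described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `home_`/`away_` prefixes, the inclusive date comparison, and the sample dates and league keys are hypothetical, and the real pipeline reads its schedule from the fixtures CSV.

```python
from datetime import date


def in_active_matchweek(today: date, first_game: date, last_game: date) -> bool:
    """True when today lies between the first and last game dates (inclusive)."""
    return first_game <= today <= last_game


def leagues_to_process(today: date, matchweeks: dict) -> list[str]:
    """Keep only leagues whose current matchweek is not in progress."""
    return [
        league
        for league, (first_game, last_game) in matchweeks.items()
        if not in_active_matchweek(today, first_game, last_game)
    ]


def combine_features(home: dict, away: dict) -> dict:
    """Merge per-team features into a single row, prefixed by side."""
    row = {f"home_{name}": value for name, value in home.items()}
    row.update({f"away_{name}": value for name, value in away.items()})
    return row


# Illustrative dates and values only.
matchweeks = {
    "Premier-League": (date(2025, 3, 8), date(2025, 3, 10)),
    "La-Liga": (date(2025, 3, 14), date(2025, 3, 16)),
}
ready = leagues_to_process(date(2025, 3, 9), matchweeks)
row = combine_features({"xG_mean": 1.4}, {"xG_mean": 0.9})
```

In this example the first league is mid-matchweek on the chosen date and is skipped, while the second is still eligible for processing. The combined row is the kind of flat feature vector a trained classifier could score to produce the three outcome probabilities.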

This processed data feeds directly into the web application, which displays the predictions for upcoming fixtures in each league.

Web Application

The web application provides an intuitive interface to view match predictions for all major European leagues. Built with Flask, it displays prediction probabilities for upcoming fixtures in each league. You can explore the prediction results on the web application here.

Local Development

To start the web application locally:

python web/app.py

Then, open your browser and navigate to http://localhost:5000.

Deployment

Testing

To run the tests:

pytest

Test files are located in the tests/ directory and cover:

Data scraping functionality
Pre-processing pipeline
Model inference
Web application routes

Contributing

We welcome contributions! Please follow these steps:

Fork the repository
Create a new branch (git checkout -b feature-branch)
Make your changes
Install pre-commit hooks:
```
pre-commit install
```
Commit your changes (git commit -am 'Add new feature')
Push to the branch (git push origin feature-branch)
Create a new Pull Request

