A spatio-temporal modeling framework for large-scale migration forecasts based on static sensor network data. FluxRGNN is a recurrent graph neural network that is based on a generic mechanistic description of population-level movements on the Voronoi tessellation of sensor locations. Unlike previous approaches, this hybrid model capitalises on local associations between environmental conditions and migration intensity as well as on spatio-temporal dependencies inherent to the movement process.
Please note that a refactored and extended version of FluxRGNN is available on the branch nexrad_data, which will soon be merged into the main code base.
First, make sure you have conda installed.
To install all other dependencies and the FluxRGNN package itself, switch to the FluxRGNN directory and run:
bash install.sh
This will create a new conda environment called fluxrgnn and install the FluxRGNN package into it.
Later on, it is enough to activate the environment with
conda activate fluxrgnn
before getting started.
Note that after making changes to files in the fluxrgnn directory, you need to reinstall the associated Python package by running
python setup.py install
If you want to use your GPU, you may need to manually install a matching PyTorch version.
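To check whether the PyTorch build in the fluxrgnn environment can actually use your GPU, a quick check along these lines may help. It is a minimal sketch that relies only on standard PyTorch calls (torch.cuda.is_available, torch.cuda.get_device_name); the helper name torch_gpu_status is hypothetical and not part of FluxRGNN.

```python
import importlib.util


def torch_gpu_status() -> str:
    """Hypothetical helper: describe whether PyTorch is installed and sees a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch  # imported lazily so the check also runs without PyTorch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this PyTorch build was compiled against
        return (
            f"PyTorch {torch.__version__} (CUDA {torch.version.cuda}) "
            f"can use {torch.cuda.get_device_name(0)}"
        )
    return f"PyTorch {torch.__version__} is installed but no usable CUDA device was found"


if __name__ == "__main__":
    print(torch_gpu_status())
```

If this reports no usable device on a machine that has a GPU, the installed PyTorch build may be CPU-only, and a CUDA-enabled build matching your driver would need to be installed.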
To recreate geographical visualisations from our paper, some additional packages are required. They can be installed by running
conda env update --name fluxrgnn --file plotting_environment.yml
To make the conda environment visible to the Jupyter notebooks, run
python -m ipykernel install --user --name=fluxrgnn
To install additional packages required to run the radar data preprocessing (see below), run
conda env update --name fluxrgnn --file preprocessing_environment.yml
FluxRGNN uses Hydra to create a hierarchical configuration that can be composed dynamically and overridden from the command line. Have a look at the scripts/config folder to get familiar with the structure of the config files. The default settings correspond to the settings used in our paper.
You can, for example, easily switch between data sets (here radar and abm) by simply adding datasource=radar or datasource=abm to your command line when running one of the provided scripts. Similarly, you could change the number of fully-connected layers used in FluxRGNN to, say, 3 by adding model.n_fc_layers=3.
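When launching the scripts programmatically (for example from a parameter sweep), the overrides can be assembled as an argument list. The sketch below is an illustration only: build_command and format_value are hypothetical helpers, not part of FluxRGNN. It follows Hydra's override syntax, where key=value changes an entry that already exists in the composed configuration and +key=value adds an entry that is not part of the defaults (as with +experiment={name} further down).

```python
import shlex
from typing import List, Mapping, Optional


def format_value(value: object) -> str:
    """Render a Python value as a Hydra command-line value."""
    if isinstance(value, bool):
        return "true" if value else "false"  # Hydra/YAML booleans are lowercase
    return str(value)


def build_command(
    script: str,
    overrides: Mapping[str, object],
    additions: Optional[Mapping[str, object]] = None,
) -> List[str]:
    """Hypothetical helper: build an argv list for one of the scripts.

    `overrides` replace existing config entries (key=value); `additions`
    are prefixed with '+' because they are not in the default config.
    """
    argv = ["python", script]
    argv += [f"{key}={format_value(value)}" for key, value in overrides.items()]
    argv += [f"+{key}={format_value(value)}" for key, value in (additions or {}).items()]
    return argv


if __name__ == "__main__":
    cmd = build_command(
        "run_experiments.py",
        {"datasource": "radar", "model.n_fc_layers": 3},
        {"experiment": "my_test"},
    )
    print(shlex.join(cmd))
    # python run_experiments.py datasource=radar model.n_fc_layers=3 +experiment=my_test
```

The resulting list could then be passed to subprocess.run(cmd, check=True, cwd="scripts"), assuming you start from the FluxRGNN directory.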
The FluxRGNN dataloader expects the preprocessed data (including environmental and sensor network data) to be in the following path:
FluxRGNN/data/preprocessed/{t_unit}_voronoi_ndummy={ndummy}/{datasource}/{season}/{year}
where t_unit, ndummy, datasource, season and year can be specified in the Hydra configuration files in the scripts/conf directory.
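The directory layout can be reproduced in a few lines, for instance to check where to put your own preprocessed files. This is a sketch: the function names are hypothetical, the example values (t_unit="1H", season="fall") are placeholders rather than FluxRGNN defaults, and it assumes that the three input files described below live directly in this directory.

```python
from pathlib import Path
from typing import List, Union

# Assumption: the per-season/year input files the loader reads from this directory.
EXPECTED_FILES = ("delaunay.gpickle", "static_features.csv", "dynamic_features.csv")


def preprocessed_dir(root: Union[str, Path], t_unit: str, ndummy: int,
                     datasource: str, season: str, year: int) -> Path:
    """Hypothetical helper mirroring
    data/preprocessed/{t_unit}_voronoi_ndummy={ndummy}/{datasource}/{season}/{year}."""
    return (Path(root) / "data" / "preprocessed"
            / f"{t_unit}_voronoi_ndummy={ndummy}" / datasource / season / str(year))


def missing_files(directory: Path) -> List[str]:
    """Return the expected input files that are not present in `directory`."""
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    d = preprocessed_dir("FluxRGNN", "1H", 25, "radar", "fall", 2015)
    print(d.as_posix())  # FluxRGNN/data/preprocessed/1H_voronoi_ndummy=25/radar/fall/2015
    print(missing_files(d))
```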
To reproduce the results from our paper, please download the preprocessed data here.
To run the preprocessing of bird density and velocity data from the European weather radar network yourself, you can use this code base. Follow its README to install the birds Python package in your fluxrgnn conda environment and to download the raw radar data. Then, from the FluxRGNN/scripts directory, run
python run_preprocessing.py datasource=radar +raw_data_dir={path/to/downloaded/data}
If you would like to apply FluxRGNN to your own data, you need to generate the following files (for each season and year):
- delaunay.gpickle: graph structure underlying the Voronoi tessellation of radar locations, stored as a networkx.DiGraph in which nodes represent radars and edges connect radars whose Voronoi cells are adjacent. You can use this code base to construct the Voronoi tessellation and the associated graph structure from a set of sensor locations.
- static_features.csv: dataframe containing the following static features of the radars and their corresponding Voronoi cells (one column each, given as column (data type): description):
  - radar (string): name/label of radar
  - observed (boolean): true if data is available for this radar, false otherwise
  - x (float): x-component of radar location in local coordinate reference system
  - y (float): y-component of radar location in local coordinate reference system
  - lon (float): longitude of radar location
  - lat (float): latitude of radar location
  - boundary (boolean): true if the Voronoi cell lies at the boundary of the spatial domain, false otherwise
  - area_km2 (float): area of the Voronoi cell in km^2
  Note that the order of the rows in the dataframe (representing the different radars) must correspond to the order of nodes in the networkx.DiGraph.
- dynamic_features.csv: dataframe containing the following dynamic features of the Voronoi cells, i.e. variables that change over time (one column each, given as column (data type): description):
  - radar (string): name/label of radar
  - night (boolean): true if at any point during the time step the sun angle is below -6 degrees, false otherwise
  - dusk (boolean): true if at any point during the time step the sun angle drops below 6 degrees, false otherwise
  - dawn (boolean): true if at any point during the time step the sun angle rises above 6 degrees, false otherwise
  - datetime (string): timestamp defining the beginning of the time step (e.g. "2015-08-01 12:00:00+00:00")
  - dayofyear (int): day of the year (determined based on the beginning of the time step)
  - tidx (int): time index used for indexing, sorting and aligning data sequences of multiple radars
  - nightID (int): night identifier used to group data belonging to the same night
  - birds_km2 (float): bird density (birds/km^2) in the Voronoi cell measured by the radar
  - birds (float): total number of birds in the Voronoi cell (derived from bird density)
  - bird_u (float): u-component of the bird velocity measured by the radar
  - bird_v (float): v-component of the bird velocity measured by the radar
  - bird_speed (float): bird speed measured by the radar
  - bird_direction (float): bird direction measured by the radar
  - missing (boolean): true if data is missing, false otherwise
  - ...: any relevant environmental variables can be added here. The variable names should correspond to those specified in the env_vars list in the datasource config file.
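For your own sensor network, the graph file and the consistency requirements above can be sketched as follows. This is an illustration under stated assumptions, not the linked code base: adjacency is derived from a scipy Delaunay triangulation of the projected x/y locations (two cells of an unclipped Voronoi tessellation share an edge exactly when their sites are joined by a Delaunay edge, for points in general position), nodes are keyed by radar name, and birds is assumed to be birds_km2 times the cell's area_km2. All function names are hypothetical. If your Voronoi cells are clipped to a spatial domain, some edges along the convex hull may no longer correspond to shared cell boundaries, so compare the result with your actual tessellation.

```python
import pickle

import networkx as nx
import numpy as np
import pandas as pd
from scipy.spatial import Delaunay

STATIC_COLUMNS = ["radar", "observed", "x", "y", "lon", "lat", "boundary", "area_km2"]


def voronoi_adjacency_graph(names, xy):
    """Hypothetical helper: DiGraph with an edge in both directions between
    sensors whose (unclipped) Voronoi cells are adjacent, via Delaunay duality."""
    tri = Delaunay(np.asarray(xy, dtype=float))
    indptr, indices = tri.vertex_neighbor_vertices
    graph = nx.DiGraph()
    graph.add_nodes_from(names)  # node order follows the order of `names`
    for i, name in enumerate(names):
        for j in indices[indptr[i]:indptr[i + 1]]:
            graph.add_edge(name, names[j])  # the reverse edge is added when j is visited
    return graph


def check_static_features(static: pd.DataFrame, graph: nx.DiGraph) -> None:
    """Raise if a required column is missing or the row order differs from the node order."""
    missing = [c for c in STATIC_COLUMNS if c not in static.columns]
    if missing:
        raise ValueError(f"static_features.csv is missing columns: {missing}")
    if list(static["radar"]) != list(graph.nodes):
        raise ValueError("rows of static_features.csv must follow the node order of the graph")


def add_total_birds(dynamic: pd.DataFrame, static: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: birds = birds_km2 * area_km2 of the radar's Voronoi cell."""
    merged = dynamic.merge(static[["radar", "area_km2"]], on="radar", how="left")
    merged["birds"] = merged["birds_km2"] * merged["area_km2"]
    return merged.drop(columns="area_km2")


def save_graph(graph: nx.DiGraph, path: str) -> None:
    # networkx >= 3.0 removed write_gpickle; nx.read_gpickle in 2.x simply unpickled the file.
    with open(path, "wb") as f:
        pickle.dump(graph, f, pickle.HIGHEST_PROTOCOL)
```

Keeping the node order and the dataframe row order tied to the same list of radar names is what makes the row-order requirement for static_features.csv easy to satisfy.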
To train FluxRGNN on all available data except for the year 2017 and immediately test it on the held-out data, switch to the scripts directory and run
python run_experiments.py datasource={datasource} +experiment={name}
with datasource being either radar or abm, and name being any identifier you would like to give your experiment (used to name the directory to which all results of this experiment are written). To change the test year to X, add datasource.test_year=X as a command-line argument.
To run the same on a cluster using Slurm and CUDA, with 5 instances of FluxRGNN trained in parallel, run
python run_experiments.py datasource={datasource} +experiment={name} device=cluster task.repeats=5
To train and evaluate one of the baseline models (model = HA, GAM, or GBT), simply add model={model} to your command line.
The final experiments and ablations can be run with the following commands (from the scripts directory):
- FluxRGNN:
python run_experiments.py datasource=radar +experiment=final task.repeats=5
To run the same for the simulated data, replace datasource=radar with datasource=abm and add model.lr=1e-5 model.batch_size=4 datasource.n_dummy_radars=25 to the command line.
- FluxRGNN w/o encoder:
python run_experiments.py datasource=radar +experiment=final_without_encoder task.repeats=5 model.use_encoder=false model.use_uv=false
To run the same for the simulated data, replace datasource=radar with datasource=abm and add model.lr=1e-5 model.batch_size=4 datasource.n_dummy_radars=25 to the command line.
- FluxRGNN w/o boundary cells:
python run_experiments.py datasource=radar +experiment=final_without_boundary task.repeats=5 model.use_boundary_model=false datasource.n_dummy_radars=0
To run the same for the simulated data, replace datasource=radar with datasource=abm and add model.lr=1e-5 model.batch_size=4 to the command line.
- FluxRGNN w/o spatial fluxes:
python run_experiments.py datasource=radar model=LocalLSTM +experiment=final task.repeats=5
To run the same for the simulated data, replace datasource=radar with datasource=abm and add model.lr=1e-5 model.batch_size=4 datasource.n_dummy_radars=25 to the command line.
- GBT:
python run_experiments.py datasource=radar model=GBT +experiment=final task.repeats=5 datasource.n_dummy_radars=0
To run the same for the simulated data, replace datasource=radar with datasource=abm and add model.max_depth=10 to the command line.
- GAM:
python run_experiments.py datasource=radar model=GAM +experiment=final task.repeats=1 datasource.n_dummy_radars=0
To run the same for the simulated data, replace datasource=radar with datasource=abm.
- HA:
python run_experiments.py datasource=radar model=HA +experiment=final task.repeats=1 datasource.n_dummy_radars=0
To run the same for the simulated data, replace datasource=radar with datasource=abm.
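Because the simulated-data commands differ from the radar ones only in the datasource and a few extra overrides, the experiment list can also be kept as a small table in code. This is a hypothetical convenience sketch (the EXPERIMENTS dictionary and experiment_command function are not part of FluxRGNN); the arguments are copied from the commands for each variant.

```python
import shlex

# variant -> (arguments after datasource=..., extra arguments when datasource=abm)
EXPERIMENTS = {
    "FluxRGNN": ("+experiment=final task.repeats=5",
                 "model.lr=1e-5 model.batch_size=4 datasource.n_dummy_radars=25"),
    "FluxRGNN w/o encoder": ("+experiment=final_without_encoder task.repeats=5 "
                             "model.use_encoder=false model.use_uv=false",
                             "model.lr=1e-5 model.batch_size=4 datasource.n_dummy_radars=25"),
    "FluxRGNN w/o boundary cells": ("+experiment=final_without_boundary task.repeats=5 "
                                    "model.use_boundary_model=false datasource.n_dummy_radars=0",
                                    "model.lr=1e-5 model.batch_size=4"),
    "FluxRGNN w/o spatial fluxes": ("model=LocalLSTM +experiment=final task.repeats=5",
                                    "model.lr=1e-5 model.batch_size=4 datasource.n_dummy_radars=25"),
    "GBT": ("model=GBT +experiment=final task.repeats=5 datasource.n_dummy_radars=0",
            "model.max_depth=10"),
    "GAM": ("model=GAM +experiment=final task.repeats=1 datasource.n_dummy_radars=0", ""),
    "HA": ("model=HA +experiment=final task.repeats=1 datasource.n_dummy_radars=0", ""),
}


def experiment_command(variant: str, datasource: str = "radar") -> str:
    """Hypothetical helper: return the run_experiments.py call for one variant."""
    args, abm_extra = EXPERIMENTS[variant]
    argv = ["python", "run_experiments.py", f"datasource={datasource}", *shlex.split(args)]
    if datasource == "abm":
        argv += shlex.split(abm_extra)  # shlex.split("") returns [] for GAM and HA
    return shlex.join(argv)


if __name__ == "__main__":
    for name in EXPERIMENTS:
        print(experiment_command(name, "abm"))
```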
To evaluate the predictive performance of FluxRGNN (or any of the other models), run
python evaluate_performance.py datasource={datasource} model={model} task.repeats={repeats}
This will generate summaries of the performance measures of all experiments available for this model and write the output to results/{datasource}/performance_evaluation/{model}_only.
The Jupyter notebook inspect_results.ipynb can then be used to visualize the performance metrics and to inspect example predictions for individual radars.
To compare the predictive performance of FluxRGNN to the baseline models, run
python evaluate_performance.py datasource={datasource} +experiment_type=final task.repeats=5
This will generate summaries of the performance measures of all experiments called 'final' for the models FluxRGNN, GAM, HA and GBT, and write the output to results/{datasource}/performance_evaluation/final.
Similarly, to compare the predictive performance of FluxRGNN to its variants (ablations), run
python evaluate_performance.py datasource={datasource} +experiment_type=ablations task.repeats=5
This will generate summaries of the performance measures of all ablation experiments and write the output to results/{datasource}/performance_evaluation/ablations.
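The three ways of calling evaluate_performance.py write to different output directories. The mapping can be summarized in a small sketch; performance_output_dir is a hypothetical helper, and the paths are relative to wherever results/ is created, which is an assumption about your working directory.

```python
from pathlib import Path
from typing import Optional


def performance_output_dir(datasource: str, model: Optional[str] = None,
                           experiment_type: Optional[str] = None) -> Path:
    """Hypothetical helper: where evaluate_performance.py writes its summaries.

    Pass either `model` (single-model evaluation) or `experiment_type`
    ('final' or 'ablations'), mirroring the two ways of calling the script.
    """
    if (model is None) == (experiment_type is None):
        raise ValueError("specify exactly one of model or experiment_type")
    base = Path("results") / datasource / "performance_evaluation"
    return base / (f"{model}_only" if model is not None else experiment_type)


if __name__ == "__main__":
    print(performance_output_dir("radar", model="FluxRGNN").as_posix())
    print(performance_output_dir("abm", experiment_type="ablations").as_posix())
```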
The Jupyter notebook performance_evaluation.ipynb can then be used to recreate the figures from our paper.
To validate the spatial and temporal components of FluxRGNN by comparing nightly fluxes and source/sink terms to the respective ground truth from the simulations, run
python evaluate_fluxes.py datasource=abm fixed_t0=true
To do the same for hourly fluxes and source/sink terms, run
python evaluate_fluxes.py datasource=abm fixed_t0=false +H_min=24 +H_max=24
The forecasting horizon (H_min and H_max) can be set to any value between 1 and 72.
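The horizon arguments can be checked before launching a long evaluation. A minimal sketch follows; evaluate_fluxes_command is a hypothetical helper, and requiring H_min <= H_max is an assumption beyond the stated range of 1 to 72.

```python
from typing import List, Optional


def evaluate_fluxes_command(datasource: str, fixed_t0: bool,
                            h_min: Optional[int] = None,
                            h_max: Optional[int] = None) -> List[str]:
    """Hypothetical helper: argv for evaluate_fluxes.py with an optional horizon range."""
    argv = ["python", "evaluate_fluxes.py", f"datasource={datasource}",
            f"fixed_t0={'true' if fixed_t0 else 'false'}"]
    if h_min is None and h_max is None:
        return argv
    if h_min is None or h_max is None:
        raise ValueError("set both h_min and h_max")
    # Horizons must lie between 1 and 72; h_min <= h_max is an extra assumption.
    if not 1 <= h_min <= h_max <= 72:
        raise ValueError(f"expected 1 <= H_min <= H_max <= 72, got {h_min}, {h_max}")
    # H_min/H_max are not in the default config, hence Hydra's '+' prefix
    return argv + [f"+H_min={h_min}", f"+H_max={h_max}"]


if __name__ == "__main__":
    print(" ".join(evaluate_fluxes_command("abm", fixed_t0=False, h_min=24, h_max=24)))
```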
To recreate the figures from our paper, use the Jupyter notebook validation_study.ipynb.
To recreate the map of average nightly fluxes predicted for the radar data, first run
python evaluate_fluxes.py datasource=radar fixed_t0=true
and then use the Jupyter notebook radar_case_study.ipynb for plotting.
The same notebook can be used to visualize example predictions for a single radar or the entire network.