Note: A rendered version of this markdown readme file can be found here: github.com/BioroboticsLab/bb_network_decomposition
Analyze social networks using spectral decomposition over time.
Preprint: DOI 10.1101/2020.05.06.076943
Data: DOI 10.5281/zenodo.4438013
This sample code showcases how to load the raw input data, calculate network age, and fit and evaluate the multinomial task regression and supplementary regression models.
import datetime
import pandas as pd
import numpy as np
# https://github.com/BioroboticsLab/bb_network_decomposition
# the module can be installed using pip:
# $ pip3 install --user git+https://github.com/BioroboticsLab/bb_network_decomposition.git
# the dependencies should be installed automatically:
# https://github.com/BioroboticsLab/bb_network_decomposition/blob/master/requirements.txt
# please note that you may have to install the dependency bb_utils manually:
# $ pip3 install --user git+https://github.com/BioroboticsLab/bb_utils.git
import bb_network_decomposition
import bb_network_decomposition.constants
import bb_network_decomposition.data
import bb_network_decomposition.normalization
import bb_network_decomposition.spectral
import bb_network_decomposition.projection
import bb_network_decomposition.evaluation
# location of interaction network hdf5 file
raw_networks_path = "zenodo/interaction_networks_20160729to20160827.h5"
# location of bee metainfo (location descriptors, supplementary labels, ...)
supplementary_data_path = "zenodo/bee_daily_data.csv"
# location of results of bayesian lifetime model
alive_path = "zenodo/alive_bees_bayesian.csv"
# first date in the interaction tensor
# used to match interaction data with supplementary data (locations, etc.)
from_date = datetime.datetime(2016, 8, 12)
# number of days to use (increase to reproduce paper results)
num_days = 1
# load interaction data
(
interactions, # interaction tensor
labels, # names of interaction modes (proximity, trophallaxis, etc.)
bee_ids, # unique BeesBook IDs of the individuals
bee_ages, # tensor with ages of individuals over time
) = bb_network_decomposition.data.load_networks_h5(raw_networks_path, 0, num_days)
alive_df = bb_network_decomposition.data.load_alive_data(alive_path, bee_ids)
num_days = interactions.shape[0]
num_entities = interactions.shape[1]
num_modes = len(labels)
# number of spectral factors per interaction mode
num_factors_per_mode = 8
alive_matrices = bb_network_decomposition.data.get_daily_alive_matrices(
alive_df, num_days, num_entities, from_date
)
alive_matrices.shape
(1, 2010, 2010)
Boolean tensor containing lifetime data of every individual. Shape is Day x Individual x Individual.
If both individuals i and j were alive on day d, alive_matrices[d,i,j] is True.
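As an illustration of that definition, the following minimal sketch builds such a pairwise tensor from a per-day alive vector. The toy array `alive` is hypothetical, and this is not necessarily how `get_daily_alive_matrices` constructs its result.

```python
import numpy as np

# Hypothetical toy data: alive[d, i] is True if individual i was alive on day d.
alive = np.array([
    [True, True, False],
    [True, False, False],
])

# A broadcast "and" over the individual axis yields Day x Individual x Individual.
alive_pairs = alive[:, :, None] & alive[:, None, :]

print(alive_pairs.shape)     # (2, 3, 3)
print(alive_pairs[1].sum())  # 1: on day 1 only the pair (0, 0) is alive
```

An entry is True only when both individuals of the pair were alive on that day, so dead individuals produce all-False rows and columns.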
interactions = bb_network_decomposition.normalization.rank_transform(
interactions, alive_matrices
)
interactions.shape
(1, 2010, 2010, 9)
Interaction strengths of individuals over time. Shape is Day x Individual x Individual x Interaction mode.
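To give an intuition for what a rank transform does to heavy-tailed interaction strengths, here is a minimal sketch for a single day and a single mode. The function `rank_transform_alive` is hypothetical; restricting the ranking to alive pairs and scaling the ranks to (0, 1] are assumptions, and the library's `rank_transform` may differ in its details.

```python
import numpy as np

def rank_transform_alive(values, alive):
    """Replace the values of alive pairs by their normalized average rank in (0, 1]."""
    out = np.zeros(values.shape, dtype=float)
    selected = values[alive]
    if selected.size == 0:
        return out
    # Average ranks for ties: every unique value gets the mean of the ranks it occupies.
    _, inverse, counts = np.unique(selected, return_inverse=True, return_counts=True)
    average_rank = np.cumsum(counts) - (counts - 1) / 2.0
    out[alive] = average_rank[inverse] / selected.size
    return out

# Hypothetical symmetric interaction matrix with one extreme value.
values = np.array([[0.0, 5.0, 100.0],
                   [5.0, 0.0, 1.0],
                   [100.0, 1.0, 0.0]])
alive = ~np.eye(3, dtype=bool)  # all pairs alive, self-interactions excluded
ranked = rank_transform_alive(values, alive)
print(ranked)
```

The ordering of the interaction strengths is preserved, while the extreme value no longer dominates the scale.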
labels
['proximity_counts',
'proximity_euclidean',
'proximity_rbf',
'velocity_pos_sum',
'velocity_neg_sum',
'velocity_pos_mean',
'velocity_neg_mean',
'trophallaxis_duration',
'trophallaxis_counts']
List of interaction modes in the same order as stored in interactions.
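Because the order of `labels` matches the last axis of `interactions`, a single mode can be selected by looking up its index. The sketch below uses a small hypothetical tensor and a shortened label list rather than the real data.

```python
import numpy as np

# Shortened, hypothetical subset of the interaction mode names.
labels = ["proximity_counts", "trophallaxis_counts"]

# Toy tensor with shape Day x Individual x Individual x Interaction mode.
interactions = np.arange(1 * 3 * 3 * len(labels), dtype=float).reshape(
    1, 3, 3, len(labels)
)

mode_index = labels.index("trophallaxis_counts")
trophallaxis = interactions[..., mode_index]  # Day x Individual x Individual

print(trophallaxis.shape)  # (1, 3, 3)
```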
(
daily_factors,
num_factors_by_mode,
) = bb_network_decomposition.spectral.decomposition_by_day(
interactions, alive_matrices, num_factors_per_mode, num_jobs=4
)
daily_factors[0].shape
(2010, 104)
Spectral factors of interaction matrices over time before temporal alignment and CCA.
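The general idea of a per-mode spectral decomposition can be sketched as follows. This is a simplified illustration, not the implementation behind `decomposition_by_day`: the helper `spectral_factors`, the selection by absolute eigenvalue, and the scaling by the square root of the eigenvalue magnitude are all assumptions. The library also returns `num_factors_by_mode`, so its per-mode factor counts can differ from this sketch.

```python
import numpy as np

def spectral_factors(matrix, num_factors):
    """Leading eigenvectors of a symmetric matrix, scaled by sqrt(|eigenvalue|)."""
    symmetric = (matrix + matrix.T) / 2.0
    eigenvalues, eigenvectors = np.linalg.eigh(symmetric)
    # Keep the eigenpairs with the largest absolute eigenvalues.
    order = np.argsort(-np.abs(eigenvalues))[:num_factors]
    return eigenvectors[:, order] * np.sqrt(np.abs(eigenvalues[order]))

rng = np.random.default_rng(0)
num_individuals, num_modes, k = 6, 2, 3
# Hypothetical interactions of one day: Individual x Individual x Interaction mode.
day = rng.random((num_individuals, num_individuals, num_modes))

# Decompose every mode separately and concatenate the factors column-wise.
factors = np.concatenate(
    [spectral_factors(day[:, :, m], k) for m in range(num_modes)], axis=1
)
print(factors.shape)  # (6, 6): 2 modes x 3 factors for 6 individuals
```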
num_factors = daily_factors[0].shape[-1]
daily_factors_aligned = bb_network_decomposition.spectral.temporal_alignment(
daily_factors, alive_matrices
)
Spectral factors of interaction matrices over time after temporal alignment without CCA projection.
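One reason alignment is needed at all: eigenvectors are only defined up to sign, so the same factor can come out flipped on consecutive days. The sketch below shows the simplest possible correction, a per-column sign flip against a reference day. The helper `align_signs` is hypothetical, and it is an assumption that `temporal_alignment` addresses this kind of ambiguity; its actual procedure may be more general.

```python
import numpy as np

def align_signs(reference, factors):
    """Flip columns of `factors` so each has a non-negative dot product with `reference`."""
    dots = np.sum(reference * factors, axis=0)
    signs = np.where(dots < 0, -1.0, 1.0)
    return factors * signs

rng = np.random.default_rng(1)
day0 = rng.normal(size=(5, 3))
# Day 1 holds the same factors with two columns flipped and a little noise.
day1 = day0 * np.array([1.0, -1.0, -1.0]) + 0.01 * rng.normal(size=(5, 3))

day1_aligned = align_signs(day0, day1)
print(np.sum(day0 * day1_aligned, axis=0) > 0)
```

In real data, only individuals present on both days could be compared this way, which may be why the alive matrices are passed to the alignment step.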
factor_df = bb_network_decomposition.data.get_factor_dataframe(
daily_factors_aligned, from_date, alive_df, bee_ids
)
factor_df
| | day | date | bee_id | age | f_0 | f_1 | f_2 | f_3 | f_4 | f_5 | ... | f_94 | f_95 | f_96 | f_97 | f_98 | f_99 | f_100 | f_101 | f_102 | f_103 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 2016-08-12 | 21 | 45 | -0.0072421 | 0.00314685 | -0.000766774 | -0.00423605 | 0.00082973 | -0.00254058 | ... | -0.000475807 | -0.0140402 | 0.0030798 | 0.000438266 | 0.00107959 | 0.000104271 | -0.000691795 | 0.000613057 | -0.000109924 | -0.000521939 |
1 | 0 | 2016-08-12 | 39 | 45 | -0.00492436 | -0.00218363 | -0.00100829 | -0.00218637 | -0.00100059 | 0.00164908 | ... | -0.00290549 | -0.0443946 | 0.00347342 | 0.000299858 | 0.00062553 | 0.000165619 | -0.00041004 | 6.26682e-05 | -0.000255383 | -0.000434362 |
2 | 0 | 2016-08-12 | 59 | 45 | -0.00546352 | 0.000335544 | -0.00101128 | -0.00361959 | 0.00488799 | 0.00168936 | ... | -0.000275847 | -0.00485629 | 0.00108332 | -0.000354953 | 0.00052248 | 0.00164634 | -0.002049 | 0.00215299 | -0.00409516 | 0.00478023 |
3 | 0 | 2016-08-12 | 178 | 44 | -0.00129473 | -0.00042854 | -0.00443996 | -0.00169472 | -0.00609196 | 0.00286522 | ... | 0.00257447 | 0.00106701 | 0.00191582 | -0.000499389 | 0.000659427 | 0.000257282 | -0.00259264 | -0.000812146 | -0.000417269 | 0.00120451 |
4 | 0 | 2016-08-12 | 199 | 44 | -0.00827621 | 0.00464029 | -0.00073654 | -0.00250572 | 0.0110911 | 0.00379146 | ... | 0.0169083 | 0.0118159 | 0.00324122 | 1.71778e-05 | 0.000502871 | -0.0019061 | -0.000956135 | 0.00131178 | 0.000335653 | -0.000305509 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1198 | 0 | 2016-08-12 | 3004 | 1 | 0.00563097 | 0.00379659 | -0.00161652 | -0.000676541 | 0.00236571 | -0.00542808 | ... | -0.00444412 | 0.00799384 | -0.00153584 | -0.00221796 | 0.000654807 | 0.00175826 | 0.00022228 | 0.00097945 | -0.00237923 | 0.00154693 |
1199 | 0 | 2016-08-12 | 3005 | 1 | 0.00482178 | 0.00242558 | -0.00278712 | -0.000206929 | 0.00144706 | -0.000719776 | ... | 0.00679757 | -0.000338297 | -0.00351584 | 0.00180167 | 0.000462584 | 0.00303921 | 0.00434775 | 0.00354968 | -2.89882e-05 | 0.0117052 |
1200 | 0 | 2016-08-12 | 3006 | 1 | 0.006286 | 0.00478083 | -0.00603946 | 0.00192633 | 0.00080181 | -0.00546843 | ... | 0.00212394 | -0.00339074 | -0.00372017 | -0.000963446 | -1.50815e-05 | 0.00101539 | -0.00130499 | 0.00107728 | -0.00114766 | -0.000493821 |
1201 | 0 | 2016-08-12 | 3007 | 1 | 0.00502065 | 0.00350009 | 0.00158003 | -0.000819385 | 8.74538e-05 | 0.00113418 | ... | 0.000366625 | 0.00208452 | -0.00438712 | -0.000486356 | 0.00141115 | 0.00300085 | -0.000855138 | -0.00527713 | -0.0038038 | -0.00121933 |
1202 | 0 | 2016-08-12 | 3008 | 1 | 0.00536517 | 0.00204567 | -0.00211659 | -0.000753905 | -0.000456456 | 0.00402603 | ... | 0.000891216 | -0.00229696 | -0.00692422 | -0.00129927 | -0.00251864 | 0.000189614 | -0.00128221 | -0.00567603 | -0.00219876 | 0.000853091 |
1203 rows × 108 columns
Each f_n column corresponds to one factor of the spectral decomposition of one interaction mode of the interaction matrix of one day.
# Load location data, because we need it to compute the CCA projection
loc_df = bb_network_decomposition.data.load_location_data(supplementary_data_path)
cca_factor_df, cca = bb_network_decomposition.projection.get_cca_projection(
factor_df, loc_df, return_cca=True, num_components=3
)
cca_factor_df.sort_values("date", inplace=True)
cca_factor_df now contains the network age for all individuals on all dates in the dataset.
The column network_age contains the first dimension of network age (used throughout most of the paper), and the second and third dimensions are stored in the columns network_age_1 and network_age_2.
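For readers unfamiliar with canonical correlation analysis (CCA), the following sketch shows the technique on synthetic data using only NumPy. The function `cca` and the arrays `factors` and `locations` are hypothetical stand-ins; `get_cca_projection` may preprocess the data and compute the projection differently.

```python
import numpy as np

def cca(x, y, num_components):
    """CCA via QR decomposition and SVD.

    Returns the canonical correlations and the projection of x onto the
    canonical directions. Assumes x and y have full column rank.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    qx, rx = np.linalg.qr(xc)
    qy, _ = np.linalg.qr(yc)
    u, s, _ = np.linalg.svd(qx.T @ qy, full_matrices=False)
    # Weights map centered x to canonical variates: xc @ weights == qx @ u.
    weights = np.linalg.solve(rx, u[:, :num_components])
    return s[:num_components], xc @ weights

rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 4))  # stand-in for spectral factors
locations = np.column_stack([
    2.0 * factors[:, 0] + 0.1 * rng.normal(size=200),  # tied to the first factor
    rng.normal(size=200),                              # unrelated to the factors
])

correlations, projected = cca(factors, locations, num_components=2)
print(correlations)
print(projected.shape)  # (200, 2)
```

The first column of `projected` plays the role that `network_age` plays in the text: the one-dimensional summary of the factors that is maximally correlated with the location data.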
cca_factor_df.to_csv("network_age_cca.csv")
# list of variables to use as predictors in task allocation regression tasks
variable_names = [
["age"],
["age", "network_age"],
["network_age"],
["network_age", "network_age_1"],
["network_age", "network_age_1", "network_age_2"],
]
# list of variables to use as dependent variables in regression tasks
targets = [bb_network_decomposition.constants.supplementary_labels] + list(
map(lambda l: [l], bb_network_decomposition.constants.supplementary_labels)
)
target_cols = bb_network_decomposition.constants.supplementary_labels
# load all required supplementary data
sup_df = bb_network_decomposition.data.load_supplementary_data(
supplementary_data_path,
keepcols=bb_network_decomposition.constants.default_location_data_cols
+ bb_network_decomposition.constants.default_supplementary_data_cols
+ ["location_descriptor_count"],
)
location_cols = set(bb_network_decomposition.constants.location_labels).union(
set(bb_network_decomposition.constants.location_cols)
)
# remove location data from network age dataframe so that we can safely merge in all
# supplementary data
cca_factor_df = cca_factor_df[
[c for c in cca_factor_df.columns if c not in location_cols]
]
sup_df = bb_network_decomposition.data.merge_location_data(cca_factor_df, sup_df)
# regression tasks bootstrap
regression_results = bb_network_decomposition.evaluation.get_bootstrap_results(
sup_df,
variable_names,
targets,
regression=True,
use_tqdm=True,
num_bootstrap_samples=8,
)
These results correspond to section 5 of the manuscript: Network age predicts an individual's behavior and future role in the colony
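The bootstrap idea behind such an evaluation can be sketched in a few lines: refit the model on rows resampled with replacement and look at the spread of the scores. The helpers `r_squared` and `bootstrap_r_squared` and the synthetic data are hypothetical. Scoring each fit on its own resample is a simplification, and `get_bootstrap_results` may evaluate its models differently.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of an ordinary least squares fit with intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefficients
    return 1.0 - residuals.var() / y.var()

def bootstrap_r_squared(x, y, num_samples, seed=0):
    """Refit on rows resampled with replacement; return one R^2 per resample."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(num_samples):
        idx = rng.integers(0, len(y), size=len(y))
        scores.append(r_squared(x[idx], y[idx]))
    return np.array(scores)

rng = np.random.default_rng(42)
predictor = rng.normal(size=(300, 1))                 # stand-in for network_age
target = 3.0 * predictor[:, 0] + rng.normal(size=300)  # stand-in for a behavioral label

scores = bootstrap_r_squared(predictor, target, num_samples=8)
print(scores.mean(), scores.std())
```

With only eight resamples, as in the call above, the mean is a rough estimate; more resamples narrow the uncertainty at the cost of runtime.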
# results of bootstrap analysis, grouped by dependent and independent variables, R^2 scores
regression_results.groupby(["predictors", "target"]).fitted_linear_r2.mean()
predictors target
age circadian_rhythm 0.331907
circadian_rhythm,days_left,velocity_day,velocity_night 0.172621
days_left 0.012523
velocity_day 0.083986
velocity_night 0.286554
age,network_age circadian_rhythm 0.403065
circadian_rhythm,days_left,velocity_day,velocity_night 0.210112
days_left 0.015316
velocity_day 0.095954
velocity_night 0.292630
network_age circadian_rhythm 0.387444
circadian_rhythm,days_left,velocity_day,velocity_night 0.199390
days_left 0.010289
velocity_day 0.112123
velocity_night 0.243061
network_age,network_age_1 circadian_rhythm 0.390725
circadian_rhythm,days_left,velocity_day,velocity_night 0.227511
days_left 0.071366
velocity_day 0.112483
velocity_night 0.266663
network_age,network_age_1,network_age_2 circadian_rhythm 0.433800
circadian_rhythm,days_left,velocity_day,velocity_night 0.226100
days_left 0.065899
velocity_day 0.148080
velocity_night 0.260883
Name: fitted_linear_r2, dtype: float64
# multinomial regression for task allocation task
regression_results = bb_network_decomposition.evaluation.get_bootstrap_results(
sup_df, variable_names, regression=False, use_tqdm=True, num_bootstrap_samples=8,
)
# results of bootstrap analysis, grouped by dependent and independent variables, R_McF^2 scores
regression_results.groupby(["predictors", "target"]).rho_mcf_linear.mean()
predictors target
age brood_area_total 0.546260
dance_floor 0.417913
dance_floor,honey_storage,brood_area_total,near_exit 0.415424
honey_storage 0.026411
near_exit 0.314711
age,network_age brood_area_total 0.584782
dance_floor 0.512601
dance_floor,honey_storage,brood_area_total,near_exit 0.475106
honey_storage 0.051840
near_exit 0.376673
network_age brood_area_total 0.555143
dance_floor 0.477920
dance_floor,honey_storage,brood_area_total,near_exit 0.443385
honey_storage 0.003832
near_exit 0.357418
network_age,network_age_1 brood_area_total 0.577571
dance_floor 0.477131
dance_floor,honey_storage,brood_area_total,near_exit 0.462893
honey_storage 0.166705
near_exit 0.386814
network_age,network_age_1,network_age_2 brood_area_total 0.575683
dance_floor 0.499749
dance_floor,honey_storage,brood_area_total,near_exit 0.475821
honey_storage 0.160835
near_exit 0.445180
Name: rho_mcf_linear, dtype: float64
These results correspond to section 3 of the manuscript: Network age correctly identifies task allocation
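The scores above are McFadden pseudo R² values, defined as one minus the ratio of the model log-likelihood to the log-likelihood of a null model. The sketch below computes this for hard class labels, which is a simplification; the function `mcfadden_r2` and the toy probabilities are hypothetical and do not reproduce how `rho_mcf_linear` is computed in the library.

```python
import numpy as np

def mcfadden_r2(probabilities, outcomes):
    """McFadden's pseudo R^2: 1 - log L(model) / log L(null).

    probabilities: (n, k) predicted class probabilities, rows sum to 1.
    outcomes: (n,) integer class labels.
    """
    n = len(outcomes)
    log_likelihood = np.sum(np.log(probabilities[np.arange(n), outcomes]))
    # Null model: predict the empirical class frequencies for every sample.
    frequencies = np.bincount(outcomes, minlength=probabilities.shape[1]) / n
    null_log_likelihood = np.sum(np.log(frequencies[outcomes]))
    return 1.0 - log_likelihood / null_log_likelihood

outcomes = np.array([0, 0, 1, 1])
probabilities = np.array([[0.9, 0.1],
                          [0.8, 0.2],
                          [0.3, 0.7],
                          [0.1, 0.9]])

score = mcfadden_r2(probabilities, outcomes)
print(round(score, 3))  # 0.715
```

A model that predicts no better than the class frequencies scores 0, and a model that assigns probability 1 to every observed outcome scores 1.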
Social networks predict the life and death of honey bees
Benjamin Wild, David M Dormagen, Adrian Zachariae, Michael L Smith, Kirsten S Traynor, Dirk Brockmann, Iain D Couzin, Tim Landgraf
bioRxiv 2020.05.06.076943; doi: https://doi.org/10.1101/2020.05.06.076943