Note: A rendered version of this markdown readme file can be found here: github.com/BioroboticsLab/bb_network_decomposition

Social networks predict the life and death of honey bees

Analyze social networks using spectral decomposition over time.

Preprint: DOI 10.1101/2020.05.06.076943
Data: DOI 10.5281/zenodo.4438013

Usage example

This sample code showcases how to load the raw input data, calculate network age, and fit and evaluate the multinomial task regression and supplementary regression models.

import datetime
import pandas as pd
import numpy as np

# https://github.com/BioroboticsLab/bb_network_decomposition

# the module can be installed using pip:
# $ pip3 install --user git+https://github.com/BioroboticsLab/bb_network_decomposition.git
# the dependencies should be installed automatically:
# https://github.com/BioroboticsLab/bb_network_decomposition/blob/master/requirements.txt
# please note that you may have to install the dependency bb_utils manually:
# $ pip3 install --user git+https://github.com/BioroboticsLab/bb_utils.git
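import bb_network_decomposition
# constants are used below for label and column names, so import them explicitly
import bb_network_decomposition.constants
import bb_network_decomposition.data
import bb_network_decomposition.normalization
import bb_network_decomposition.spectral
import bb_network_decomposition.projection
import bb_network_decomposition.evaluation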
# location of interaction network hdf5 file
raw_networks_path = "zenodo/interaction_networks_20160729to20160827.h5"

# location of bee metainfo (location descriptors, supplementary labels, ...)
supplementary_data_path = "zenodo/bee_daily_data.csv"

# location of results of bayesian lifetime model
alive_path = "zenodo/alive_bees_bayesian.csv"
# first date in the interaction tensor
# used to match interaction data with supplementary data (locations, etc.)
from_date = datetime.datetime(2016, 8, 12)

# number of days to use (increase to reproduce paper results)
num_days = 1

# load interaction data
(
    interactions, # interaction tensor
    labels, # names of interaction modes (proximity, trophallaxis, etc.)
    bee_ids, # unique BeesBook IDs of the individuals
    bee_ages, # tensor with ages of individuals over time
) = bb_network_decomposition.data.load_networks_h5(raw_networks_path, 0, num_days)

alive_df = bb_network_decomposition.data.load_alive_data(alive_path, bee_ids)

num_days = interactions.shape[0]
num_entities = interactions.shape[1]

num_modes = len(labels)
# number of spectral factors per interaction mode
num_factors_per_mode = 8
alive_matrices = bb_network_decomposition.data.get_daily_alive_matrices(
    alive_df, num_days, num_entities, from_date
)
alive_matrices.shape
(1, 2010, 2010)

Boolean tensor containing lifetime data of every individual. Shape is Day x Individual x Individual.

If both individuals i and j were alive on day d, alive_matrices[d,i,j] is True.
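The pairwise layout described above can be illustrated with a minimal NumPy sketch. The per-day vector `alive` and its values are hypothetical toy data, and this is not necessarily how `get_daily_alive_matrices` builds the tensor internally.

```python
import numpy as np

# Hypothetical toy data: alive[d, i] is True if individual i was alive on day d.
alive = np.array([
    [True, True, False],
    [True, False, False],
])

# Broadcast to Day x Individual x Individual:
# entry [d, i, j] is True only if both i and j were alive on day d.
alive_pairs = alive[:, :, None] & alive[:, None, :]

print(alive_pairs.shape)
```

With this layout, masking an interaction matrix by `alive_pairs[d]` keeps only pairs in which both individuals were alive on day `d`.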

interactions = bb_network_decomposition.normalization.rank_transform(
    interactions, alive_matrices
)
interactions.shape
(1, 2010, 2010, 9)

Interaction strengths of individuals over time. Shape is Day x Individual x Individual x Interaction mode.
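To give an intuition for what a rank transform does to interaction strengths, here is a minimal sketch for a single day and a single mode. The helper `rank_transform_sketch` is hypothetical, and the normalization to (0, 1] is an assumption; the library's `rank_transform` may handle ties, scaling, and masking differently.

```python
import numpy as np

def rank_transform_sketch(values, mask):
    """Replace the masked-in values by their normalized rank in (0, 1].

    Entries outside the mask are set to 0. Ties are broken by position.
    """
    out = np.zeros(values.shape, dtype=float)
    selected = values[mask]
    if selected.size == 0:
        return out
    # argsort of argsort yields 0-based ordinal ranks.
    ranks = np.argsort(np.argsort(selected, kind="stable"), kind="stable") + 1
    out[mask] = ranks / selected.size
    return out

# Toy interaction matrix with heavy-tailed values; the diagonal is ignored.
values = np.array([
    [0.0, 3.0, 10.0],
    [7.0, 0.0, 1.0],
    [2.0, 50.0, 0.0],
])
mask = ~np.eye(3, dtype=bool)
ranked = rank_transform_sketch(values, mask)
print(ranked)
```

The ordering of the values is preserved, while the influence of extreme values such as 50.0 is bounded.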

labels
['proximity_counts',
 'proximity_euclidean',
 'proximity_rbf',
 'velocity_pos_sum',
 'velocity_neg_sum',
 'velocity_pos_mean',
 'velocity_neg_mean',
 'trophallaxis_duration',
 'trophallaxis_counts']

List of interaction modes in the same order as stored in interactions.
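Because the label order matches the last axis of the tensor, a single mode can be selected by looking up its index. The sketch below uses a shortened, hypothetical label list and random toy data in place of the real tensor.

```python
import numpy as np

# Shortened label list and random tensor, for illustration only.
toy_labels = ["proximity_counts", "trophallaxis_duration", "trophallaxis_counts"]
rng = np.random.default_rng(0)
toy_interactions = rng.random((1, 4, 4, len(toy_labels)))  # Day x Ind. x Ind. x Mode

# Look up the position of a mode by name and slice the last axis.
mode_index = toy_labels.index("trophallaxis_counts")
trophallaxis = toy_interactions[..., mode_index]

print(mode_index, trophallaxis.shape)
```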

(
    daily_factors,
    num_factors_by_mode,
) = bb_network_decomposition.spectral.decomposition_by_day(
    interactions, alive_matrices, num_factors_per_mode, num_jobs=4
)
daily_factors[0].shape
(2010, 104)

Spectral factors of interaction matrices over time, before temporal alignment and CCA.
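As a rough illustration of what a spectral factor is, the following sketch extracts the leading eigenvectors of one symmetric toy matrix. The helper `top_spectral_factors` is hypothetical and only shows the general idea; `decomposition_by_day` may use a different decomposition, ordering, or scaling.

```python
import numpy as np

def top_spectral_factors(matrix, num_factors):
    """Return the eigenvectors of a symmetric matrix with the largest |eigenvalue|."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    # Sort by absolute eigenvalue, largest first, and keep the leading ones.
    order = np.argsort(-np.abs(eigenvalues))[:num_factors]
    return eigenvectors[:, order], eigenvalues[order]

rng = np.random.default_rng(0)
a = rng.random((6, 6))
symmetric = (a + a.T) / 2  # toy symmetric "interaction" matrix

factors, values = top_spectral_factors(symmetric, num_factors=2)
print(factors.shape)  # one row per individual, one column per factor
```

Each row of `factors` is a low-dimensional description of one individual's position in the network.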

num_factors = daily_factors[0].shape[-1]
daily_factors_aligned = bb_network_decomposition.spectral.temporal_alignment(
    daily_factors, alive_matrices
)

Spectral factors of interaction matrices over time, after temporal alignment but without CCA projection.
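One reason alignment is needed is that eigenvectors are only defined up to sign, so the same factor can flip between days. The sketch below shows a simple sign alignment against a reference day. This is an assumption made for illustration; the hypothetical helper `align_signs` is not the library's `temporal_alignment`, which may do considerably more.

```python
import numpy as np

def align_signs(reference, factors):
    """Flip each column of `factors` so its dot product with `reference` is non-negative."""
    dots = np.sum(reference * factors, axis=0)
    signs = np.where(dots < 0, -1.0, 1.0)
    return factors * signs

# Toy factors for two days: day1 equals day0 with the first column sign-flipped.
day0 = np.array([[1.0, 0.5], [2.0, -0.5], [3.0, 0.1]])
day1 = day0 * np.array([-1.0, 1.0])

aligned = align_signs(day0, day1)
print(np.allclose(aligned, day0))
```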

factor_df = bb_network_decomposition.data.get_factor_dataframe(
    daily_factors_aligned, from_date, alive_df, bee_ids
)
factor_df
day date bee_id age f_0 f_1 f_2 f_3 f_4 f_5 ... f_94 f_95 f_96 f_97 f_98 f_99 f_100 f_101 f_102 f_103
0 0 2016-08-12 21 45 -0.0072421 0.00314685 -0.000766774 -0.00423605 0.00082973 -0.00254058 ... -0.000475807 -0.0140402 0.0030798 0.000438266 0.00107959 0.000104271 -0.000691795 0.000613057 -0.000109924 -0.000521939
1 0 2016-08-12 39 45 -0.00492436 -0.00218363 -0.00100829 -0.00218637 -0.00100059 0.00164908 ... -0.00290549 -0.0443946 0.00347342 0.000299858 0.00062553 0.000165619 -0.00041004 6.26682e-05 -0.000255383 -0.000434362
2 0 2016-08-12 59 45 -0.00546352 0.000335544 -0.00101128 -0.00361959 0.00488799 0.00168936 ... -0.000275847 -0.00485629 0.00108332 -0.000354953 0.00052248 0.00164634 -0.002049 0.00215299 -0.00409516 0.00478023
3 0 2016-08-12 178 44 -0.00129473 -0.00042854 -0.00443996 -0.00169472 -0.00609196 0.00286522 ... 0.00257447 0.00106701 0.00191582 -0.000499389 0.000659427 0.000257282 -0.00259264 -0.000812146 -0.000417269 0.00120451
4 0 2016-08-12 199 44 -0.00827621 0.00464029 -0.00073654 -0.00250572 0.0110911 0.00379146 ... 0.0169083 0.0118159 0.00324122 1.71778e-05 0.000502871 -0.0019061 -0.000956135 0.00131178 0.000335653 -0.000305509
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1198 0 2016-08-12 3004 1 0.00563097 0.00379659 -0.00161652 -0.000676541 0.00236571 -0.00542808 ... -0.00444412 0.00799384 -0.00153584 -0.00221796 0.000654807 0.00175826 0.00022228 0.00097945 -0.00237923 0.00154693
1199 0 2016-08-12 3005 1 0.00482178 0.00242558 -0.00278712 -0.000206929 0.00144706 -0.000719776 ... 0.00679757 -0.000338297 -0.00351584 0.00180167 0.000462584 0.00303921 0.00434775 0.00354968 -2.89882e-05 0.0117052
1200 0 2016-08-12 3006 1 0.006286 0.00478083 -0.00603946 0.00192633 0.00080181 -0.00546843 ... 0.00212394 -0.00339074 -0.00372017 -0.000963446 -1.50815e-05 0.00101539 -0.00130499 0.00107728 -0.00114766 -0.000493821
1201 0 2016-08-12 3007 1 0.00502065 0.00350009 0.00158003 -0.000819385 8.74538e-05 0.00113418 ... 0.000366625 0.00208452 -0.00438712 -0.000486356 0.00141115 0.00300085 -0.000855138 -0.00527713 -0.0038038 -0.00121933
1202 0 2016-08-12 3008 1 0.00536517 0.00204567 -0.00211659 -0.000753905 -0.000456456 0.00402603 ... 0.000891216 -0.00229696 -0.00692422 -0.00129927 -0.00251864 0.000189614 -0.00128221 -0.00567603 -0.00219876 0.000853091

1203 rows × 108 columns

Each f_n column corresponds to one factor of the spectral decomposition of one interaction mode of the interaction matrix of one day.
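To work with the factor columns separately from the metadata columns, they can be selected by their `f_` prefix. The sketch below uses a small toy frame with rounded values in place of the real `factor_df`.

```python
import pandas as pd

# Toy stand-in for factor_df with two factor columns (values are illustrative).
toy_factor_df = pd.DataFrame({
    "day": [0, 0],
    "bee_id": [21, 39],
    "age": [45, 45],
    "f_0": [-0.0072, -0.0049],
    "f_1": [0.0031, -0.0022],
})

# Select every factor column by its "f_" prefix.
factor_columns = [c for c in toy_factor_df.columns if c.startswith("f_")]
factor_matrix = toy_factor_df[factor_columns].to_numpy()

print(factor_columns, factor_matrix.shape)
```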

# Load location data, because we need it to compute the CCA projection
loc_df = bb_network_decomposition.data.load_location_data(supplementary_data_path)
cca_factor_df, cca = bb_network_decomposition.projection.get_cca_projection(
    factor_df, loc_df, return_cca=True, num_components=3
)
cca_factor_df.sort_values("date", inplace=True)

cca_factor_df now contains the network age for all individuals on all dates in the dataset.

The column network_age contains the first dimension of network age (used throughout most of the paper), and the second and third dimensions are stored in the columns network_age_1 and network_age_2.
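For readers unfamiliar with CCA, the following NumPy sketch computes canonical variates of two toy data sets via QR decomposition and SVD. It is a generic illustration under simplifying assumptions (full-rank inputs, no regularization). The helper `cca_scores` and all toy variables are hypothetical, and `get_cca_projection` may be implemented differently.

```python
import numpy as np

def cca_scores(x, y, num_components):
    """Canonical variates of x and y via QR decomposition and SVD."""
    x_centered = x - x.mean(axis=0)
    y_centered = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    qx, _ = np.linalg.qr(x_centered)
    qy, _ = np.linalg.qr(y_centered)
    # Singular values of qx.T @ qy are the canonical correlations.
    u, correlations, vt = np.linalg.svd(qx.T @ qy, full_matrices=False)
    x_scores = qx @ u[:, :num_components]
    y_scores = qy @ vt.T[:, :num_components]
    return x_scores, y_scores, correlations[:num_components]

# Toy data: x plays the role of spectral factors, y the role of location data.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
x = np.hstack([latent + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 3))])
y = np.hstack([latent + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])

x_scores, y_scores, correlations = cca_scores(x, y, num_components=1)
print(x_scores.shape, correlations)
```

In this analogy, the first column of `x_scores` would correspond to the first dimension of network age.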

# save the network age (CCA projection) results
cca_factor_df.to_csv("network_age_cca.csv")
# list of variables to use as predictors in task allocation regression tasks
variable_names = [
    ["age"],
    ["age", "network_age"],
    ["network_age"],
    ["network_age", "network_age_1"],
    ["network_age", "network_age_1", "network_age_2"],
]

# list of variables to use as dependent variables in regression tasks
# first all labels jointly, then each label on its own
targets = [bb_network_decomposition.constants.supplementary_labels] + [
    [label] for label in bb_network_decomposition.constants.supplementary_labels
]

target_cols = bb_network_decomposition.constants.supplementary_labels
# load all required supplementary data
sup_df = bb_network_decomposition.data.load_supplementary_data(
    supplementary_data_path,
    keepcols=bb_network_decomposition.constants.default_location_data_cols
    + bb_network_decomposition.constants.default_supplementary_data_cols
    + ["location_descriptor_count"],
)
location_cols = set(bb_network_decomposition.constants.location_labels).union(
    set(bb_network_decomposition.constants.location_cols)
)
# remove location data from network age dataframe so that we can safely merge in all
# supplementary data
cca_factor_df = cca_factor_df[
    [c for c in cca_factor_df.columns if c not in location_cols]
]
sup_df = bb_network_decomposition.data.merge_location_data(cca_factor_df, sup_df)
# regression tasks bootstrap
regression_results = bb_network_decomposition.evaluation.get_bootstrap_results(
    sup_df,
    variable_names,
    targets,
    regression=True,
    use_tqdm=True,
    num_bootstrap_samples=8,
)

These results correspond to section 5 of the manuscript: Network age predicts an individual's behavior and future role in the colony
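The idea of bootstrapping an R^2 score can be sketched with plain NumPy. This is a generic illustration on toy data in which each fit is scored on its own resampled rows; the hypothetical helper `bootstrap_r2` is not the library's `get_bootstrap_results`, whose models and evaluation protocol may differ.

```python
import numpy as np

def bootstrap_r2(x, y, num_bootstrap_samples, seed=0):
    """Fit least squares on resampled rows and return the R^2 of each fit."""
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), x])  # intercept plus predictors
    scores = []
    for _ in range(num_bootstrap_samples):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        xb, yb = design[idx], y[idx]
        coef, *_ = np.linalg.lstsq(xb, yb, rcond=None)
        residual = yb - xb @ coef
        ss_res = np.sum(residual ** 2)
        ss_tot = np.sum((yb - yb.mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)

# Toy data: one predictor with a strong linear effect.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 2.0 * x + 0.5 * rng.normal(size=300)

scores = bootstrap_r2(x, y, num_bootstrap_samples=8)
print(scores.mean())
```

Averaging over bootstrap samples, as done with `groupby(...).mean()` in this README, gives a more stable estimate than a single fit.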

# results of bootstrap analysis, grouped by dependent and independent variables, R^2 scores
regression_results.groupby(["predictors", "target"]).fitted_linear_r2.mean()
predictors                               target
age                                      circadian_rhythm                                          0.331907
                                         circadian_rhythm,days_left,velocity_day,velocity_night    0.172621
                                         days_left                                                 0.012523
                                         velocity_day                                              0.083986
                                         velocity_night                                            0.286554
age,network_age                          circadian_rhythm                                          0.403065
                                         circadian_rhythm,days_left,velocity_day,velocity_night    0.210112
                                         days_left                                                 0.015316
                                         velocity_day                                              0.095954
                                         velocity_night                                            0.292630
network_age                              circadian_rhythm                                          0.387444
                                         circadian_rhythm,days_left,velocity_day,velocity_night    0.199390
                                         days_left                                                 0.010289
                                         velocity_day                                              0.112123
                                         velocity_night                                            0.243061
network_age,network_age_1                circadian_rhythm                                          0.390725
                                         circadian_rhythm,days_left,velocity_day,velocity_night    0.227511
                                         days_left                                                 0.071366
                                         velocity_day                                              0.112483
                                         velocity_night                                            0.266663
network_age,network_age_1,network_age_2  circadian_rhythm                                          0.433800
                                         circadian_rhythm,days_left,velocity_day,velocity_night    0.226100
                                         days_left                                                 0.065899
                                         velocity_day                                              0.148080
                                         velocity_night                                            0.260883
Name: fitted_linear_r2, dtype: float64
# multinomial regression for task allocation task
regression_results = bb_network_decomposition.evaluation.get_bootstrap_results(
    sup_df, variable_names, regression=False, use_tqdm=True, num_bootstrap_samples=8,
)
# results of bootstrap analysis, grouped by dependent and independent variables, R_McF^2 scores
regression_results.groupby(["predictors", "target"]).rho_mcf_linear.mean()
predictors                               target
age                                      brood_area_total                                        0.546260
                                         dance_floor                                             0.417913
                                         dance_floor,honey_storage,brood_area_total,near_exit    0.415424
                                         honey_storage                                           0.026411
                                         near_exit                                               0.314711
age,network_age                          brood_area_total                                        0.584782
                                         dance_floor                                             0.512601
                                         dance_floor,honey_storage,brood_area_total,near_exit    0.475106
                                         honey_storage                                           0.051840
                                         near_exit                                               0.376673
network_age                              brood_area_total                                        0.555143
                                         dance_floor                                             0.477920
                                         dance_floor,honey_storage,brood_area_total,near_exit    0.443385
                                         honey_storage                                           0.003832
                                         near_exit                                               0.357418
network_age,network_age_1                brood_area_total                                        0.577571
                                         dance_floor                                             0.477131
                                         dance_floor,honey_storage,brood_area_total,near_exit    0.462893
                                         honey_storage                                           0.166705
                                         near_exit                                               0.386814
network_age,network_age_1,network_age_2  brood_area_total                                        0.575683
                                         dance_floor                                             0.499749
                                         dance_floor,honey_storage,brood_area_total,near_exit    0.475821
                                         honey_storage                                           0.160835
                                         near_exit                                               0.445180
Name: rho_mcf_linear, dtype: float64

These results correspond to section 3 of the manuscript: Network age correctly identifies task allocation
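Assuming R_McF^2 denotes McFadden's pseudo-R^2, the score compares the log-likelihood of a fitted model with that of a null model. The sketch below uses hard class labels and hypothetical predicted probabilities for simplicity; the library's computation of `rho_mcf_linear` may differ in how it defines targets and likelihoods.

```python
import numpy as np

def mcfadden_r2(probabilities, labels):
    """1 - LL(model) / LL(null); the null model predicts the class frequencies."""
    n = len(labels)
    ll_model = np.sum(np.log(probabilities[np.arange(n), labels]))
    class_frequencies = np.bincount(labels, minlength=probabilities.shape[1]) / n
    ll_null = np.sum(np.log(class_frequencies[labels]))
    return 1.0 - ll_model / ll_null

# Toy example with two classes.
toy_class_labels = np.array([0, 0, 1, 1])
informative = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
uninformative = np.full((4, 2), 0.5)  # identical to the null model here

r2_informative = mcfadden_r2(informative, toy_class_labels)
r2_uninformative = mcfadden_r2(uninformative, toy_class_labels)
print(r2_informative, r2_uninformative)
```

A score of 0 means the model is no better than predicting class frequencies, and values closer to 1 indicate a better fit.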

Citation

Social networks predict the life and death of honey bees
Benjamin Wild, David M Dormagen, Adrian Zachariae, Michael L Smith, Kirsten S Traynor, Dirk Brockmann, Iain D Couzin, Tim Landgraf
bioRxiv 2020.05.06.076943; doi: https://doi.org/10.1101/2020.05.06.076943
