Repository of the paper "Do Graph Neural Networks Build Fair User Models? Assessing Disparate Impact and Mistreatment in Behavioural User Profiling" by Erasmo Purificato, Ludovico Boratto and Ernesto William De Luca.
Recent approaches to behavioural user profiling employ Graph Neural Networks (GNNs) to turn users' interactions with a platform into actionable knowledge. The effectiveness of an approach is usually assessed from an accuracy-based perspective, by evaluating its capability to predict user features (such as the purchasing level or the age). In this work, we perform a beyond-accuracy analysis of state-of-the-art approaches to assess the presence of disparate impact and disparate mistreatment, meaning that users characterised by a given sensitive feature are unintentionally, but systematically, classified worse than their counterparts. Our analysis on two real-world datasets shows that different user profiling paradigms can affect fairness results.
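To make the two notions above concrete, here is a minimal sketch of how they might be measured for a binary label and a binary sensitive attribute (as in the test runs, which use gender as the label and bin_age as the sensitive attribute). The function names and the toy data are assumptions made for illustration; the repository's own evaluation code may use different metrics or definitions. Disparate impact is approximated here by the gap in positive-prediction rates between the two groups, and disparate mistreatment by the gaps in false positive and false negative rates.

```python
def rate(values):
    """Mean of a list of 0/1 values; NaN when the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def group_rates(y_true, y_pred, sensitive, group):
    """Positive-prediction rate, FPR and FNR for users whose sensitive value equals group."""
    pairs = [(t, p) for t, p, s in zip(y_true, y_pred, sensitive) if s == group]
    positive_rate = rate([p for _, p in pairs])
    fpr = rate([p for t, p in pairs if t == 0])      # true negatives predicted positive
    fnr = rate([1 - p for t, p in pairs if t == 1])  # true positives predicted negative
    return positive_rate, fpr, fnr


def fairness_gaps(y_true, y_pred, sensitive):
    """Absolute gaps between the two sensitive groups (0 means no disparity)."""
    pr0, fpr0, fnr0 = group_rates(y_true, y_pred, sensitive, 0)
    pr1, fpr1, fnr1 = group_rates(y_true, y_pred, sensitive, 1)
    return {
        "disparate_impact_gap": abs(pr0 - pr1),
        "fpr_gap": abs(fpr0 - fpr1),
        "fnr_gap": abs(fnr0 - fnr1),
    }


# Toy data, invented purely for illustration.
y_true    = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred    = [1, 0, 1, 1, 0, 0, 1, 0]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
gaps = fairness_gaps(y_true, y_pred, sensitive)
print(gaps)
```

A model can reach the same accuracy on both groups and still show non-zero gaps here, which is why these quantities are checked separately from accuracy.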
The code has been executed under Python 3.8.1, with the dependencies listed below for each model (CatGCN first, then RHGN).
metis==0.2a5
networkx==2.6.3
numpy==1.22.0
pandas==1.3.5
scikit_learn==1.0.2
scipy==1.7.3
texttable==1.6.4
torch==1.10.1+cu113
torch_geometric==2.0.3
torch_scatter==2.0.9
tqdm==4.62.3
dgl==0.6.1
dgl_cu113==0.7.2
fasttext==0.9.2
fitlog==0.9.13
hickle==4.0.4
matplotlib==3.5.1
numpy==1.22.0
pandas==1.3.5
scikit_learn==1.0.2
scipy==1.7.3
torch==1.10.1+cu113
tqdm==4.62.3
Notes:
- the file requirements.txt installs all dependencies for both models;
- the dependencies including cu113 are meant to run on CUDA 11.3 (install the correct package based on your version of CUDA).
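Since the pinned versions matter for reproducing the runs, a quick check of the installed packages against the pins can save time. The sketch below is not part of the repository: parse_pins and check_pins are hypothetical helpers, and the sample lines are copied from the lists above. It relies only on importlib.metadata from the standard library (available from Python 3.8). Package names are looked up as written, so a name spelled with an underscore (for example scikit_learn) may be reported as missing on some setups even when the package is installed.

```python
from importlib import metadata


def parse_pins(lines):
    """Turn 'name==version' lines into a {name: version} dict, skipping other lines."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return (name, pinned, installed) for every package that differs from its pin."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches


# Sample pins taken from the dependency lists above.
sample = ["numpy==1.22.0", "torch==1.10.1+cu113", "", "# a comment line"]
pins = parse_pins(sample)
for name, pinned, installed in check_pins(pins):
    print(f"{name}: pinned {pinned}, installed {installed or 'not installed'}")
```

A mismatch on a cu113 package is expected when a build for a different CUDA version has been installed on purpose.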
The preprocessed files required for running each model are included as a zip file within the related folder.
The raw datasets are available at:
Test runs for each model-dataset combination (CatGCN and RHGN, each on the ali and jd data).
$ cd CatGCN
$ python3 main.py --seed 11 --gpu 0 --learning-rate 0.1 --weight-decay 1e-5 \
--dropout 0.1 --diag-probe 1 --graph-refining agc --aggr-pooling mean --grn-units 64 \
--bi-interaction nfm --nfm-units none --graph-layer pna --gnn-hops 1 --gnn-units none \
--aggr-style sum --balance-ratio 0.7 --edge-path ./input_ali_data/user_edge.csv \
--field-path ./input_ali_data/user_field.npy --target-path ./input_ali_data/user_gender.csv \
--labels-path ./input_ali_data/user_labels.csv --sens-attr bin_age --label gender
$ cd CatGCN
$ python3 main.py --seed 11 --gpu 0 --learning-rate 1e-2 --weight-decay 1e-5 \
--dropout 0.1 --diag-probe 39 --graph-refining agc --aggr-pooling mean --grn-units 64 \
--bi-interaction nfm --nfm-units none --graph-layer pna --gnn-hops 1 --gnn-units none \
--aggr-style sum --balance-ratio 0.7 --edge-path ./input_jd_data/user_edge.csv \
--field-path ./input_jd_data/user_field.npy --target-path ./input_jd_data/user_gender.csv \
--labels-path ./input_jd_data/user_labels.csv --sens-attr bin_age --label gender
$ cd RHGN
$ python3 ali_main.py --seed 42 --gpu 0 --model RHGN --data_dir ./input_ali_data/ \
--graph G_new --max_lr 0.1 --n_hid 32 --clip 2 --n_epoch 100 \
--label gender --sens_attr bin_age
$ cd RHGN
$ python3 jd_main.py --seed 3 --gpu 0 --model RHGN --data_dir ./input_jd_data/ \
--graph G_new --max_lr 1e-3 --n_hid 64 --clip 1 --n_epoch 100 \
--label gender --sens_attr bin_age
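To repeat a test run with several seeds, the command can be assembled programmatically. The following is a sketch under stated assumptions, not a script shipped with the repository: rhgn_jd_command and run_seeds are hypothetical helpers, the flags and values are copied from the RHGN run on the jd data above, and the extra seeds are arbitrary examples. By default it only prints the commands (dry run) and executes nothing.

```python
import shlex
import subprocess


def rhgn_jd_command(seed):
    """Argument list mirroring the RHGN run on the jd data, with only the seed changed."""
    return [
        "python3", "jd_main.py",
        "--seed", str(seed), "--gpu", "0", "--model", "RHGN",
        "--data_dir", "./input_jd_data/", "--graph", "G_new",
        "--max_lr", "1e-3", "--n_hid", "64", "--clip", "1", "--n_epoch", "100",
        "--label", "gender", "--sens_attr", "bin_age",
    ]


def run_seeds(seeds, dry_run=True, cwd="RHGN"):
    """Print each command; execute it inside cwd only when dry_run is False."""
    commands = [rhgn_jd_command(seed) for seed in seeds]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
    return commands


# Seed 3 comes from the test run above; 4 and 5 are arbitrary examples.
commands = run_seeds([3, 4, 5])
```

Calling run_seeds([3, 4, 5], dry_run=False) from the repository root would launch the runs one after another and stop at the first failure, because check=True raises on a non-zero exit code.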
Erasmo Purificato (erasmo.purificato@ovgu.de)