Colab Notebook development (#137)
* Add ACSIncome dataset

* Add option for url as path to datasets

* Add missing dependency

* Fix dataset loading step

* debug

* Revert number of parent files

* Make paths relative to the aequitas package.

* Update orchestrator to use dict configuration

* Update dependencies

* Add configurations

* Add image

* Update example configurations

* Add identity as default pre and post processing methods

* Add default orchestrator, where users only pass dataset parameters

* Add experiment results

* update pickles

* Add download method

* Fix download method

* Add missing variable

* Change download to load_data method

* Change path definition location

* Add image for notebooks

* Add handler cleaning method

* Add logging to generic dataset

* Add image of methods for notebooks

* Add other variants of folktables

* Update folktables for more intuitive use

* Fix bug in test dataset

* Add generic dataset definition

* Update generic dataset, add logs and sort imports

* Fix super call

* Remove warnings from LabeledFrame

* Add method to read dataset configuration

* Update imports

* Correct to static method

* Update BAF to use dataset object

* Add more logging messages and remove lgbm defaults

* Update verbose variable to integer

* Update on methods (consistency of abstract class)

* Update method definition

* Add method to read methods in orchestrator

* Fix orchestrator type hints

* Update notebook images

* update config structure

* Change orchestrator to experiment

* Add ACS Income smaller sample

* Fix missing target

* Update non FairML models to base_estimators

* Add artifact handling utilities

* Update default experiment for more configurations

* config changes

* Add splitting strategies to generic dataset

* Update default to pass dataset as list.

* Fix passing wrong variable to parent class

* Updating logs of experiment

* Add pareto plot files to setup

* Fix setup.py

* Update bootstrap plot

* Update pareto visualization wrapper for easier use

* Add data repair example

* Add helper method to download objects from repo

* Add logging to colab utils

* update data repair example

* Update notebook diagram

* Add image to contribution notebook

* Reduce configuration for faster results

* Remove unnecessary logs

* Update colab docstring

* Update examples structure

* Fix examples path

* Update utils related to results

* Update import structure with isort

* Black reformat

* add methods to utils init.

* Add pareto wrapper to init

* rename arg

* fix method

* Remove iFrame from visualization

* Fix import

* Change calls to only one class

* Update plot init

* Add display to plot

* add return to visualize method

* Sort results by id

* Add predictions to examples

* Update results

* Change visualize of bootstrap plot

* Add metrics to plot

* Add bias audit method

* Sort results

* fix results ordering

* Cast id to str

* Fix audit plot

* Remove unnecessary prints

* Add pareto only check to visualize

* Update plots

* add results for pareto models

* Update researcher plot

* Update bootstrapping input parameters

* Add minimum value to n_models_to_sample

* Remove unnecessary line

* Improve the code performance and break logic in blocks

* Small corrections

* Improved performance of bootstrap + bug solving

* Correct formatting errors

* Add plotting to the group class (it may be moved to the bias class later)

* Add accuracy to group metrics

* Add example dataset

* Correct generic dataset

* Solve _data access

* fix property type

* Update datasets examples

* Make categorical features non-mandatory

* Add default_experiment to experiment subpackage

* Correct demographic parity metric

* Add new header
sgpjesus authored Dec 19, 2023
1 parent a0012df commit 8ab35e6
Showing 259 changed files with 3,469 additions and 646 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -11,6 +11,7 @@ __pycache__/

*.DS_Store*

artifacts/

# Elastic Beanstalk Files
.elasticbeanstalk/*
Binary file added datasets/FolkTables/ACSEmployment.test.parquet
Binary file added datasets/FolkTables/ACSIncome.test.parquet
Binary file added datasets/FolkTables/ACSIncome.train.parquet
Binary file added datasets/FolkTables/ACSIncome.validation.parquet
Binary file added datasets/FolkTables/ACSMobility.test.parquet
Binary file added datasets/FolkTables/ACSMobility.train.parquet
Binary file added datasets/FolkTables/ACSTravelTime.test.parquet
Binary file added datasets/FolkTables/ACSTravelTime.train.parquet
Empty file added examples/__init__.py
@@ -0,0 +1,9 @@
folktables_acsincome:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: PINCP
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSIncome (Sample)
15 changes: 15 additions & 0 deletions examples/configs/notebook_configs/experiment.yaml
@@ -0,0 +1,15 @@
methods:
  - lightgbm_baseline
  - prevalence_undersampling
  - fairgbm_folktables
  - group_threshold_folktables

datasets:
  - FolkTables_ACSIncome

optimization:
  n_trials: 100
  n_jobs: 1
  sampler: RandomSampler
  sampler_args:
    seed: 42
@@ -0,0 +1,7 @@
fairgbm_folktables:
  type: inprocessing

defaults:
  - fairgbm_folktables/preprocessing: identity
  - fairgbm_folktables/inprocessing: fairgbm
  - fairgbm_folktables/postprocessing: identity
@@ -0,0 +1,45 @@
fairgbm:
  classpath: aequitas.fairflow.methods.inprocessing.fairgbm.FairGBM
  args:
    constraint_type:
      - fnr

    constraint_fnr_threshold:
      - 0

    proxy_margin:
      - 1

    multiplier_learning_rate:
      type: float
      range: [0.01, 1.0]
      log: True

    constraint_stepwise_proxy:
      - cross_entropy

    boosting_type:
      - dart # Running DART for all algos

    enable_bundle:
      - False

    n_estimators:
      type: int
      range: [10, 100]

    num_leaves:
      type: int
      range: [10, 1000]

    min_child_samples:
      type: int
      range: [1, 500]
      log: True

    learning_rate:
      type: float
      range: [0.001, 0.1]

    n_jobs:
      - 1
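The search spaces above mix two kinds of entries: a YAML list of fixed values (e.g. `boosting_type: [dart]`) and a mapping with `type`, `range` and an optional `log` flag. A minimal sketch of how one trial's hyperparameters could be drawn from that format, assuming a list means "choose one of these values" and `log: True` means sampling uniformly in log space. The helper `sample_params` is hypothetical and not part of the Aequitas API; the fixed seed mirrors `seed: 42` in the experiment configs.

```python
import math
import random
from typing import Any, Dict, List


def sample_params(space: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    """Draw one hyperparameter configuration (hypothetical helper).

    Assumed semantics, inferred from the YAML search spaces:
    - a list of values means "pick one of them";
    - a mapping with type/range (and optional log) means a numeric interval.
    """
    params: Dict[str, Any] = {}
    for name, spec in space.items():
        if isinstance(spec, list):
            params[name] = rng.choice(spec)
            continue
        low, high = spec["range"]
        if spec.get("log", False):
            # Uniform in log space, so small values are explored as densely as large ones.
            value = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif spec["type"] == "int":
            value = rng.randint(low, high)
        else:
            value = rng.uniform(low, high)
        # Clamp to guard against floating-point drift at the interval ends.
        value = min(high, max(low, value))
        params[name] = int(round(value)) if spec["type"] == "int" else float(value)
    return params


# Subset of the FairGBM search space above, written as a Python dict.
FAIRGBM_SPACE: Dict[str, Any] = {
    "constraint_type": ["fnr"],
    "boosting_type": ["dart"],
    "enable_bundle": [False],
    "multiplier_learning_rate": {"type": "float", "range": [0.01, 1.0], "log": True},
    "n_estimators": {"type": "int", "range": [10, 100]},
    "num_leaves": {"type": "int", "range": [10, 1000]},
    "min_child_samples": {"type": "int", "range": [1, 500], "log": True},
    "learning_rate": {"type": "float", "range": [0.001, 0.1]},
}

rng = random.Random(42)  # fixed seed for reproducible trials
trials: List[Dict[str, Any]] = [sample_params(FAIRGBM_SPACE, rng) for _ in range(100)]
print(trials[0])
```

Because `boosting_type`, `enable_bundle` and `constraint_type` are single-element lists, they are effectively fixed; only the range entries vary from trial to trial.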
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.postprocessing.identity.Identity
  args: null
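Every method and dataset entry names its implementation through a dotted `classpath` plus an `args` mapping (`null` for the identity steps). A minimal sketch of how such an entry could be turned into an object with `importlib`, assuming `args: null` means "no keyword arguments". The helpers `load_class` and `instantiate` are hypothetical; the usage resolves standard-library classes so it runs without Aequitas installed.

```python
import importlib
from typing import Any, Dict


def load_class(classpath: str) -> type:
    """Resolve 'package.module.ClassName' to the class object (hypothetical helper)."""
    module_name, _, class_name = classpath.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted classpath, got {classpath!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def instantiate(entry: Dict[str, Any]) -> Any:
    """Build an object from a {'classpath': ..., 'args': ...} config entry."""
    cls = load_class(entry["classpath"])
    kwargs = entry.get("args") or {}  # assumption: args: null -> no arguments
    return cls(**kwargs)


# Same shape as the identity entry above, but pointing at stdlib classes.
empty = instantiate({"classpath": "collections.OrderedDict", "args": None})
counts = instantiate({"classpath": "collections.Counter", "args": {"a": 2}})
print(type(empty).__name__, dict(counts))
```

Keeping the class path as a string lets a config swap implementations without code changes; the cost is that typos in `classpath` only surface at import time.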
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.preprocessing.identity.Identity
  args: null
@@ -0,0 +1,7 @@
group_threshold_folktables:
  type: postprocessing

defaults:
  - group_threshold_folktables/preprocessing: identity
  - group_threshold_folktables/inprocessing: lightgbm
  - group_threshold_folktables/postprocessing: group_threshold
@@ -0,0 +1,28 @@
lightgbm:
  classpath: aequitas.fairflow.methods.inprocessing.lightgbm.LightGBM
  args:
    boosting_type:
      - dart # Running DART for all algos

    enable_bundle:
      - False

    n_estimators:
      type: int
      range: [10, 100]

    num_leaves:
      type: int
      range: [10, 1000]

    min_child_samples:
      type: int
      range: [1, 500]
      log: True

    learning_rate:
      type: float
      range: [0.001, 0.1]

    n_jobs:
      - 1
@@ -0,0 +1,5 @@
group_thresholding:
  classpath: aequitas.fairflow.methods.postprocessing.group_threshold.GroupThreshold
  args:
    threshold_type: tpr
    threshold_value: 0.8
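The `group_thresholding` entry sets `threshold_type: tpr` with `threshold_value: 0.8`. One plausible reading, not confirmed by this commit, is that a separate score cutoff is chosen for each group so that every group reaches a true positive rate of at least 0.8. A minimal sketch of that idea in plain Python; `group_tpr_thresholds` and `tpr` are hypothetical names, and the actual `GroupThreshold` class may work differently.

```python
import math
from typing import Dict, Hashable, Sequence


def group_tpr_thresholds(
    scores: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[Hashable],
    target_tpr: float,
) -> Dict[Hashable, float]:
    """Per-group cutoff t such that 'score >= t' flags at least target_tpr of positives."""
    thresholds: Dict[Hashable, float] = {}
    for group in set(groups):
        positives = sorted(
            (s for s, y, g in zip(scores, labels, groups) if g == group and y == 1),
            reverse=True,
        )
        if not positives:
            raise ValueError(f"group {group!r} has no positive labels")
        # Smallest number of positives that must be flagged to reach the target.
        k = max(1, math.ceil(target_tpr * len(positives)))
        thresholds[group] = positives[k - 1]
    return thresholds


def tpr(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Share of positive labels whose score reaches the threshold."""
    flagged = [s >= threshold for s, y in zip(scores, labels) if y == 1]
    return sum(flagged) / len(flagged)


# Toy data: group "b" receives systematically lower scores than group "a".
scores = [i / 10 for i in range(1, 11)] + [i / 20 for i in range(1, 11)] + [0.95, 0.02]
labels = [1] * 20 + [0, 0]
groups = ["a"] * 10 + ["b"] * 10 + ["a", "b"]
cutoffs = group_tpr_thresholds(scores, labels, groups, target_tpr=0.8)
print(cutoffs)
```

A single global cutoff would give these two groups different TPRs; per-group cutoffs equalize them, at the cost of using group membership at decision time.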
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.preprocessing.identity.Identity
  args: null
@@ -0,0 +1,7 @@
lightgbm_baseline:
  type: inprocessing

defaults:
  - lightgbm_baseline/preprocessing: identity
  - lightgbm_baseline/inprocessing: lightgbm
  - lightgbm_baseline/postprocessing: identity
@@ -0,0 +1,28 @@
lightgbm:
  classpath: aequitas.fairflow.methods.inprocessing.lightgbm.LightGBM
  args:
    boosting_type:
      - dart # Running DART for all algos

    enable_bundle:
      - False

    n_estimators:
      type: int
      range: [10, 100]

    num_leaves:
      type: int
      range: [10, 1000]

    min_child_samples:
      type: int
      range: [1, 500]
      log: True

    learning_rate:
      type: float
      range: [0.001, 0.1]

    n_jobs:
      - 1
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.postprocessing.identity.Identity
  args: null
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.preprocessing.identity.Identity
  args: null
@@ -0,0 +1,7 @@
prevalence_undersampling:
  type: preprocessing

defaults:
  - prevalence_undersampling/preprocessing: prevalence_sampling
  - prevalence_undersampling/inprocessing: lightgbm
  - prevalence_undersampling/postprocessing: identity
@@ -0,0 +1,28 @@
lightgbm:
  classpath: aequitas.fairflow.methods.inprocessing.lightgbm.LightGBM
  args:
    boosting_type:
      - dart # Running DART for all algos

    enable_bundle:
      - False

    n_estimators:
      type: int
      range: [10, 100]

    num_leaves:
      type: int
      range: [10, 1000]

    min_child_samples:
      type: int
      range: [1, 500]
      log: True

    learning_rate:
      type: float
      range: [0.001, 0.1]

    n_jobs:
      - 1
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.postprocessing.identity.Identity
  args: null
@@ -0,0 +1,6 @@
prevalence_sampling:
  args:
    alpha:
      type: float
      range: [0.5, 1]
  classpath: aequitas.fairflow.methods.preprocessing.prevalence_sample.PrevalenceSampling
@@ -0,0 +1,9 @@
folktables_acsemployment:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: ESR
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSEmployment
@@ -0,0 +1,9 @@
folktables_acsincome:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: PINCP
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSIncome
@@ -0,0 +1,9 @@
folktables_acsincome:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: PINCP
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSIncome
@@ -0,0 +1,9 @@
folktables_acsmobility:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: MIG
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSMobility
@@ -0,0 +1,9 @@
folktables_acspubliccoverage:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: PUBCOV
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSPublicCoverage
@@ -0,0 +1,9 @@
folktables_acstraveltime:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: JWMNP
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSTravelTime
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_base.yaml
@@ -0,0 +1,9 @@
baf_base:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: Base
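The dataset configs use two threshold types: `fixed` (a score cutoff of 0.5 for the FolkTables tasks) and `fpr` (a cutoff targeting a 5% false positive rate for the Bank Account Fraud variants). A minimal sketch of how such a cutoff could be derived from model scores, assuming a sample is predicted positive when `score >= threshold` and that `fpr` is measured on the negative labels; `decision_threshold` and `false_positive_rate` are hypothetical helpers, not taken from Aequitas.

```python
import math
from typing import Sequence


def decision_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold_type: str,
    threshold_value: float,
) -> float:
    """Return a cutoff t; samples with score >= t are predicted positive."""
    if threshold_type == "fixed":
        return threshold_value
    if threshold_type == "fpr":
        negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
        if not negatives:
            raise ValueError("an fpr threshold needs at least one negative label")
        allowed = int(threshold_value * len(negatives))  # max false positives
        if allowed >= len(negatives):
            return -math.inf
        # Place the cutoff just above the (allowed + 1)-th highest negative score,
        # so at most `allowed` negatives reach it (ties can only lower the FPR).
        return math.nextafter(negatives[allowed], math.inf)
    raise ValueError(f"unsupported threshold_type: {threshold_type!r}")


def false_positive_rate(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Share of negative labels whose score reaches the threshold."""
    flags = [s >= threshold for s, y in zip(scores, labels) if y == 0]
    return sum(flags) / len(flags)


# 100 negatives with evenly spread scores, plus two high-scoring positives.
scores = [i / 100 for i in range(100)] + [0.97, 0.99]
labels = [0] * 100 + [1, 1]
t = decision_threshold(scores, labels, "fpr", 0.05)
print(t, false_positive_rate(scores, labels, t))
```

An FPR-based cutoff adapts to each model's score distribution, which makes models comparable at the same alert rate; a fixed cutoff does not.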
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_1.yaml
@@ -0,0 +1,9 @@
baf_variant_1:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeI
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_2.yaml
@@ -0,0 +1,9 @@
baf_variant_2:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeII
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_3.yaml
@@ -0,0 +1,9 @@
baf_variant_3:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeIII
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_4.yaml
@@ -0,0 +1,9 @@
baf_variant_4:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeIV
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_5.yaml
@@ -0,0 +1,9 @@
baf_variant_5:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeV
23 changes: 23 additions & 0 deletions examples/configs/paper_configs/experiment_baf.yaml
@@ -0,0 +1,23 @@
methods:
  - lightgbm_baseline
  - prevalence_undersampling
  - prevalence_oversampling
  - fairgbm_baf
  - group_threshold_baf
  - grid_search_baf
  - exponentiated_gradient_baf

datasets:
  - baf_base
  - baf_variant_1
  - baf_variant_2
  - baf_variant_3
  - baf_variant_4
  - baf_variant_5

optimization:
  n_trials: 100
  n_jobs: 1
  sampler: RandomSampler
  sampler_args:
    seed: 42
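Read literally, `experiment_baf.yaml` crosses seven methods with six datasets and asks for `n_trials: 100` hyperparameter trials. Assuming the trials are run once per method and dataset pair (the commit does not spell this out), the sketch below enumerates the resulting runs; `planned_runs` is a hypothetical helper and the dict simply mirrors the YAML.

```python
from itertools import product
from typing import Any, Dict, List, Tuple

# Python mirror of experiment_baf.yaml.
EXPERIMENT_BAF: Dict[str, Any] = {
    "methods": [
        "lightgbm_baseline",
        "prevalence_undersampling",
        "prevalence_oversampling",
        "fairgbm_baf",
        "group_threshold_baf",
        "grid_search_baf",
        "exponentiated_gradient_baf",
    ],
    "datasets": [
        "baf_base",
        "baf_variant_1",
        "baf_variant_2",
        "baf_variant_3",
        "baf_variant_4",
        "baf_variant_5",
    ],
    "optimization": {
        "n_trials": 100,
        "n_jobs": 1,
        "sampler": "RandomSampler",
        "sampler_args": {"seed": 42},
    },
}


def planned_runs(config: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """List (method, dataset, trial_index) triples, one per assumed training run."""
    n_trials = config["optimization"]["n_trials"]
    return [
        (method, dataset, trial)
        for method, dataset in product(config["methods"], config["datasets"])
        for trial in range(n_trials)
    ]


runs = planned_runs(EXPERIMENT_BAF)
print(len(runs))  # 7 methods x 6 datasets x 100 trials = 4200
```

With `n_jobs: 1` these runs would execute one at a time. Under the same assumption, the notebook configuration (four methods, one dataset) amounts to 400 runs, which fits the "Reduce configuration for faster results" change in the commit message.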