Colab Notebook development #137

Merged: 116 commits merged on Dec 19, 2023
2f4c1d0
Add ACSIncome dataset
sgpjesus Sep 5, 2023
b1c098e
Add option for url as path to datasets
sgpjesus Sep 6, 2023
870d96e
Add missing dependency
sgpjesus Sep 6, 2023
1454720
Fix dataset loading step
sgpjesus Sep 6, 2023
0d63cf3
debug
sgpjesus Sep 6, 2023
0e684e2
Revert number of parent files
sgpjesus Sep 6, 2023
e21561b
Make paths related to aequitas package.
sgpjesus Sep 6, 2023
9e44ada
Update orchestrator to use dict configuration
sgpjesus Sep 6, 2023
92e2f82
Update dependencies
sgpjesus Sep 6, 2023
97fe8a7
Add configurations
sgpjesus Sep 7, 2023
b6068d9
Add image
sgpjesus Sep 7, 2023
929b50a
Update example configurations
sgpjesus Sep 11, 2023
d33bbba
Add identity as default pre and post processing methods
sgpjesus Sep 13, 2023
eeffa6f
Add default orchestrator, where users only pass dataset parameters
sgpjesus Sep 21, 2023
d79d06d
Add experiment results
sgpjesus Oct 7, 2023
b7ad2bd
update pickles
sgpjesus Oct 7, 2023
0d9d448
Add download method
sgpjesus Oct 9, 2023
cbcdeeb
Fix download method
sgpjesus Oct 9, 2023
12b28d2
Add missing variable
sgpjesus Oct 9, 2023
e92312c
Change download to load_data method
sgpjesus Oct 9, 2023
6d70888
Change path definition location
sgpjesus Oct 9, 2023
53fa8e1
Add image for notebooks
sgpjesus Oct 11, 2023
a24833d
Add handler cleaning method
sgpjesus Oct 11, 2023
8ad135c
Add logging to generic dataset
sgpjesus Oct 11, 2023
e06a43f
Add image of methods for notebooks
sgpjesus Oct 11, 2023
4f341d6
Add other variants of folktables
sgpjesus Oct 11, 2023
0907c61
Update folktables for more intuitive use
sgpjesus Oct 11, 2023
24aa57d
Fix bug in test dataset
sgpjesus Oct 11, 2023
d98b47a
Add generic dataset definition
sgpjesus Oct 12, 2023
4fd074f
Update generic dataset, add logs and sort imports
sgpjesus Oct 12, 2023
ccaf1e3
Fix super call
sgpjesus Oct 12, 2023
e1eab71
Remove warnings from LabeledFrame
sgpjesus Oct 12, 2023
0b8bb24
Add method to read dataset configuration
sgpjesus Oct 12, 2023
d5242ee
Update imports
sgpjesus Oct 12, 2023
1377fce
Correct to static method
sgpjesus Oct 12, 2023
99b6d4d
Update BAF to use dataset object
sgpjesus Oct 12, 2023
0b5c8bd
Add more logging messages and remove lgbm defaults
sgpjesus Oct 12, 2023
b3de679
Update verbose variable to integer
sgpjesus Oct 12, 2023
40ffe5f
Update on methods (consistency of abstract class)
sgpjesus Oct 13, 2023
07c2608
Update method definition
sgpjesus Oct 13, 2023
4beb381
Add method to read methods in orchestrator
sgpjesus Oct 13, 2023
db479b0
Fix orchestrator type hints
sgpjesus Oct 16, 2023
ea488b9
Update notebook images
sgpjesus Oct 19, 2023
ca4e808
update config structure
sgpjesus Oct 19, 2023
b4c5b9c
Change orchestrator to experiment
sgpjesus Oct 19, 2023
c1ceedc
Add ACS Income smaller sample
sgpjesus Oct 19, 2023
57bd399
Fix missing target
sgpjesus Oct 19, 2023
7541e05
Update non FairML models to base_estimators
sgpjesus Oct 26, 2023
bcd4616
Add artifact handling utilities
sgpjesus Oct 26, 2023
f809d4d
Update default experiment for more configurations
sgpjesus Oct 31, 2023
b682bfe
config changes
sgpjesus Oct 31, 2023
2c9d9ef
Add splitting strategies to generic dataset
fdz-sergio-jesus Nov 2, 2023
e57a2fb
Update default to pass dataset as list.
fdz-sergio-jesus Nov 3, 2023
0c915a2
Fix passing wrong variable to parent class
fdz-sergio-jesus Nov 3, 2023
250c80f
Updating logs of experiment
fdz-sergio-jesus Nov 3, 2023
662372e
Add to setup files of pareto plot
fdz-sergio-jesus Nov 3, 2023
cd504b5
Fix setup.py
fdz-sergio-jesus Nov 3, 2023
43d51c5
Update bootstrap plot
fdz-sergio-jesus Nov 3, 2023
b7bd77e
Update pareto visualization wrapper for easier use
fdz-sergio-jesus Nov 8, 2023
8041056
Add data repair example
fdz-sergio-jesus Nov 8, 2023
f82e10a
Add helper method to download objects from repo
fdz-sergio-jesus Nov 9, 2023
6608160
Add logging to colab utils
fdz-sergio-jesus Nov 9, 2023
cadd286
update data repair example
fdz-sergio-jesus Nov 9, 2023
bdd3b62
Update notebook diagram
fdz-sergio-jesus Nov 9, 2023
4b3fece
Add image to contribution notebook
fdz-sergio-jesus Nov 9, 2023
b40a779
Reduce configuration for faster results
fdz-sergio-jesus Nov 9, 2023
60c8e70
Remove unnecessary logs
fdz-sergio-jesus Nov 9, 2023
fad8b9b
Update colab docstring
fdz-sergio-jesus Nov 9, 2023
78a2f5d
Update examples structure
fdz-sergio-jesus Nov 13, 2023
6b09b37
Fix examples path
fdz-sergio-jesus Nov 13, 2023
6688cc8
Update utils related to results
fdz-sergio-jesus Nov 14, 2023
61ef351
Update import structure with isort
fdz-sergio-jesus Nov 14, 2023
94b0c56
Black reformat
fdz-sergio-jesus Nov 14, 2023
1902f59
add methods to utils init.
fdz-sergio-jesus Nov 14, 2023
3625762
Add pareto wrapper to init
fdz-sergio-jesus Nov 14, 2023
c4ddb05
rename arg
fdz-sergio-jesus Nov 14, 2023
6eb6ba6
fix method
fdz-sergio-jesus Nov 14, 2023
d727475
Remove iFrame from visualization
fdz-sergio-jesus Nov 16, 2023
a50dd45
Fix import
fdz-sergio-jesus Nov 16, 2023
a99a492
Change calls to only one class
fdz-sergio-jesus Nov 16, 2023
df5f6e5
Update plot init
fdz-sergio-jesus Nov 16, 2023
b355c79
Add display to plot
fdz-sergio-jesus Nov 16, 2023
d0597f6
add return to visualize method
fdz-sergio-jesus Nov 16, 2023
59c880b
Sort results by id
fdz-sergio-jesus Nov 17, 2023
a6b0bba
Add predictions to examples
fdz-sergio-jesus Nov 17, 2023
28f6eb4
Update results
fdz-sergio-jesus Nov 20, 2023
d746dbc
Change visualize of bootstrap plot
fdz-sergio-jesus Nov 20, 2023
0241db8
Add metrics to plot
fdz-sergio-jesus Nov 20, 2023
c6a6740
Add bias audit method
fdz-sergio-jesus Nov 22, 2023
ef31b0e
Sort results
fdz-sergio-jesus Nov 22, 2023
0be4fe9
fix results ordering
fdz-sergio-jesus Nov 22, 2023
bd86300
Cast id to str
fdz-sergio-jesus Nov 22, 2023
1c93096
Fix audit plot
fdz-sergio-jesus Nov 22, 2023
0aa4aed
Remove unnecessary prints
fdz-sergio-jesus Nov 22, 2023
cac08e1
Add pareto only check to visualize
fdz-sergio-jesus Nov 27, 2023
f10edb4
Update plots
fdz-sergio-jesus Nov 27, 2023
52f8ffc
add results for pareto models
fdz-sergio-jesus Nov 28, 2023
ac2cd38
Update researcher plot
fdz-sergio-jesus Nov 29, 2023
386d449
Update bootstrapping input parameters
fdz-sergio-jesus Nov 29, 2023
b0dc6df
Add minimum value to n_models_to_sample
fdz-sergio-jesus Nov 29, 2023
a38a822
Remove unnecessary line
fdz-sergio-jesus Nov 29, 2023
509a09e
Improve the code performance and break logic in blocks
fdz-sergio-jesus Nov 29, 2023
1ff822f
Small corrections
fdz-sergio-jesus Nov 29, 2023
b1fcfef
Improved performance of bootstrap + bug solving
fdz-sergio-jesus Nov 30, 2023
1d1b70e
Correct formatting errors
fdz-sergio-jesus Nov 30, 2023
a28dabc
Include plotting to the group class (might be changed to the bias cla…
fdz-sergio-jesus Nov 30, 2023
d71e253
Add accuracy to group metrics
fdz-sergio-jesus Dec 4, 2023
72773eb
Add example dataset
fdz-sergio-jesus Dec 5, 2023
4f53665
Correct generic dataset
fdz-sergio-jesus Dec 5, 2023
804f206
Solve _data access
fdz-sergio-jesus Dec 5, 2023
60ff076
fix property type
fdz-sergio-jesus Dec 5, 2023
628c385
Update datasets examples
fdz-sergio-jesus Dec 5, 2023
538dee8
Make categorical features non-mandatory
fdz-sergio-jesus Dec 5, 2023
18e7df2
Add default_experiment to experiment subpackage
fdz-sergio-jesus Dec 6, 2023
7fabcae
Correct demographic parity metric
fdz-sergio-jesus Dec 6, 2023
33078cd
Add new header
fdz-sergio-jesus Dec 7, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -11,6 +11,7 @@ __pycache__/

*.DS_Store*

artifacts/

# Elastic Beanstalk Files
.elasticbeanstalk/*
Binary file added datasets/FolkTables/ACSEmployment.test.parquet
Binary file added datasets/FolkTables/ACSIncome.test.parquet
Binary file added datasets/FolkTables/ACSIncome.train.parquet
Binary file added datasets/FolkTables/ACSIncome.validation.parquet
Binary file added datasets/FolkTables/ACSMobility.test.parquet
Binary file added datasets/FolkTables/ACSMobility.train.parquet
Binary file added datasets/FolkTables/ACSTravelTime.test.parquet
Binary file added datasets/FolkTables/ACSTravelTime.train.parquet
Nine further binary files were added; their names and contents are not shown.
Empty file added examples/__init__.py
@@ -0,0 +1,9 @@
folktables_acsincome:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: PINCP
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSIncome (Sample)
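Each dataset entry names a Python class through `classpath` and passes `args` to it. A minimal sketch of how such an entry could be turned into an object using only `importlib`; the helper `instantiate_from_classpath` is hypothetical, and `datetime.timedelta` stands in for `aequitas.fairflow.datasets.FolkTables` so the example runs without aequitas installed.

```python
import importlib
from typing import Any, Mapping, Optional


def instantiate_from_classpath(classpath: str, args: Optional[Mapping[str, Any]] = None) -> Any:
    """Import the class named by a dotted `classpath` and call it with `args`.

    Hypothetical helper: it mirrors what a `classpath` + `args` entry suggests,
    not necessarily how aequitas itself resolves these configs.
    """
    module_name, _, class_name = classpath.rpartition(".")
    if not module_name:
        raise ValueError(f"classpath must be a dotted path, got {classpath!r}")
    module = importlib.import_module(module_name)  # e.g. "aequitas.fairflow.datasets"
    cls = getattr(module, class_name)              # e.g. "FolkTables"
    # `args: null` in YAML loads as None, so treat it as "no keyword arguments".
    return cls(**(args or {}))


# Standard-library stand-in for `classpath: ...FolkTables` with `args: {variant: ...}`.
delta = instantiate_from_classpath("datetime.timedelta", {"days": 2})
print(delta)  # 2 days, 0:00:00
```

Splitting on the last dot keeps the helper generic: the module part can be arbitrarily deep, as in the `aequitas.fairflow.methods...` paths used by the method configs.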
15 changes: 15 additions & 0 deletions examples/configs/notebook_configs/experiment.yaml
@@ -0,0 +1,15 @@
methods:
  - lightgbm_baseline
  - prevalence_undersampling
  - fairgbm_folktables
  - group_threshold_folktables

datasets:
  - FolkTables_ACSIncome

optimization:
  n_trials: 100
  n_jobs: 1
  sampler: RandomSampler
  sampler_args:
    seed: 42
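The `optimization` block sets a trial budget and a seeded random sampler. A sketch of how the `methods` and `datasets` lists might combine into individual trials, assuming each method is tuned separately on each dataset with `n_trials` trials; the helper `plan_trials` is hypothetical. If `sampler: RandomSampler` refers to Optuna, the matching object would be `optuna.samplers.RandomSampler(seed=42)`; that mapping is also an assumption.

```python
from itertools import product
from typing import Any, Dict, List, Tuple


def plan_trials(config: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Expand an experiment config into (dataset, method, trial_index) triples.

    Assumption: every method is tuned independently on every dataset,
    with `optimization.n_trials` trials per (dataset, method) pair.
    """
    n_trials = config["optimization"]["n_trials"]
    return [
        (dataset, method, trial)
        for dataset, method in product(config["datasets"], config["methods"])
        for trial in range(n_trials)
    ]


experiment = {
    "methods": [
        "lightgbm_baseline",
        "prevalence_undersampling",
        "fairgbm_folktables",
        "group_threshold_folktables",
    ],
    "datasets": ["FolkTables_ACSIncome"],
    "optimization": {
        "n_trials": 100,
        "n_jobs": 1,
        "sampler": "RandomSampler",
        "sampler_args": {"seed": 42},
    },
}

runs = plan_trials(experiment)
print(len(runs))  # 4 methods x 1 dataset x 100 trials = 400
```

Under the same assumption, `experiment_baf.yaml`, with 7 methods and 6 datasets, would expand to 4,200 trials.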
@@ -0,0 +1,7 @@
fairgbm_folktables:
  type: inprocessing

defaults:
  - fairgbm_folktables/preprocessing: identity
  - fairgbm_folktables/inprocessing: fairgbm
  - fairgbm_folktables/postprocessing: identity
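Each method config composes three stages through its `defaults` list. The `group/subgroup: option` entries follow Hydra's defaults-list convention, where the option names a YAML file inside that group's directory. Assuming that layout, a hypothetical helper `resolve_defaults` can map each stage to the file it selects:

```python
from pathlib import PurePosixPath
from typing import Dict, List


def resolve_defaults(defaults: List[Dict[str, str]]) -> Dict[str, PurePosixPath]:
    """Map each pipeline stage to the config file its defaults entry selects.

    Assumes the Hydra-style layout `<group>/<option>.yaml`; the helper
    itself is hypothetical.
    """
    stages: Dict[str, PurePosixPath] = {}
    for entry in defaults:
        ((group, option),) = entry.items()  # each list item is a one-key mapping
        stage = PurePosixPath(group).name   # e.g. "preprocessing"
        stages[stage] = PurePosixPath(group) / f"{option}.yaml"
    return stages


defaults = [
    {"fairgbm_folktables/preprocessing": "identity"},
    {"fairgbm_folktables/inprocessing": "fairgbm"},
    {"fairgbm_folktables/postprocessing": "identity"},
]
for stage, path in resolve_defaults(defaults).items():
    print(f"{stage}: {path}")
```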
@@ -0,0 +1,45 @@
fairgbm:
  classpath: aequitas.fairflow.methods.inprocessing.fairgbm.FairGBM
  args:
    constraint_type:
      - fnr

    constraint_fnr_threshold:
      - 0

    proxy_margin:
      - 1

    multiplier_learning_rate:
      type: float
      range: [0.01, 1.0]
      log: True

    constraint_stepwise_proxy:
      - cross_entropy

    boosting_type:
      - dart # Running DART for all algos

    enable_bundle:
      - False

    n_estimators:
      type: int
      range: [10, 100]

    num_leaves:
      type: int
      range: [10, 1000]

    min_child_samples:
      type: int
      range: [1, 500]
      log: True

    learning_rate:
      type: float
      range: [0.001, 0.1]

    n_jobs:
      - 1
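The `args` section above mixes two kinds of entries: single-item lists such as `boosting_type: [dart]`, and ranges with `type`, `range` and an optional `log` flag. A sketch of drawing one configuration from such a space, assuming lists are categorical choices and `log: True` means log-uniform sampling; `sample_params` is hypothetical. If the experiment is driven by Optuna, these entries would plausibly map onto `trial.suggest_categorical`, `trial.suggest_float(..., log=...)` and `trial.suggest_int(..., log=...)`.

```python
import math
import random
from typing import Any, Dict


def sample_params(space: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    """Draw one hyperparameter configuration from a search space like the one above.

    Assumed semantics (not confirmed by the source): a list is a set of
    categorical choices; a mapping with `type` and `range` is a numeric
    interval, sampled log-uniformly when `log` is true.
    """
    params: Dict[str, Any] = {}
    for name, spec in space.items():
        if isinstance(spec, list):
            params[name] = rng.choice(spec)  # single-item lists act as fixed values
            continue
        low, high = spec["range"]
        if spec.get("log", False):
            value = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            value = rng.uniform(low, high)
        if spec["type"] == "int":
            value = round(value)
        # Clamp to guard against rounding or exp/log drift at the interval edges.
        params[name] = min(max(value, low), high)
    return params


search_space = {
    "boosting_type": ["dart"],
    "multiplier_learning_rate": {"type": "float", "range": [0.01, 1.0], "log": True},
    "n_estimators": {"type": "int", "range": [10, 100]},
    "min_child_samples": {"type": "int", "range": [1, 500], "log": True},
    "learning_rate": {"type": "float", "range": [0.001, 0.1]},
}
print(sample_params(search_space, random.Random(42)))
```

Passing an explicit `random.Random` keeps draws reproducible, in the same spirit as the `seed: 42` under `sampler_args`.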
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.postprocessing.identity.Identity
  args: null
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.preprocessing.identity.Identity
  args: null
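The `identity` pre- and post-processing configs take `args: null`, which makes them no-op defaults around the in-processing model. A sketch of that arrangement with a hypothetical `Identity` class and a toy scorer in place of the LightGBM or FairGBM model; the `fit`/`transform` interface is an assumption, not the aequitas API.

```python
from typing import Any, Callable, List, Sequence


class Identity:
    """Pass-through stage; a hypothetical stand-in for the Identity classes above.

    It takes no constructor arguments, matching `args: null` in the config.
    """

    def fit(self, *args: Any, **kwargs: Any) -> "Identity":
        return self  # nothing to learn

    def transform(self, data: Any) -> Any:
        return data  # the output is the input, unchanged


def run_pipeline(pre: Identity, score: Callable[[Sequence[Any]], List[float]],
                 post: Identity, X: Sequence[Any], y: Sequence[int]) -> List[float]:
    """Hypothetical three-stage flow: preprocess, score, then postprocess the scores."""
    X_pre = pre.fit(X, y).transform(X)
    scores = score(X_pre)
    return post.fit(scores, y).transform(scores)


def first_feature(rows: Sequence[Sequence[float]]) -> List[float]:
    """Toy scorer standing in for the in-processing model."""
    return [row[0] for row in rows]


print(run_pipeline(Identity(), first_feature, Identity(), [[0.2], [0.9]], [0, 1]))  # [0.2, 0.9]
```

With both outer stages set to `Identity`, the pipeline reduces to the bare model, which is why `lightgbm_baseline` can serve as a baseline.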
@@ -0,0 +1,7 @@
group_threshold_folktables:
  type: postprocessing

defaults:
  - group_threshold_folktables/preprocessing: identity
  - group_threshold_folktables/inprocessing: lightgbm
  - group_threshold_folktables/postprocessing: group_threshold
@@ -0,0 +1,28 @@
lightgbm:
  classpath: aequitas.fairflow.methods.inprocessing.lightgbm.LightGBM
  args:
    boosting_type:
      - dart # Running DART for all algos

    enable_bundle:
      - False

    n_estimators:
      type: int
      range: [10, 100]

    num_leaves:
      type: int
      range: [10, 1000]

    min_child_samples:
      type: int
      range: [1, 500]
      log: True

    learning_rate:
      type: float
      range: [0.001, 0.1]

    n_jobs:
      - 1
@@ -0,0 +1,5 @@
group_thresholding:
  classpath: aequitas.fairflow.methods.postprocessing.group_threshold.GroupThreshold
  args:
    threshold_type: tpr
    threshold_value: 0.8
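`group_thresholding` is configured with `threshold_type: tpr` and `threshold_value: 0.8`. One common way to realise this is to choose a separate cut-off for each group so that every group reaches a true positive rate of at least 80%. The sketch below does that; it illustrates the idea under that assumption and is not the code of `GroupThreshold`. The helper `group_tpr_thresholds` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def group_tpr_thresholds(
    scores: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[Hashable],
    target_tpr: float = 0.8,
) -> Dict[Hashable, float]:
    """Per group, the highest threshold t with TPR >= target_tpr (predict score >= t)."""
    positives: Dict[Hashable, List[float]] = defaultdict(list)
    for score, label, group in zip(scores, labels, groups):
        if label == 1:
            positives[group].append(score)  # only positives determine TPR
    thresholds: Dict[Hashable, float] = {}
    for group, pos_scores in positives.items():
        pos_scores.sort(reverse=True)
        # Number of positives that must score at or above the threshold.
        k = max(1, math.ceil(target_tpr * len(pos_scores)))
        thresholds[group] = pos_scores[k - 1]
    return thresholds


scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.5, 0.4, 0.2, 0.95]
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0]
groups = ["a"] * 5 + ["b"] * 4
print(group_tpr_thresholds(scores, labels, groups))  # {'a': 0.6, 'b': 0.2}
```

Group `b` ends up with a much lower cut-off than group `a` because its positives score lower; equalising TPR this way trades some false positives in that group for parity.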
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.preprocessing.identity.Identity
  args: null
@@ -0,0 +1,7 @@
lightgbm_baseline:
  type: inprocessing

defaults:
  - lightgbm_baseline/preprocessing: identity
  - lightgbm_baseline/inprocessing: lightgbm
  - lightgbm_baseline/postprocessing: identity
@@ -0,0 +1,28 @@
lightgbm:
  classpath: aequitas.fairflow.methods.inprocessing.lightgbm.LightGBM
  args:
    boosting_type:
      - dart # Running DART for all algos

    enable_bundle:
      - False

    n_estimators:
      type: int
      range: [10, 100]

    num_leaves:
      type: int
      range: [10, 1000]

    min_child_samples:
      type: int
      range: [1, 500]
      log: True

    learning_rate:
      type: float
      range: [0.001, 0.1]

    n_jobs:
      - 1
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.postprocessing.identity.Identity
  args: null
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.preprocessing.identity.Identity
  args: null
@@ -0,0 +1,7 @@
prevalence_undersampling:
  type: preprocessing

defaults:
  - prevalence_undersampling/preprocessing: prevalence_sampling
  - prevalence_undersampling/inprocessing: lightgbm
  - prevalence_undersampling/postprocessing: identity
@@ -0,0 +1,28 @@
lightgbm:
  classpath: aequitas.fairflow.methods.inprocessing.lightgbm.LightGBM
  args:
    boosting_type:
      - dart # Running DART for all algos

    enable_bundle:
      - False

    n_estimators:
      type: int
      range: [10, 100]

    num_leaves:
      type: int
      range: [10, 1000]

    min_child_samples:
      type: int
      range: [1, 500]
      log: True

    learning_rate:
      type: float
      range: [0.001, 0.1]

    n_jobs:
      - 1
@@ -0,0 +1,3 @@
identity:
  classpath: aequitas.fairflow.methods.postprocessing.identity.Identity
  args: null
@@ -0,0 +1,6 @@
prevalence_sampling:
  args:
    alpha:
      type: float
      range: [0.5, 1]
  classpath: aequitas.fairflow.methods.preprocessing.prevalence_sample.PrevalenceSampling
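`prevalence_sampling` exposes a single `alpha` in [0.5, 1], but the config does not say what it controls. The sketch below assumes one plausible reading: negatives are undersampled within each group so that the group's label prevalence moves a fraction `alpha` of the way toward the highest group prevalence (`alpha=1` equalises them). Both that reading and the helper `prevalence_undersample` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def prevalence_undersample(
    labels: Sequence[int],
    groups: Sequence[Hashable],
    alpha: float,
    seed: int = 42,
) -> List[int]:
    """Return sorted row indices to keep after undersampling negatives per group.

    Hypothetical reading of `alpha`; labels are assumed to be 0 or 1.
    Positives are always kept.
    """
    rows: Dict[Hashable, Dict[int, List[int]]] = defaultdict(lambda: {0: [], 1: []})
    for i, (label, group) in enumerate(zip(labels, groups)):
        rows[group][label].append(i)

    def prevalence(r: Dict[int, List[int]]) -> float:
        return len(r[1]) / (len(r[0]) + len(r[1]))

    max_prev = max(prevalence(r) for r in rows.values())
    rng = random.Random(seed)
    keep: List[int] = []
    for r in rows.values():
        if not r[1]:
            keep += r[0]  # no positives: dropping negatives cannot raise prevalence
            continue
        p = prevalence(r)
        target = p + alpha * (max_prev - p)
        # Negatives to keep so that n_pos / (n_pos + n_neg) is about `target`.
        n_neg = round(len(r[1]) * (1 - target) / target)
        keep += r[1] + rng.sample(r[0], min(n_neg, len(r[0])))
    return sorted(keep)


labels = [1, 1, 0, 0, 1, 0, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 5
print(prevalence_undersample(labels, groups, alpha=1.0))
```

Only negatives are dropped, so the highest-prevalence group is left intact and every other group can only move toward it.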
@@ -0,0 +1,9 @@
folktables_acsemployment:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: ESR
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSEmployment
@@ -0,0 +1,9 @@
folktables_acsincome:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: PINCP
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSIncome
@@ -0,0 +1,9 @@
folktables_acsincome:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: PINCP
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSIncome
@@ -0,0 +1,9 @@
folktables_acsmobility:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: MIG
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSMobility
@@ -0,0 +1,9 @@
folktables_acspubliccoverage:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: PUBCOV
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSPublicCoverage
@@ -0,0 +1,9 @@
folktables_acstraveltime:
  classpath: aequitas.fairflow.datasets.FolkTables
  sensitive_attribute: RAC1P
  label: JWMNP
  threshold:
    threshold_type: fixed
    threshold_value: 0.5
  args:
    variant: ACSTravelTime
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_base.yaml
@@ -0,0 +1,9 @@
baf_base:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: Base
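The BAF datasets use `threshold_type: fpr` with `threshold_value: 0.05`, rather than the fixed 0.5 cut-off of the FolkTables configs. Assuming this means "choose the decision threshold so that at most 5% of negatives are flagged", a sketch of that computation; `threshold_at_fpr` is hypothetical and flags a row when its score is strictly above the threshold.

```python
import math
from typing import Sequence


def threshold_at_fpr(scores: Sequence[float], labels: Sequence[int],
                     max_fpr: float = 0.05) -> float:
    """Lowest cut-off t such that flagging `score > t` keeps FPR <= max_fpr.

    Only negatives (label 0) affect the false positive rate, so at most
    floor(max_fpr * n_negatives) of them may score strictly above t.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not negatives:
        raise ValueError("need at least one negative example to estimate FPR")
    allowed = math.floor(max_fpr * len(negatives))  # false positives we may accept
    if allowed >= len(negatives):
        return -math.inf  # every negative may be flagged
    # Use the (allowed + 1)-th highest negative score: it and everything below it
    # stay unflagged, so at most `allowed` negatives exceed t.
    return negatives[allowed]


neg_scores = [i / 20 for i in range(20)]
scores = neg_scores + [0.97, 0.99]
labels = [0] * 20 + [1, 1]
t = threshold_at_fpr(scores, labels, max_fpr=0.05)
print(t)  # 0.9
```

Fixing the FPR rather than the score makes results comparable across the BAF variants, whose score distributions may differ.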
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_1.yaml
@@ -0,0 +1,9 @@
baf_variant_1:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeI
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_2.yaml
@@ -0,0 +1,9 @@
baf_variant_2:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeII
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_3.yaml
@@ -0,0 +1,9 @@
baf_variant_3:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeIII
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_4.yaml
@@ -0,0 +1,9 @@
baf_variant_4:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeIV
9 changes: 9 additions & 0 deletions examples/configs/paper_configs/datasets/baf_variant_5.yaml
@@ -0,0 +1,9 @@
baf_variant_5:
  classpath: aequitas.fairflow.datasets.BankAccountFraud
  sensitive_attribute: customer_age_bin
  label: fraud_bool
  threshold:
    threshold_type: fpr
    threshold_value: 0.05
  args:
    variant: TypeV
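All dataset entries above share one shape: `classpath`, `sensitive_attribute`, `label`, a `threshold` block and `args`. A hypothetical checker for that shape, which can catch a mistyped key before an experiment starts. Limiting accepted threshold types to the two that appear in these files (`fixed` and `fpr`), and requiring values in [0, 1], are assumptions.

```python
from typing import Any, Dict, List

# Threshold types seen in the dataset configs above; the library may accept others.
KNOWN_THRESHOLD_TYPES = {"fixed", "fpr"}
REQUIRED_KEYS = ("classpath", "sensitive_attribute", "label", "threshold", "args")


def check_dataset_config(name: str, cfg: Dict[str, Any]) -> List[str]:
    """Return the problems found in one dataset entry (an empty list if none)."""
    problems = [f"{name}: missing '{key}'" for key in REQUIRED_KEYS if key not in cfg]
    threshold = cfg.get("threshold") or {}
    kind = threshold.get("threshold_type")
    value = threshold.get("threshold_value")
    if kind not in KNOWN_THRESHOLD_TYPES:
        problems.append(f"{name}: unexpected threshold_type {kind!r}")
    # Both a fixed score cut-off and an FPR target are assumed to lie in [0, 1].
    if isinstance(value, bool) or not isinstance(value, (int, float)) or not 0 <= value <= 1:
        problems.append(f"{name}: threshold_value should be a number in [0, 1]")
    return problems


baf_base = {
    "classpath": "aequitas.fairflow.datasets.BankAccountFraud",
    "sensitive_attribute": "customer_age_bin",
    "label": "fraud_bool",
    "threshold": {"threshold_type": "fpr", "threshold_value": 0.05},
    "args": {"variant": "Base"},
}
print(check_dataset_config("baf_base", baf_base))  # []
```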
23 changes: 23 additions & 0 deletions examples/configs/paper_configs/experiment_baf.yaml
@@ -0,0 +1,23 @@
methods:
  - lightgbm_baseline
  - prevalence_undersampling
  - prevalence_oversampling
  - fairgbm_baf
  - group_threshold_baf
  - grid_search_baf
  - exponentiated_gradient_baf

datasets:
  - baf_base
  - baf_variant_1
  - baf_variant_2
  - baf_variant_3
  - baf_variant_4
  - baf_variant_5

optimization:
  n_trials: 100
  n_jobs: 1
  sampler: RandomSampler
  sampler_args:
    seed: 42