
Download LawSchool dataset directly from SEAPHE #359

Closed
hoffmansc opened this issue Aug 29, 2022 · 3 comments · Fixed by #510
Labels
datasets Issue relating to new or existing datasets easy Beginner issues good first issue Good for newcomers

Comments

@hoffmansc
Collaborator

hoffmansc commented Aug 29, 2022

http://www.seaphe.org/databases.php

This way we can remove the dependency on tempeh. We can essentially copy this file (preserving the copyright notice): https://github.com/microsoft/tempeh/blob/main/tempeh/datasets/seaphe_datasets.py

See also meps_datasets.py for another example of downloading/unzipping.
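As a rough illustration of the download/unzip step, here is a minimal stdlib-only sketch. The function names `read_member` and `fetch_from_zip`, the `data_home` cache directory, and the skip-if-cached behavior are all hypothetical. The real SEAPHE archive URL and the file name inside it are deliberately left as parameters rather than guessed; they should come from the tempeh source or the SEAPHE site.

```python
import io
import os
import urllib.request
import zipfile


def read_member(zip_bytes, member):
    """Return the raw bytes of one file inside an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # ZipFile.read raises KeyError if `member` is not in the archive.
        return zf.read(member)


def fetch_from_zip(url, member, data_home):
    """Download `url` once, cache `member` under `data_home`, return its path.

    Hypothetical helper: `url` and `member` are caller-supplied placeholders,
    not the confirmed SEAPHE archive location or file name.
    """
    os.makedirs(data_home, exist_ok=True)
    cached = os.path.join(data_home, os.path.basename(member))
    if not os.path.exists(cached):
        # Only hit the network when the extracted file is not cached yet.
        with urllib.request.urlopen(url) as response:
            payload = response.read()
        with open(cached, "wb") as f:
            f.write(read_member(payload, member))
    return cached
```

Keeping extraction in a separate, pure function makes it testable without network access.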

Relevant files:
tempeh_datasets.py
law_school_gpa_dataset.py

See demo_grid_search_reduction_regression_sklearn.ipynb for example usage.

Behavior should be essentially the same as tempeh's, except that rows with NAs should be kept, since dropping them can be handled later.
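A minimal sketch of "keep NAs now, drop them later", assuming the raw table can be read with pandas. `pd.read_csv` is only a stand-in here (the actual SEAPHE file format may differ), `load_raw` and `drop_missing` are hypothetical helpers, and the column names in the demo are illustrative.

```python
import io

import pandas as pd


def load_raw(path_or_buffer):
    """Read the raw table without dropping rows that contain missing values."""
    # Deliberately no dropna() here; per this issue, tempeh drops NAs up front.
    return pd.read_csv(path_or_buffer)


def drop_missing(df, columns):
    """Drop rows with NAs only in `columns`, at the point the caller needs it."""
    return df.dropna(subset=columns)


# Illustrative data; the real column names may differ.
raw = load_raw(io.StringIO("lsat,ugpa,race\n38,3.1,White\n,2.9,Black\n41,,White\n"))
print(len(raw))                          # 3: rows with NAs are kept
print(len(drop_missing(raw, ["lsat"])))  # 2
```

Restricting `dropna` to the columns a given analysis uses avoids discarding rows whose missing values are irrelevant to it.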

@hoffmansc hoffmansc added good first issue Good for newcomers datasets Issue relating to new or existing datasets labels Aug 29, 2022
@nrkarthikeyan nrkarthikeyan added the easy Beginner issues label Sep 15, 2022 (the medium Intermediate skill level may be needed label was also added and removed)
@anupamamurthi
Collaborator

anupamamurthi commented Sep 15, 2022

Possible Tasks:

  • Ensure the license permits open-source use
  • Verify that this dataset is appropriate for fairness tasks and subset it accordingly (removing unnecessary columns, etc.)
  • Ensure we have instance-level records with protected attributes and outcomes
  • First, create a sklearn-compatible dataset (DataFrame), then an appropriate "classic" dataset (second priority)
  • Create a simple notebook that consumes the dataset and computes at least some simple fairness measures
  • DO NOT download and incorporate the data; instead, include a function that downloads it, since the data is not hosted in AIF360
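The sklearn-compatible dataset task and the "simple fairness measures" task can be sketched together. AIF360's scikit-learn-compatible API expects protected attributes in the DataFrame index. This sketch follows that convention with plain pandas; it is not AIF360's own implementation. `to_sklearn_format`, `mean_difference`, the `pass_bar` target, and the toy data are all hypothetical.

```python
import pandas as pd


def to_sklearn_format(df, prot_attr, target):
    """Return (X, y) with the protected attribute(s) moved into the index.

    drop=False keeps the protected attribute(s) as feature columns too;
    whether they should stay in X is a design choice for the real loader.
    """
    indexed = df.set_index(prot_attr, drop=False)
    y = indexed.pop(target)  # removes the target column from the features
    return indexed, y


def mean_difference(y, prot_attr, priv_group):
    """Mean outcome of the unprivileged group minus that of the privileged one.

    For a binary 0/1 outcome this equals the statistical parity difference.
    """
    groups = y.index.get_level_values(prot_attr)
    is_priv = groups == priv_group
    return y[~is_priv].mean() - y[is_priv].mean()


# Hypothetical toy data, not the real LawSchool columns.
toy = pd.DataFrame({
    "race": ["White", "Black", "White", "Black"],
    "lsat": [40, 35, 42, 38],
    "pass_bar": [1, 0, 1, 1],
})
X, y = to_sklearn_format(toy, "race", "pass_bar")
print(mean_difference(y, "race", "White"))  # -0.5
```

Passing a list as `prot_attr` to `to_sklearn_format` yields a MultiIndex, and `mean_difference` can then select one level by name.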

@EktaBhaskar

Please assign this issue to me.

@vandanapathare

Can I get this issue assigned?

5 participants