Only list bool as numerical if OpenML has it listed as numerical #556

Merged
merged 1 commit into from
Jun 26, 2023


PGijsbers
Collaborator

If the ARFF header contains a "categorical boolean", i.e. a nominal attribute with possible values {true, false} (or similar), openml-python converts it to bool when loading the data into a dataframe. AMLB then wrote that column to the split ARFF files as numeric, which could cause issues for frameworks that rely on the split ARFF files produced by the benchmark (e.g., h2oautoml), especially when the affected column was the target. In the benchmark the affected tasks are: kc1 (openml/t/3917), pc4 (openml/t/359958), and miniboone (openml/t/359990).
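The idea behind the fix can be illustrated with a minimal sketch. This is not the code from this PR: `arff_attribute` is a hypothetical helper, and the `"nominal"` / `"numeric"` strings are assumed stand-ins for the data type OpenML lists for a feature. It only shows the decision rule: a bool column is written as numeric only when the metadata says it is numeric.

```python
def arff_attribute(name, values, openml_data_type):
    """Build an ARFF @ATTRIBUTE line for one column (hypothetical helper).

    `openml_data_type` is assumed to be the type listed in the dataset
    metadata, e.g. "nominal" or "numeric".
    """
    present = [v for v in values if v is not None]
    is_bool = bool(present) and all(isinstance(v, bool) for v in present)
    # bool is a subclass of int in Python, so check it before the numeric test.
    is_number = bool(present) and all(isinstance(v, (int, float)) for v in present)

    if is_bool:
        # Only treat a bool column as numeric if the metadata lists it that way.
        write_numeric = openml_data_type == "numeric"
    else:
        write_numeric = is_number

    if write_numeric:
        return f"@ATTRIBUTE {name} NUMERIC"
    levels = sorted({str(v) for v in present})
    return "@ATTRIBUTE {} {{{}}}".format(name, ",".join(levels))


# A "categorical boolean" target stays nominal in the split file.
print(arff_attribute("defects", [True, False, True], "nominal"))
# The same values are written as numeric only if listed as numeric.
print(arff_attribute("flag", [True, False, True], "numeric"))
```

With a rule like this, a target such as the one in kc1 would keep its two nominal levels in the split file instead of being declared numeric.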

@PGijsbers PGijsbers merged commit 38fee5e into master Jun 26, 2023
@PGijsbers PGijsbers deleted the fix/bool_in_arff_header branch June 26, 2023 17:02
@PGijsbers PGijsbers added the bug Something isn't working label Jun 26, 2023
PGijsbers added a commit that referenced this pull request Jun 27, 2023
* Fix/inference with empty columns (#555)
* Only list bool as numerical if OpenML has it listed as numerical (#556)