Imputation of numerical data #69

Closed
kopant opened this issue Mar 5, 2024 · 3 comments · Fixed by #79
Labels
enhancement New feature or request

Comments

kopant commented Mar 5, 2024

Since you mentioned you're considering enhancing load_data(), it might also be worth exposing different methods for imputing missing numeric data to the user. Currently data_utils.load_num_feats() defaults to median imputation, but this can be a poor choice when the data is missing because of real differences in the data-generating process (i.e., the NULL data followed a different process than the non-NULL data and is meaningfully distinct from it). In that case, one might instead want to encode the missing data with a value distinct from the non-NULL distribution prior to modeling.
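To illustrate the concern, here is a minimal pandas sketch on toy data. The column name `income` and the sentinel value `-1.0` are assumptions made for the example and are not taken from the project's code.

```python
import numpy as np
import pandas as pd

# Toy data: "income" is a hypothetical numeric feature with two missing rows.
df = pd.DataFrame({"income": [40.0, 55.0, np.nan, 70.0, np.nan]})

# Median imputation: missing rows end up looking like a typical observed row.
median_filled = df["income"].fillna(df["income"].median())

# Sentinel encoding: fill with a value outside the observed range so a model
# can still tell "was missing" apart from "was observed".
SENTINEL = -1.0  # assumption: valid incomes are never negative
sentinel_filled = df["income"].fillna(SENTINEL)

# An explicit indicator column is another way to keep the missingness signal.
was_missing = df["income"].isna().astype(int)
```

With median imputation the two missing rows become indistinguishable from the row that really had the median value, while the sentinel and the indicator both preserve the fact that the value was absent.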

@akashsaravanan-georgian akashsaravanan-georgian added the enhancement New feature or request label Mar 6, 2024
@akashsaravanan-georgian
Contributor

That's a good idea, thanks! We'll incorporate that when doing the enhancement.

@akashsaravanan-georgian
Contributor

Hi @kopant, happy to note that you can now do this by setting numerical_handle_na to True and setting numerical_how_handle_na to one of "mean", "median" or "value". If you want to fill with a specific value, set numerical_how_handle_na to "value" and provide that value via numerical_na_value.
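As a rough sketch of what these three options mean, the helper below reuses the parameter names from the comment above. The function `fill_numeric_na` itself, its defaults, and its error handling are hypothetical and are not the library's actual implementation.

```python
import pandas as pd

def fill_numeric_na(df, cols, numerical_handle_na=False,
                    numerical_how_handle_na="median", numerical_na_value=0.0):
    """Hypothetical helper: fill NaNs in `cols` by mean, median, or a fixed value."""
    out = df.copy()
    if not numerical_handle_na:
        return out  # leave missing values untouched
    if numerical_how_handle_na == "mean":
        fill = out[cols].mean()      # per-column means
    elif numerical_how_handle_na == "median":
        fill = out[cols].median()    # per-column medians
    elif numerical_how_handle_na == "value":
        fill = numerical_na_value    # one constant for every column
    else:
        raise ValueError(f"unknown method: {numerical_how_handle_na!r}")
    out[cols] = out[cols].fillna(fill)
    return out

# Usage on toy data (column names are made up for the example).
toy = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 2.0, 4.0]})
by_mean = fill_numeric_na(toy, ["a", "b"], numerical_handle_na=True,
                          numerical_how_handle_na="mean")
by_value = fill_numeric_na(toy, ["a", "b"], numerical_handle_na=True,
                           numerical_how_handle_na="value",
                           numerical_na_value=-999.0)
```

The "value" option is the one that addresses the original request, since it lets missing entries be encoded with a number chosen to sit outside the observed distribution.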

@kopant
Author

kopant commented Sep 18, 2024

Thanks for making the change, @akashsaravanan-georgian!
