[python-package] LightGBM predict_proba() corrupts pandas categorical columns with unseen values #6195
Comments
Hey @fingoldo, thanks for using LightGBM, and sorry for the trouble. We used to take a shallow copy there, but it wasn't obvious that the predict step depended on it, and a recent refactor removed it. We'll work on a fix.
Thank you so much Jose, that's what I call a fast turnaround! ;-) For now I just pass a .copy() of the dataframe to LightGBM, so the other models in the ensemble are not affected.
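The .copy() workaround mentioned above could be wrapped in a small helper. This is only a sketch: `MutatingModel` is a hypothetical stand-in that imitates an estimator modifying its input, and `predict_proba_on_copy` is an illustrative name, not part of LightGBM.

```python
import pandas as pd


class MutatingModel:
    """Hypothetical stand-in for an estimator that modifies its input in place."""

    def predict_proba(self, X):
        # Simulate the side effect: re-code a categorical column on the frame we were given.
        X["color"] = X["color"].cat.set_categories(["red", "green"])
        return [[0.5, 0.5]] * len(X)


def predict_proba_on_copy(model, X):
    """Call model.predict_proba on a copy so the caller's DataFrame stays untouched."""
    return model.predict_proba(X.copy())


df = pd.DataFrame({"color": pd.Categorical(["red", "blue", "green"])})
proba = predict_proba_on_copy(MutatingModel(), df)

# The original frame still holds all three values, including the unseen "blue".
print(df["color"].tolist())
```

The cost is one extra copy of the frame per call, which is usually acceptable when several models in an ensemble share the same data.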
Description
In predict_proba of LGBMClassifier (at least), if the input is a pandas DataFrame and a categorical column contains a value that was not seen during fitting, the entire column becomes corrupted.
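One way nulls can appear in a categorical column, shown here with pandas alone, is re-coding the column against a fixed category list. That LightGBM does this during prediction is an assumption of this sketch, not something confirmed in this thread, and the sketch only shows unseen values turning into NaN rather than the full corruption reported.

```python
import pandas as pd

# Categories seen while fitting (illustrative values).
train_categories = ["a", "b"]

# Prediction-time column containing "c", which was never seen during fitting.
col = pd.Series(["a", "b", "c"], dtype="category")

# Re-coding against the training categories turns every unseen value into NaN.
aligned = col.cat.set_categories(train_categories)
print(aligned.isna().tolist())  # [False, False, True]
```

If the result of such a re-coding is written back into the caller's DataFrame instead of a copy, every later consumer of that frame sees the NaN values.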
Some might argue it's not important, but this behaviour is undocumented and unexpected, and it took me a lot of time to detect. It led to nulls appearing out of nowhere in a chain of models making predictions on the same data. IMHO no model should change its inputs (and if there are performance reasons to do so, then at least not without some special flag explicitly set).
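A guard like the following can catch this kind of silent input mutation early in a model chain. It is a sketch: `assert_input_unchanged`, `well_behaved`, and `misbehaving` are hypothetical names, and the two predictors are stand-ins rather than real estimators.

```python
import pandas as pd


def assert_input_unchanged(predict_fn, X):
    """Run predict_fn(X) and raise AssertionError if X was modified in place."""
    snapshot = X.copy(deep=True)
    result = predict_fn(X)
    pd.testing.assert_frame_equal(X, snapshot)
    return result


def well_behaved(X):
    # Hypothetical predictor that only reads its input.
    return [0.0] * len(X)


def misbehaving(X):
    # Hypothetical predictor that nulls out unseen categories on the caller's frame.
    X["cat"] = X["cat"].cat.set_categories(["a", "b"])
    return [0.0] * len(X)


df = pd.DataFrame({"cat": pd.Categorical(["a", "b", "c"])})
assert_input_unchanged(well_behaved, df)  # passes silently

try:
    assert_input_unchanged(misbehaving, df.copy())
    mutation_detected = False
except AssertionError:
    mutation_detected = True

print(mutation_detected)
```

Running such a check in tests, rather than in production, avoids paying for the snapshot on every prediction.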
Reproducible example
Environment info
numpy 1.24.4, pandas 2.0.3, LightGBM 4.1.0
OS=Windows