[python-package] LightGBM predict_proba() corrupts pandas categorical columns with unseen values #6195
Description
In `LGBMClassifier.predict_proba()` (at least), if the input is a pandas DataFrame and a categorical column contains a value that was not seen during fitting, the entire column in the caller's DataFrame becomes corrupted.
Some might argue this is unimportant, but the behaviour is undocumented and unexpected, and it took me a long time to track down. It led to nulls appearing out of nowhere in a chain of models making predictions on the same data. IMHO no model should modify its inputs; if there are performance reasons for doing so, it should at least require an explicit flag.
Reproducible example
Environment info
1.24.4 2.0.3 4.1.0
OS=Windows
Command(s) you used to install LightGBM
Additional Comments
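As a possible stopgap while the behaviour stands, a caller can hand the model a deep copy, and can test whether any given model modifies its input. This is a generic sketch that relies only on pandas; the helper names `predict_proba_on_copy` and `mutates_input` are hypothetical and not part of LightGBM.

```python
import pandas as pd


def predict_proba_on_copy(model, X: pd.DataFrame):
    """Call model.predict_proba on a deep copy so the caller's X cannot be modified."""
    return model.predict_proba(X.copy(deep=True))


def mutates_input(model, X: pd.DataFrame) -> bool:
    """Return True if model.predict_proba(X) changes X in place."""
    before = X.copy(deep=True)
    model.predict_proba(X)  # note: this call may itself modify X
    return not X.equals(before)
```

The deep copy costs extra memory on large frames, which is the trade-off behind the suggestion that in-place modification should be opt-in.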