When header=0, LGBM_BoosterPredictForFile() called on a CSV with column names raises a process-crashing error #5093
Comments
Great write-up, thanks!
I found tonight that this issue is not limited to the R package. As of latest, the Python example below reproduces it:

import os
import lightgbm as lgb
import pandas as pd
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1_000, random_state=708)
dtrain = lgb.Dataset(data=X, label=y)
bst = lgb.train(
    train_set=dtrain,
    params={
        "objective": "regression"
    },
    num_boost_round=5
)

# write to CSV
X_df = pd.DataFrame(
    X,
    columns=[f"col_{i}" for i in range(X.shape[1])]
)
csv_file = os.path.join(os.getcwd(), "test.csv")
X_df.to_csv(csv_file, header=True, index=False)

# predict
bst.predict(data=csv_file)

I ran this example in Jupyter Lab, and saw the following in its logs.
If I change the call to bst.predict(data=csv_file, data_has_header=True), predicting succeeds. With the R package, I also found that predicting on a CSV with column names in the header succeeds if I pass header = TRUE:

library(lightgbm)
data(mtcars)
X <- as.matrix(mtcars[, -1L])
y <- as.numeric(mtcars[, 1L])
dtrain <- lgb.Dataset(
    X
    , label = y
    , params = list(min_data_in_bin = 1L, min_data_in_leaf = 1L)
)
bst <- lgb.train(
    data = dtrain
    , obj = "regression"
    , nrounds = 5L
)
fname <- tempfile(fileext=".csv")
write.csv(X, fname, row.names=FALSE)

# using header = TRUE, predicting succeeds
pred <- predict(bst, fname, header = TRUE)

So I think there are two separate issues:
For anyone looking to contribute a fix: the error is raised by the call at LightGBM/include/LightGBM/utils/common.h, line 346 (as of commit b0137de), which throws an error here: LightGBM/include/LightGBM/utils/log.h, line 130 (as of commit 0a4851f).
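Until a fix lands, callers can guard the data_has_header workaround mentioned earlier in this thread with a pre-flight check. The sketch below is only an illustration, not LightGBM code: the helper name first_row_is_header is hypothetical, and it assumes a comma-separated file whose data rows are fully numeric.

```python
import csv


def first_row_is_header(path: str) -> bool:
    """Guess whether the first row of a CSV holds column names.

    Hypothetical helper: returns True if any cell in the first row
    cannot be parsed as a float. Empty cells also fail to parse, so a
    first data row with missing values would be misclassified.
    """
    with open(path, newline="") as f:
        # An empty file yields an empty row, which is treated as "no header".
        first_row = next(csv.reader(f), [])
    for cell in first_row:
        try:
            float(cell)
        except ValueError:
            return True
    return False
```

With the Python example above, this could be used as bst.predict(data=csv_file, data_has_header=first_row_is_header(csv_file)), so that files written with header=True and files written without a header are both handled.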
This will crash the R process from which lightgbm is called:
That C++ exception should be caught and thrown as an R error instead.
ref #4977
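The general pattern being asked for (catch the exception at the language boundary, record the message, and let the calling side raise its own error) can be sketched as follows. This is a language-neutral illustration written in Python, not LightGBM's implementation: api_boundary, get_last_error, predict_for_file and checked are all hypothetical names, and the float() call merely stands in for a parser that rejects column names.

```python
import functools

# Hypothetical error slot shared by all wrapped functions.
_state = {"last_error": ""}


def api_boundary(func):
    """Convert any exception raised by func into a status code (0 or -1)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            func(*args, **kwargs)
        except Exception as exc:
            # Record the message instead of letting the exception escape.
            _state["last_error"] = str(exc)
            return -1
        return 0
    return wrapper


def get_last_error() -> str:
    return _state["last_error"]


@api_boundary
def predict_for_file(first_row):
    # Stand-in for strict numeric parsing: a column name raises ValueError.
    for token in first_row:
        float(token)


def checked(status: int) -> None:
    """Caller side: turn a failing status into an ordinary, catchable error."""
    if status != 0:
        raise RuntimeError(get_last_error())
```

Calling checked(predict_for_file(["col_0", "col_1"])) then raises a RuntimeError that the host process can handle, rather than the failure propagating past the boundary.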