[python] How to log train/valid/test loss and metric in evals_result while keeping early stopping on valid data only? #4679

Hi, I want to plot learning curves of the train/valid/test loss and metric. According to the documentation, I should pass valid_sets = [lgb_train, lgb_valid, lgb_test]. However, I want early stopping on the valid data only, and the documentation says: "The model will train until the validation score stops improving. Requires at least one validation data and one metric. If there's more than one, will check all of them." That means if I pass valid_sets = [lgb_train, lgb_valid, lgb_test], early stopping will check all of them, which is not what I want. Is there any way to log the train/valid/test loss and metric in evals_result while keeping early stopping on the valid data only? Thanks!

The training API is in: https://lightgbm.cn/docs/8/#training-api
Hi @Neronjust2017, thanks for your interest in LightGBM. I'm not familiar with https://lightgbm.cn, but it is not maintained by this project's maintainers and looks like it may not reflect the current state of this project. In the official implementation, the training data is given special treatment for early stopping: even when it is passed in valid_sets, its metrics are logged but are never used to decide whether to stop training early.
Here is a place in the source code where the training data is given special treatment and ignored for the purposes of triggering early stopping: LightGBM/python-package/lightgbm/callback.py, lines 269 to 272 (at commit da98f24).
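Roughly speaking, that check skips any evaluation entry whose dataset name matches the name LightGBM assigned to the training data, so the training set can never trigger early stopping. A minimal sketch of that idea (a simplified paraphrase, not a verbatim quote of callback.py):

```python
# Simplified paraphrase of the early-stopping callback's check (not the exact
# LightGBM source): evaluation entries whose dataset name matches the training
# data's name are skipped when deciding whether to stop early.
def considered_for_early_stopping(dataset_name: str, train_data_name: str) -> bool:
    return dataset_name != train_data_name


# e.g. in the reproducible example below, the training data is named
# "sparkly-unicorn" and the held-out data "test1"
print(considered_for_early_stopping("sparkly-unicorn", "sparkly-unicorn"))  # False: ignored
print(considered_for_early_stopping("test1", "sparkly-unicorn"))            # True: can trigger early stopping
```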
To demonstrate this behavior, I created the following reproducible example tonight. It uses a custom evaluation metric, just to force a situation where the metric never improves on the training dataset. It also sets the name of the training data in valid_names to "sparkly-unicorn", so it is easy to see in the logs which entry corresponds to the training data.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10_000, n_features=10, n_informative=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

dtrain = lgb.Dataset(data=X_train, label=y_train)
dtest = lgb.Dataset(data=X_test, label=y_test, reference=dtrain)


def _never_improve_on_train_data(preds, labeled_data):
    name = "never_improve_on_train_data"
    higher_better = False
    # the training set has 9000 rows after the 90/10 split, so return a
    # constant value for it: its metric can never improve
    if labeled_data.num_data() == 9000:
        value = -5.0
    else:
        value = mean_squared_error(labeled_data.get_label(), preds)
    return name, value, higher_better


evals_result = {}
bst = lgb.train(
    train_set=dtrain,
    params={
        "early_stopping_rounds": 2,
        "objective": "regression_l2",
        "metric": "None",
        "num_iterations": 10,
        "num_leaves": 8,
        "verbose": 1
    },
    valid_sets=[dtrain, dtest],
    valid_names=["sparkly-unicorn", "test1"],
    evals_result=evals_result,
    feval=_never_improve_on_train_data
)
```

Running this produces per-iteration evaluation logs for both "sparkly-unicorn" and "test1".
As you can see in those logs, the only metric considered for early stopping was the one evaluated on test1; the never-improving metric on sparkly-unicorn (the training data) was ignored and did not stop training. Have you observed a situation where the metric on the training data does trigger early stopping?
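As for the original goal of plotting learning curves: the values recorded in evals_result can be plotted directly. A minimal sketch, assuming matplotlib is available and using the dataset and metric names from the example above:

```python
import matplotlib.pyplot as plt

metric_name = "never_improve_on_train_data"

# evals_result is keyed first by the names passed in valid_names,
# then by metric name, with one value per boosting iteration
for dataset_name in ["sparkly-unicorn", "test1"]:
    values = evals_result[dataset_name][metric_name]
    plt.plot(range(1, len(values) + 1), values, label=dataset_name)

plt.xlabel("iteration")
plt.ylabel(metric_name)
plt.legend()
plt.show()
```

With valid_sets=[lgb_train, lgb_valid, lgb_test] and three entries in valid_names, the same pattern gives separate train/valid/test curves.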
This issue has been automatically locked since there has not been any recent activity after it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.