
[python] How to log train/valid/test loss and metric in evals_results while keeping early stopping on valid data only? #4679

Closed
Neronjust2017 opened this issue Oct 14, 2021 · 2 comments

@Neronjust2017

Hi, I want to plot learning curves of the train/valid/test loss and metric. According to the documentation, I should pass valid_sets = [lgb_train, lgb_valid, lgb_test]. However, I want early stopping to be based on the valid data only, and the documentation says: "The model will train until the validation score stops improving. Requires at least one validation data and one metric. If there's more than one, will check all of them." That means if I pass valid_sets = [lgb_train, lgb_valid, lgb_test], early stopping will check all of them, which is not what I want. Is there any way to log the train/valid/test loss and metric in evals_result while keeping early stopping on the valid data only? Thanks!
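For concreteness, this is roughly the setup I have in mind (just a sketch to make the question clearer; the synthetic data and the 3.x-style lgb.train keyword arguments early_stopping_rounds / evals_result are only there to make it self-contained):

import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# small synthetic data just to make the sketch runnable
X, y = make_regression(n_samples=3_000, n_features=10, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

lgb_train = lgb.Dataset(X_train, label=y_train)
lgb_valid = lgb.Dataset(X_valid, label=y_valid, reference=lgb_train)
lgb_test = lgb.Dataset(X_test, label=y_test, reference=lgb_train)

evals_result = {}
bst = lgb.train(
    params={"objective": "regression_l2", "metric": "l2"},
    train_set=lgb_train,
    num_boost_round=50,
    valid_sets=[lgb_train, lgb_valid, lgb_test],
    valid_names=["train", "valid", "test"],
    evals_result=evals_result,   # I want the metric history for all three sets recorded here
    early_stopping_rounds=5,     # but I want this to watch only lgb_valid
)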

The training API is documented at: https://lightgbm.cn/docs/8/#training-api

early_stopping_rounds (int or None, optional (default=None)) – Activates early stopping. The model will train until the validation score stops improving. Requires at least one validation data and one metric. If there's more than one, will check all of them. If early stopping occurs, the model will add a best_iteration field.

evals_result (dict or None, optional (default=None)) – This dictionary is used to store all evaluation results of all the items in valid_sets. Example: with a valid_sets = [valid_set, train_set], valid_names = ['eval', 'train'] and a params = {'metric': 'logloss'}, returns: {'train': {'logloss': ['0.48253', '0.35953', ...]}, 'eval': {'logloss': ['0.480385', '0.357756', ...]}}.
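That evals_result dictionary is exactly what I want to use for the learning curves, e.g. something like this sketch (assuming evals_result was filled in by a call like the one above; plotting here uses matplotlib):

import matplotlib.pyplot as plt

# plot the recorded history of every metric on every dataset stored in evals_result
for dataset_name, metrics in evals_result.items():
    for metric_name, history in metrics.items():
        plt.plot(history, label=f"{dataset_name} {metric_name}")

plt.xlabel("iteration")
plt.ylabel("metric value")
plt.legend()
plt.show()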
@jameslamb jameslamb changed the title How to log train/valid/test loss and metric in evals_results while keeping early stopping on valid data only? [python] How to log train/valid/test loss and metric in evals_results while keeping early stopping on valid data only? Nov 1, 2021
@jameslamb
Collaborator

Hi @Neronjust2017 , thanks for your interest in LightGBM.

I'm not familiar with https://lightgbm.cn, but it is not maintained by this project's maintainers and it looks like it may not reflect the current state of this project.

In the official lightgbm docs on lgb.train() (this link), the documentation for early_stopping_rounds says the following.

Requires at least one validation data and one metric. If there’s more than one, will check all of them. But the training data is ignored

Here is a place in the source code where the training data is given special treatment and ignored for the purposes of triggering early stopping:

if ((env.evaluation_result_list[i][0] == "cv_agg" and eval_name_splitted[0] == "train"
        or env.evaluation_result_list[i][0] == env.model._train_data_name)):
    _final_iteration_check(env, eval_name_splitted, i)
    continue  # train data for lgb.cv or sklearn wrapper (underlying lgb.train)


To demonstrate this behavior, I created the following reproducible example tonight. This uses lightgbm==3.3.1 and Python 3.8.8.

It uses a custom evaluation metric, just to force a situation where the metric fails to improve on the training dataset. It also sets the name of the training data in valid_names to "sparkly-unicorn", just to prove that there is no special logic that requires you to name the training data "train".

import numpy as np
import pandas as pd
import lightgbm as lgb

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10_000, n_features=10, n_informative=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

dtrain = lgb.Dataset(data=X_train, label=y_train)
dtest = lgb.Dataset(data=X_test, label=y_test, reference=dtrain)

def _never_improve_on_train_data(preds, labeled_data):
    # custom eval metric: the training split has 9,000 rows, so return a
    # constant (never-improving) value for it; otherwise return the real MSE
    name = "never_improve_on_train_data"
    higher_better = False
    if labeled_data.num_data() == 9000:
        value = -5.0
    else:
        value = mean_squared_error(labeled_data.get_label(), preds)
    return name, value, higher_better

evals_result = {}

bst = lgb.train(
    train_set=dtrain,
    params={
        "early_stopping_rounds": 2,
        "objective": "regression_l2",
        "metric": "None",
        "num_iterations": 10,
        "num_leaves": 8,
        "verbose": 1
    },
    valid_sets=[dtrain, dtest],
    valid_names=["sparkly-unicorn", "test1"],
    evals_result=evals_result,
    feval=_never_improve_on_train_data
)

This produces the following logs, which show that lightgbm is not considering the training data for early stopping.

[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002276 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2550
[LightGBM] [Info] Number of data points in the train set: 9000, number of used features: 10
[LightGBM] [Info] Start training from score 0.500779
[1]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 17751.1
Training until validation scores don't improve for 2 rounds
[2]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 15970.8
[3]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 14442.7
[4]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 13150.7
[5]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 11987.7
[6]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 11026.2
[7]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 10152.1
[8]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 9284.53
[9]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 8563.64
[10]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 7963.8
Did not meet early stopping. Best iteration is:
[1]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 17751.1

As you can see in these logs, the only metric considered was never_improve_on_train_data. Training did not hit early stopping, despite early_stopping_rounds=2 being set and that metric never improving on the training data.
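You can also confirm this after training by inspecting the recorded results directly. A small follow-up sketch, run right after the example above (lgb.plot_metric() needs matplotlib installed):

# both datasets were recorded in evals_result, even though only the
# non-training one is eligible to trigger early stopping
print(sorted(evals_result.keys()))
# ['sparkly-unicorn', 'test1']

print(len(evals_result["test1"]["never_improve_on_train_data"]))
# 10  -> all num_iterations rounds ran; early stopping never fired

# the recorded history can be plotted directly from the evals_result dict
ax = lgb.plot_metric(evals_result, metric="never_improve_on_train_data")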


Have you observed a situation where lightgbm is performing early stopping based on evaluation against the training data? If so, could you provide a reproducible example that shows that behavior?

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023