
[python] How to log train/valid/test loss and metric in evals_results while keeping early stopping on valid data only? #4679

Closed
Neronjust2017 opened this issue Oct 14, 2021 · 2 comments

@Neronjust2017

Hi, I want to plot learning curves of the train/valid/test loss and metric. According to the documentation, I should pass valid_sets = [lgb_train, lgb_valid, lgb_test]. However, I want early stopping to be based on the valid data only, and the documentation says: "The model will train until the validation score stops improving. Requires at least one validation data and one metric. If there's more than one, will check all of them." That means if I pass valid_sets = [lgb_train, lgb_valid, lgb_test], early stopping will check all of them, which is not what I want. Is there any way to log the train/valid/test loss and metric in evals_result while keeping early stopping on the valid data only? Thanks!
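For concreteness, this is roughly the setup I have in mind (just a sketch to make the question clearer; the synthetic data and the 3.x-style lgb.train keyword arguments early_stopping_rounds / evals_result are only there to make it self-contained):

import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# small synthetic data just to make the sketch runnable
X, y = make_regression(n_samples=3_000, n_features=10, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

lgb_train = lgb.Dataset(X_train, label=y_train)
lgb_valid = lgb.Dataset(X_valid, label=y_valid, reference=lgb_train)
lgb_test = lgb.Dataset(X_test, label=y_test, reference=lgb_train)

evals_result = {}
bst = lgb.train(
    params={"objective": "regression_l2", "metric": "l2"},
    train_set=lgb_train,
    num_boost_round=50,
    valid_sets=[lgb_train, lgb_valid, lgb_test],
    valid_names=["train", "valid", "test"],
    evals_result=evals_result,   # I want the metric history for all three sets recorded here
    early_stopping_rounds=5,     # but I want this to watch only lgb_valid
)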

The training API is documented at: https://lightgbm.cn/docs/8/#training-api

early_stopping_rounds (int or None, optional (default=None)) – Activates early stopping. The model will train until the validation score stops improving. Requires at least one validation data and one metric. If there's more than one, will check all of them. If early stopping occurs, the model will add a best_iteration field.

evals_result (dict or None, optional (default=None)) – This dictionary is used to store all evaluation results of all the items in valid_sets. Example: with a valid_sets = [valid_set, train_set], valid_names = ['eval', 'train'] and a params = {'metric': 'logloss'}, returns: {'train': {'logloss': ['0.48253', '0.35953', ...]}, 'eval': {'logloss': ['0.480385', '0.357756', ...]}}.
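That evals_result dictionary is exactly what I want to use for the learning curves, e.g. something like this sketch (assuming evals_result was filled in by a call like the one above; plotting here uses matplotlib):

import matplotlib.pyplot as plt

# plot the recorded history of every metric on every dataset stored in evals_result
for dataset_name, metrics in evals_result.items():
    for metric_name, history in metrics.items():
        plt.plot(history, label=f"{dataset_name} {metric_name}")

plt.xlabel("iteration")
plt.ylabel("metric value")
plt.legend()
plt.show()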
@jameslamb jameslamb changed the title How to log train/valid/test loss and metric in evals_results while keeping early stopping on valid data only? [python] How to log train/valid/test loss and metric in evals_results while keeping early stopping on valid data only? Nov 1, 2021
@jameslamb
Collaborator

Hi @Neronjust2017 , thanks for your interest in LightGBM.

I'm not familiar with https://lightgbm.cn, but it is not maintained by this project's maintainers and it looks like it may not reflect the current state of this project.

In the official lightgbm docs on lgb.train() (this link), the documentation for early_stopping_rounds says the following.

Requires at least one validation data and one metric. If there’s more than one, will check all of them. But the training data is ignored

Here is a place in the source code where the training data is given special treatment and ignored for the purposes of triggering early stopping:

if ((env.evaluation_result_list[i][0] == "cv_agg" and eval_name_splitted[0] == "train"
        or env.evaluation_result_list[i][0] == env.model._train_data_name)):
    _final_iteration_check(env, eval_name_splitted, i)
    continue  # train data for lgb.cv or sklearn wrapper (underlying lgb.train)


To demonstrate this behavior, I created the following reproducible example tonight. This uses lightgbm==3.3.1 and Python 3.8.8.

It uses a custom evaluation metric, just to force a situation where the metric fails to improve on the training dataset. It also sets the name of the training data in valid_names to "sparkly-unicorn", just to prove that there is no special logic that requires you to name the training data "train".

import numpy as np
import pandas as pd
import lightgbm as lgb

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10_000, n_features=10, n_informative=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

dtrain = lgb.Dataset(data=X_train, label=y_train)
dtest = lgb.Dataset(data=X_test, label=y_test, reference=dtrain)

def _never_improve_on_train_data(preds, labeled_data):
    # custom eval metric: the training split has 9,000 rows, so return a
    # constant (never-improving) value for it; otherwise return the real MSE
    name = "never_improve_on_train_data"
    higher_better = False
    if labeled_data.num_data() == 9000:
        value = -5.0
    else:
        value = mean_squared_error(labeled_data.get_label(), preds)
    return name, value, higher_better

evals_result = {}

bst = lgb.train(
    train_set=dtrain,
    params={
        "early_stopping_rounds": 2,
        "objective": "regression_l2",
        "metric": "None",
        "num_iterations": 10,
        "num_leaves": 8,
        "verbose": 1
    },
    valid_sets=[dtrain, dtest],
    valid_names=["sparkly-unicorn", "test1"],
    evals_result=evals_result,
    feval=_never_improve_on_train_data
)

This produces the following logs, which show that lightgbm is not considering the training data for early stopping.

[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002276 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2550
[LightGBM] [Info] Number of data points in the train set: 9000, number of used features: 10
[LightGBM] [Info] Start training from score 0.500779
[1]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 17751.1
Training until validation scores don't improve for 2 rounds
[2]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 15970.8
[3]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 14442.7
[4]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 13150.7
[5]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 11987.7
[6]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 11026.2
[7]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 10152.1
[8]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 9284.53
[9]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 8563.64
[10]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 7963.8
Did not meet early stopping. Best iteration is:
[1]	sparkly-unicorn's never_improve_on_train_data: -5	test1's never_improve_on_train_data: 17751.1

As you can see in these logs, the only metric considered was never_improve_on_train_data. Training did not hit early stopping, despite early_stopping_rounds=2 being set and that metric never improving on the training data.
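You can also confirm this after training by inspecting the recorded results directly. A small follow-up sketch, run right after the example above (lgb.plot_metric() needs matplotlib installed):

# both datasets were recorded in evals_result, even though only the
# non-training one is eligible to trigger early stopping
print(sorted(evals_result.keys()))
# ['sparkly-unicorn', 'test1']

print(len(evals_result["test1"]["never_improve_on_train_data"]))
# 10  -> all num_iterations rounds ran; early stopping never fired

# the recorded history can be plotted directly from the evals_result dict
ax = lgb.plot_metric(evals_result, metric="never_improve_on_train_data")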


Have you observed a situation where lightgbm is performing early stopping based on evaluation against the training data? If so, could you provide a reproducible example that shows that behavior?

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023