
performance of lambdarank using custom objective is poor when compared with in-built lambdarank #2239

Closed
ii-research-yu opened this issue Jun 24, 2019 · 12 comments

@ii-research-yu

Based on the commonly used dataset MQ2008, the 5-fold CV performance of the built-in lambdarank is:

nDCG@1:0.4319, nDCG@3:0.4574, nDCG@5:0.4953, nDCG@10:0.5844

With the hessian set to a constant value, say 1.0, the 5-fold CV performance of the manually plugged-in lambdarank (via the fobj parameter) is:

nDCG@1:0.3483, nDCG@3:0.4021, nDCG@5:0.4429, nDCG@10:0.5394

When using the manually computed hessian, the following issue is observed, namely that training does not progress at all:

[10] valid_0's ndcg@1: 0.113095 valid_0's ndcg@2: 0.161686 valid_0's ndcg@3: 0.202154 valid_0's ndcg@4: 0.214604 valid_0's ndcg@5: 0.23572
[20] valid_0's ndcg@1: 0.113095 valid_0's ndcg@2: 0.161686 valid_0's ndcg@3: 0.202154 valid_0's ndcg@4: 0.214604 valid_0's ndcg@5: 0.23572
[30] valid_0's ndcg@1: 0.113095 valid_0's ndcg@2: 0.161686 valid_0's ndcg@3: 0.202154 valid_0's ndcg@4: 0.214604 valid_0's ndcg@5: 0.23572

Environment info

Operating System: ubuntu 16.04

Python version: 3.7

LightGBM version or commit hash: 2.2.3

Any comments on the above results and the following questions are highly appreciated.
(1) Why is there such an obvious difference between the built-in lambdarank and the manually plugged-in lambdarank? Is it due to the internal parameter settings?
(2) Are there any tips for using a manually computed hessian, which seems to be quite sensitive and fragile?

@guolinke
Collaborator

I think it is related to your custom objective; could you provide it?
And what is the "manually computed hessian"? Does it differ from the built-in hessian?

@ii-research-yu
Author

@guolinke so many thanks for your quick response.
I just created a gist snippet https://gist.github.com/y-research-yu/ae9d32dabce52d33073278c2a13dabac

I made the comparison by commenting out fobj (or not) to test the custom objective, e.g.,

params = {'boosting_type': 'gbdt',  # gbdt, dart
          'objective': 'lambdarank',
          'metric': 'ndcg',
          'learning_rate': 0.1, 'num_leaves': 255, 'num_trees': 500, 'num_threads': 16,
          'min_data_in_leaf': 0,
          # 'min_sum_hessian_in_leaf': 1.0,
          'verbosity': -1}

gbm = lgb.train(params=params, train_set=train_set, valid_sets=[valid_set], verbose_eval=10,
                # fobj=lightgbm_custom_obj_lambdarank
                )

or

gbm = lgb.train(params=params, train_set=train_set, valid_sets=[valid_set], verbose_eval=10,
                fobj=lightgbm_custom_obj_lambdarank
                )

@Ian09

Ian09 commented Aug 25, 2019

@y-research-yu Hi, I have the same question as you. I have checked your gist and I think the gradient and hessian are correct. But the performance with your implementation of the lambdarank objective is very poor; the output is:

[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1] training's ndcg@1: 0.402163 training's ndcg@2: 0.422151 training's ndcg@3: 0.442433 training's ndcg@4: 0.462206 training's ndcg@5: 0.483969
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2] training's ndcg@1: 0.393314 training's ndcg@2: 0.428606 training's ndcg@3: 0.448456 training's ndcg@4: 0.477228 training's ndcg@5: 0.501381
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3] training's ndcg@1: 0.418879 training's ndcg@2: 0.454627 training's ndcg@3: 0.473396 training's ndcg@4: 0.50212 training's ndcg@5: 0.523036
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4] training's ndcg@1: 0.420846 training's ndcg@2: 0.458612 training's ndcg@3: 0.476477 training's ndcg@4: 0.501127 training's ndcg@5: 0.521555
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[5] training's ndcg@1: 0.432645 training's ndcg@2: 0.459553 training's ndcg@3: 0.480824 training's ndcg@4: 0.503187 training's ndcg@5: 0.522683
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[6] training's ndcg@1: 0.432645 training's ndcg@2: 0.459553 training's ndcg@3: 0.480824 training's ndcg@4: 0.503187 training's ndcg@5: 0.522683
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[7] training's ndcg@1: 0.432645 training's ndcg@2: 0.459553 training's ndcg@3: 0.480824 training's ndcg@4: 0.503187 training's ndcg@5: 0.522683
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[8] training's ndcg@1: 0.432645 training's ndcg@2: 0.459553 training's ndcg@3: 0.480824 training's ndcg@4: 0.503187 training's ndcg@5: 0.522683
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[9] training's ndcg@1: 0.432645 training's ndcg@2: 0.459553 training's ndcg@3: 0.480824 training's ndcg@4: 0.503187 training's ndcg@5: 0.522683
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[10] training's ndcg@1: 0.432645 training's ndcg@2: 0.459553 training's ndcg@3: 0.480824 training's ndcg@4: 0.503187 training's ndcg@5: 0.522683

Did you make any progress on this issue? Thanks very much.

@ii-research-yu
Author

@Ian09 Hi, thanks for the message. Unfortunately, I have made no progress on this issue. @guolinke any insightful comments are highly appreciated, thanks!

@guolinke
Collaborator

@y-research-yu
I think there are some bugs in your code.
You can check the grad/hess for numerical issues, such as NaN/Inf/zeros.

For your questions: the built-in lambdarank has only a slight difference from the original algorithm; you can check the ndcg_norm parameter.

BTW, you can set 'verbosity': 2 for more information.
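
A quick way to act on that advice from Python before returning from the custom objective (a hedged sketch, not part of the gist; it only assumes the fobj returns a (grad, hess) pair of numpy arrays):

import numpy as np

def check_grad_hess(grad, hess):
    # Sanity-check the arrays returned by a custom fobj for NaN/Inf/zero/negative entries.
    grad = np.asarray(grad, dtype=float)
    hess = np.asarray(hess, dtype=float)
    assert np.all(np.isfinite(grad)), "gradient contains NaN or Inf"
    assert np.all(np.isfinite(hess)), "hessian contains NaN or Inf"
    print("grad: min %.4g, max %.4g, mean %.4g" % (grad.min(), grad.max(), grad.mean()))
    print("hess: min %.4g, max %.4g, zeros %d, negatives %d"
          % (hess.min(), hess.max(), int((hess == 0).sum()), int((hess < 0).sum())))

Calling check_grad_hess(grad, hess) just before the return statement of the custom objective will immediately reveal vanishing or negative hessian entries like the ones reported below.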

@Ian09

Ian09 commented Aug 28, 2019

Hi @guolinke and @y-research-yu
I have checked the hessian and found that it is very small. For example, my data has 7903 items, and the mean, sum, min, and max of the hessian for the first epoch (tree) are -2.2476994045302423e-19, -1.7763568394002505e-15, -1.0848717237770573, and 6.157482893792464, and one item is exactly zero. Is this correct/normal for the hessian?

BTW, I have checked @y-research-yu's implementation, and I think the hessian computation is correct, which is epsilon^2 * sigmoid(s_{ij}) * (1 - sigmoid(s_{ij})).

@guolinke
Collaborator

guolinke commented Sep 4, 2019

@Ian09 thanks! The small hessian may cause the problem.
The output of a tree leaf is (up to sign and regularization) sum_grad / sum_hess, so a near-zero sum_hess blows the leaf values up.

@y-research-yu
BTW, the hessian should be a positive number, since epsilon^2 * sigmoid(s_{ij}) * (1 - sigmoid(s_{ij})) is the product of three positive factors. So I think there are some bugs in your code.
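
To make both points concrete, here is a small numeric check (illustrative only, not the gist's code): the pairwise second-order term epsilon^2 * sigmoid(s_{ij}) * (1 - sigmoid(s_{ij})) is positive by construction, and a near-zero summed hessian inflates the sum_grad / sum_hess ratio that drives the leaf outputs.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The pairwise second-order term: a product of three positive factors, never negative.
epsilon = 1.0
s_ij = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])   # score differences s_i - s_j
print(epsilon ** 2 * sigmoid(s_ij) * (1.0 - sigmoid(s_ij)))   # all values in (0, 0.25]

# Why a vanishing summed hessian hurts: the same gradient mass yields wildly
# different leaf values as sum_hess approaches zero.
sum_grad = -0.3
for sum_hess in (0.5, 1e-3, 1e-12):
    print(sum_hess, sum_grad / sum_hess)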

@guolinke
Collaborator

guolinke commented Sep 4, 2019

@y-research-yu
I think the following code may be wrong:
https://gist.github.com/y-research-yu/ae9d32dabce52d33073278c2a13dabac#file-lightgbm_custom_obj_lambdarank-py-L83-L91

The hessian should always be accumulated; there is no need for the lambda_ji_2order term.
Refer to:

high_sum_lambda += p_lambda;
high_sum_hessian += p_hessian;
lambdas[low] -= static_cast<score_t>(p_lambda);
hessians[low] += static_cast<score_t>(p_hessian);
// lambda is negative, so use minus to accumulate
sum_lambdas -= 2 * p_lambda;
}
// update
lambdas[high] += static_cast<score_t>(high_sum_lambda);
hessians[high] += static_cast<score_t>(high_sum_hessian);
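
In other words, per (high, low) document pair the positive second-order term is added to both documents' hessians, while only the first-order term changes sign between them. A minimal Python sketch of that accumulation pattern (hypothetical helper mirroring the C++ excerpt above; p_lambda and p_hessian are assumed to be the pair's first- and second-order terms, with whatever sign convention the rest of the objective uses):

def accumulate_pair(lambdas, hessians, high, low, p_lambda, p_hessian):
    # First-order term: added to the higher-labeled doc, subtracted from the lower one.
    lambdas[high] += p_lambda
    lambdas[low] -= p_lambda
    # Second-order term: positive, accumulated on BOTH docs -- never subtracted.
    hessians[high] += p_hessian
    hessians[low] += p_hessian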

@akurennoy

@guolinke
Hi, the fact that the hessian is always accumulated is a difference from the formulation in the original paper (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf), right? As far as I understand, the last formula on page 16 (Section 7 LambdaMART) means that the hessian is added when the label of the item is higher and subtracted when it's lower. Or am I missing something?

If I'm correct, is it just a heuristic or is there some reasoning behind the difference (i.e. always accumulating the hessian)?

Thanks!

@guolinke
Collaborator

Thanks @akurennoy. Since the learning depends mainly on the gradients, I think the hessian (like a weight) should always be positive.
For an intuitive example, take a low-label doc with the highest score: its gradient will be negative, and if its hessian is also negative (neg / neg = pos), the doc will not be pushed backward.
As for the equation in the paper, I will check it carefully.

@akurennoy

Thank you very much for the quick reply. Makes sense. Apparently, designing the algorithm in full analogy with the Newton method wouldn't be a good idea here. (It's worth noting that the Newton method can "spoil" the gradient direction in classical optimization as well when the current point is far from a solution.)

@guolinke
Collaborator

guolinke commented Oct 11, 2019

In the Newton step for GBDT, the loss can be expanded by a second-order Taylor expansion. The loss then becomes a quadratic function a*x^2 + b*x + c, which reaches its minimum at x = -b / (2a) when a > 0; here a = sum_hessian / 2 and b = sum_gradient.
Therefore, sum_hessian should always be positive, or at least constant; otherwise the Newton-step theory is broken.

also refer to https://xgboost.readthedocs.io/en/latest/tutorials/model.html#the-structure-score
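
A tiny numeric illustration of this argument (plain Python, not LightGBM internals): the quadratic model G*w + 0.5*H*w^2 has a minimizer w* = -G/H only when the summed hessian H is positive; otherwise the "optimum" runs off to the boundary and the Newton step is meaningless.

import numpy as np

G = -2.0
w = np.linspace(-3.0, 3.0, 601)
for H in (4.0, -4.0):
    loss = G * w + 0.5 * H * w ** 2
    print("H = %+.1f  argmin on grid = %+.2f  -G/H = %+.2f"
          % (H, w[np.argmin(loss)], -G / H))
# With H > 0 the grid argmin matches -G/H; with H < 0 the quadratic is concave and
# the smallest value sits at the edge of the grid, i.e. no Newton minimum exists.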

lock bot locked as resolved and limited conversation to collaborators Mar 10, 2020