Performance of lambdarank using a custom objective is poor when compared with the built-in lambdarank #2239
Comments
I think it is related to your custom obj, could you provide it?
@guolinke so many thanks for your quick response. I made the comparison by either commenting out fobj or not, to test the custom objective.
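For illustration, a minimal sketch of that kind of comparison (this is not the gist itself; the synthetic data, the stand-in gradients, and the parameter values below are placeholder assumptions, and fobj is the LightGBM 2.x/3.x API):

```python
import numpy as np
import lightgbm as lgb

# Toy ranking data: 2 queries of 10 documents each (placeholder for MQ2008).
rng = np.random.RandomState(42)
X = rng.normal(size=(20, 5))
y = rng.randint(0, 3, size=20)                 # graded relevance labels
train_set = lgb.Dataset(X, label=y, group=[10, 10])

common = {"metric": "ndcg", "eval_at": [1, 3, 5, 10], "verbose": -1}

# 1) Built-in objective: LightGBM computes the lambdarank grad/hess internally.
booster_builtin = lgb.train(dict(common, objective="lambdarank"),
                            train_set, num_boost_round=50)

# 2) Custom objective via fobj: stand-in gradients only, to be replaced by
#    the real lambdarank computation from the gist.
def custom_objective(preds, dataset):
    labels = dataset.get_label()
    grad = preds - labels                      # placeholder gradient
    hess = np.ones_like(preds)                 # placeholder hessian
    return grad, hess

booster_custom = lgb.train(dict(common, objective="none"),
                           train_set, num_boost_round=50,
                           fobj=custom_objective)
```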
@y-research-yu hi, I have the same question as you. I have checked your gist and I think the gradient and hessian are correct. But the performance with your implementation of the lambdarank objective is very poor; actually, the output is:
Did you make any progress on this issue? Thanks very much.
@y-research-yu For your questions, the built-in lambdarank only has a slight difference from the original algorithm; you can check ... BTW, you can set ...
Hi @guolinke and @y-research-yu. BTW, I have checked @y-research-yu's implementation, and I think the hess computation is correct, which is ...
@Ian09 thanks! The small hessian may cause the problem. @y-research-yu
@y-research-yu The hessian should always be accumulated, no need for the ... See LightGBM/src/objective/rank_objective.hpp, lines 156 to 165 in df26b65.
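A rough Python transcription of the idea in those lines (a sketch of the pairwise accumulation only, without the |ΔNDCG| weighting and normalization the real objective applies):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lambdarank_grad_hess(scores, labels, sigma=1.0):
    """Simplified per-query lambdarank gradients/hessians (no NDCG delta weighting)."""
    grad = np.zeros_like(scores)
    hess = np.zeros_like(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] <= labels[j]:
                continue                          # only pairs where doc i is more relevant than doc j
            p = sigmoid(sigma * (scores[i] - scores[j]))
            lam = sigma * (p - 1.0)               # first derivative of the pairwise logistic loss
            rho = sigma * sigma * p * (1.0 - p)   # second derivative, always >= 0
            grad[i] += lam
            grad[j] -= lam
            # Key point from the thread: the hessian contribution is accumulated
            # as a positive quantity for BOTH documents; it never gets a sign flip.
            hess[i] += rho
            hess[j] += rho
    # A small floor keeps the leaf value -sum(grad)/sum(hess) from blowing up
    # when the accumulated hessian is tiny.
    return grad, np.maximum(hess, 1e-16)
```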
@guolinke If I'm correct, is it just a heuristic or is there some reasoning behind the difference (i.e. always accumulating the hessian)? Thanks!
Thanks @akurennoy. As the learning mainly depends on gradients, I think the hessian (like the weight) should always be positive.
Thank you very much for a quick reply. Makes sense. Apparently, designing the algorithm in full analogy with the Newton method wouldn't be a good idea here. (It's worth noting that the Newton method can "spoil" the gradient direction in classical optimization as well when the current point is far from a solution.)
In the Newton step for GBDT, the loss can be expanded by a second-order Taylor expansion. The loss is then a quadratic function, which reaches its minimum at x = -b/(2a) when a > 0; here a = sum_hessian / 2 and b = sum_gradient. Also refer to https://xgboost.readthedocs.io/en/latest/tutorials/model.html#the-structure-score
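Spelled out (standard GBDT algebra, consistent with the XGBoost page linked above): with per-instance gradients g_i and hessians h_i in a leaf and leaf output w,

```latex
L(w) \approx \mathrm{const} + \Big(\sum_i g_i\Big) w + \frac{1}{2}\Big(\sum_i h_i\Big) w^{2}
           = \mathrm{const} + G\,w + \frac{H}{2}\,w^{2},
\qquad
w^{*} = -\frac{b}{2a} = -\frac{G}{H}
\quad\text{with } a = \tfrac{H}{2},\ b = G,\ H > 0.
```

This is why a hessian that is tiny or non-positive makes the leaf value -G/H explode or lose meaning.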
Based on the commonly used dataset, say MQ2008, the 5-fold CV performance of the built-in lambdarank is:
nDCG@1:0.4319, nDCG@3:0.4574, nDCG@5:0.4953, nDCG@10:0.5844
By setting the hessian to a constant value, say 1.0, the 5-fold CV performance of the manually plugged-in lambdarank via the fobj parameter is:
nDCG@1:0.3483, nDCG@3:0.4021, nDCG@5:0.4429, nDCG@10:0.5394
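For clarity, tying the per-query sketch from earlier in the thread into a full fobj shows what the "constant hessian" variant amounts to (a sketch only; it assumes the hypothetical lambdarank_grad_hess helper defined above, and that Dataset.get_group() returns per-query sizes, which may differ across LightGBM versions):

```python
import numpy as np

def make_fobj(use_constant_hessian=False):
    def fobj(preds, dataset):
        labels = dataset.get_label()
        groups = dataset.get_group()          # documents per query (sizes assumed, not boundaries)
        grad = np.zeros_like(preds)
        hess = np.zeros_like(preds)
        start = 0
        for size in groups:
            s = slice(start, start + int(size))
            grad[s], hess[s] = lambdarank_grad_hess(preds[s], labels[s])
            start += int(size)
        if use_constant_hessian:
            # The "constant hessian" variant: every boosting step degenerates to
            # a plain gradient step with leaf values -sum(grad)/count.
            hess = np.ones_like(preds)
        return grad, hess
    return fobj
```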
If the manually computed hessian is used, the following issue is observed, namely the training cannot make progress (the validation NDCG stays flat):
[10] valid_0's ndcg@1: 0.113095 valid_0's ndcg@2: 0.161686 valid_0's ndcg@3: 0.202154 valid_0's ndcg@4: 0.214604 valid_0's ndcg@5: 0.23572
[20] valid_0's ndcg@1: 0.113095 valid_0's ndcg@2: 0.161686 valid_0's ndcg@3: 0.202154 valid_0's ndcg@4: 0.214604 valid_0's ndcg@5: 0.23572
[30] valid_0's ndcg@1: 0.113095 valid_0's ndcg@2: 0.161686 valid_0's ndcg@3: 0.202154 valid_0's ndcg@4: 0.214604 valid_0's ndcg@5: 0.23572
Environment info
Operating System: ubuntu 16.04
Python version: 3.7
LightGBM version or commit hash: 2.2.3
Any comments on the above results and the following questions are highly appreciated.
(1) Why is there an obvious difference between the built-in lambdarank and the manually plugged-in lambdarank? Is it due to the internal parameter settings?
(2) What are the tips for using a manually computed hessian, which seems to be quite sensitive and fragile?
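One safeguard suggested by the discussion above (an assumption drawn from this thread, not an official recommendation) is to keep the returned hessian strictly positive and bounded away from zero:

```python
import numpy as np

def stabilize_hessian(hess, eps=1e-6):
    # Keep the hessian positive and bounded away from zero so the per-leaf
    # Newton step -sum(grad)/sum(hess) stays finite; the floor value is a
    # guess and should be tuned per dataset.
    return np.maximum(np.abs(hess), eps)
```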