Prediction for the Learn-to-Rank Application #3326
Comments
Group is not needed at prediction time. The prediction results are pointwise: each row gets its own score, and you can rank the rows yourself.
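A minimal sketch of the "rank them yourself" step, assuming the rows of all queries were concatenated in order before a single `.predict()` call and that you know how many documents belong to each query. `rank_within_queries` is a hypothetical helper, not part of LightGBM, and the hard-coded scores only stand in for the array that `.predict()` would return.

```python
from itertools import accumulate


def rank_within_queries(scores, group_sizes):
    """Split a flat sequence of scores by query and rank each query's documents.

    scores: one score per row (assumed to come from one .predict() call
        on the concatenated feature matrix).
    group_sizes: number of documents per query, in row order
        (same layout as the `group` passed at training time).
    Returns one list per query holding row indices local to that query,
    ordered from highest to lowest score.
    """
    if sum(group_sizes) != len(scores):
        raise ValueError("group sizes must add up to the number of scores")

    rankings = []
    for size, end in zip(group_sizes, accumulate(group_sizes)):
        start = end - size
        query_scores = scores[start:end]
        # Highest score first; ties keep their original row order.
        order = sorted(range(size), key=lambda i: query_scores[i], reverse=True)
        rankings.append(order)
    return rankings


# Stand-in scores: query A has 3 documents, query B has 2.
scores = [0.2, 1.5, -0.3, 0.9, 1.1]
print(rank_within_queries(scores, [3, 2]))  # [[1, 0, 2], [1, 0]]
```

Because each score depends only on its own row, scoring thousands of queries in one call and splitting afterwards should give the same per-query ordering as calling `.predict()` once per query.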
Thank you for the quick answer! Follow-up question (just out of curiosity): would it be more efficient to run the prediction with the pairwise approach (as is done for training)? Or is it not feasible?
I think the current solution (listwise) is more efficient than pairwise.
But the lambdarank implemented in the LightGBM training module is pairwise and not listwise, correct? And independent of this question, why are we using a pointwise approach instead of pairwise/listwise for prediction?
@A4Ayou
Thank you!
Hello,
I developed a learning-to-rank (LTR) model by setting the objective to `lambdarank`, and training the model using the `.fit()` function with an LGB Dataset where I passed (1) the feature data, (2) the label data, and (3) the `group` of the query.
To predict the results, I am using the `.predict()` function. But this function only takes the array of features as input and does not consider the `group` argument to indicate the query.
My question is then: when using prediction for an LTR model, should we pass data for one query at a time to the `.predict()` function, or can we directly pass a set of different queries?
For example, let's say I want to predict the ranking for 2 distinct queries A and B. I get my features for the documents in A, `X(A)`, and for the documents in B, `X(B)`. Should I run `.predict()` separately on `X(A)` and `X(B)`, or can I just concatenate both `X(A)` and `X(B)` into a single numpy object `X` and run `.predict()` on `X`?
My intuition would be to go with the first option, but it may not be the most convenient. For my applications, I am dealing with thousands of queries. Does that mean that I have to run the prediction thousands of times to get the results?
This is a follow-up on issue #1398, which was closed.
Please let me know if something is unclear.
Thanks!
Have a nice day!