-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ranker train context and FastTree ranking xtensions #1068
Conversation
…r, and an Evaluate method + metrics class to the existing RankerEvaluator.
fixing the comments. adding checks to the test.
|
||
Dcg = Fetch(RankerEvaluator.Dcg).Values; | ||
Ndcg = Fetch(RankerEvaluator.Ndcg).Values; | ||
// MaxDcg = Fetch(RankerEvaluator.MaxDcg).Values; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// MaxDcg = Fetch(RankerEvaluator.MaxDcg).Values [](start = 16, length = 48)
Uncomment, and retrieve correctly. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed from metrics, since it is a dataset characteristic.
In reply to: 220800890 [](ancestors = 220800890)
@@ -105,6 +105,48 @@ public static class FastTreeStatic | |||
return rec.Output; | |||
} | |||
|
|||
/// <summary> | |||
/// FastTree <see cref="RankerContext"/>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FastTree . [](start = 12, length = 37)
Is this intended to be a placeholder? I'm not certain it's terribly helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line appears to still be here. I like the additional line you've added though.
In reply to: 221019258 [](ancestors = 221019258)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <param name="ctx">The <see cref="RegressionContext"/>.</param> | ||
/// <param name="label">The label column.</param> | ||
/// <param name="features">The features colum.</param> | ||
/// <param name="groupId">The name of the groupId column.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the groupId column [](start = 34, length = 30)
It's not really a name is it? The point of pigsty is you don't use names, you just pass in objects. #Closed
/// <param name="label">The label column.</param> | ||
/// <param name="features">The features colum.</param> | ||
/// <param name="groupId">The name of the groupId column.</param> | ||
/// <param name="weights">The weights column.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weights [](start = 38, length = 7)
Maybe while you're at it, "the optional weights column" could make clear it's just fine to leave it null
... though maybe people form this expectation already given that its default value is null
. I don't know. #Closed
/// <param name="features">The features colum.</param> | ||
/// <param name="groupId">The name of the groupId column.</param> | ||
/// <param name="weights">The weights column.</param> | ||
/// <param name="numLeaves">The number of leaves to use.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to use [](start = 57, length = 6)
"Leaves to use" is kind of weird. Maybe "maximum number of leaves per decision tree" would be better. I might also prefer to have number of trees above number of leaves (that is, reverse the two args). #Closed
/// <param name="weights">The weights column.</param> | ||
/// <param name="numLeaves">The number of leaves to use.</param> | ||
/// <param name="numTrees">Total number of decision trees to create in the ensemble.</param> | ||
/// <param name="minDocumentsInLeafs">The minimal number of documents allowed in a leaf of a regression tree, out of the subsampled data.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documents [](start = 68, length = 9)
The term documents is a little unfortunate. It is a legacy of this packages roots in ranking documents for the web ranker, but no-where else in ML.NET do we call the datapoints we have "documents" as far as I am aware. #Closed
|
||
Assert.InRange(metrics.Ndcg[0], 36.5, 37); | ||
Assert.InRange(metrics.Ndcg[1], 36.5, 37); | ||
Assert.InRange(metrics.Ndcg[2], 36.5, 37); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. That's weird that they're all so close. Are we sure about this? They're not all identical, are they? If they are, that might be a sign that the group-id is different for each line.
#Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values are:
Dcg = [ 1.5972695, 1.770648, 1.79641082 ]
Ndcg = [ 36.90476, 36.7512, 36.8729 ]
In reply to: 221056237 [](ancestors = 221056237)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm OK! A bit weird but we'll go with that.
In reply to: 221145695 [](ancestors = 221145695,221056237)
…he docs site displays the methods per class, not file. Addressing Tom's comments .
@@ -16,7 +16,7 @@ namespace Microsoft.ML.Trainers | |||
/// <summary> | |||
/// LightGbm <see cref="TrainContextBase"/> extension methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LightGbm extension metho [](start = 8, length = 55)
change #Resolved
@@ -59,6 +59,9 @@ public static class LightGbmStatics | |||
|
|||
return rec.Score; | |||
} | |||
} | |||
|
|||
public static partial class ClassificationTrainers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
publi [](start = 4, length = 5)
doc. #Resolved
/// <summary> | ||
/// <a href="https://en.wikipedia.org/wiki/Discounted_cumulative_gain">Discounted Cumulative gain</a> | ||
/// is the sum of the gains, for all the instances i, normalized by the natural logarithm of the instance + 1. | ||
/// Note that unline the Wikipedia article, ML.Net uses the natural logarithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unlike
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ongoing work to address #754 .
This PR adds the RankerContext, and the FastTree extension for the FastTreeRanker.
Still not complete because it fails to retrieve the MaxDcg metric.