Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ranker train context and FastTree ranking xtensions #1068

Merged
merged 13 commits into from
Sep 29, 2018

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Sep 27, 2018

ongoing work to address #754 .
This PR adds the RankerContext, and the FastTree extension for the FastTreeRanker.

Still not complete because it fails to retrieve the MaxDcg metric.

@sfilipi sfilipi self-assigned this Sep 27, 2018
@sfilipi sfilipi added the API Issues pertaining the friendly API label Sep 27, 2018

Dcg = Fetch(RankerEvaluator.Dcg).Values;
Ndcg = Fetch(RankerEvaluator.Ndcg).Values;
// MaxDcg = Fetch(RankerEvaluator.MaxDcg).Values;
Copy link
Member Author

@sfilipi sfilipi Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// MaxDcg = Fetch(RankerEvaluator.MaxDcg).Values [](start = 16, length = 48)

Uncomment, and retrieve correctly. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed from metrics, since it is a dataset characteristic.


In reply to: 220800890 [](ancestors = 220800890)

@@ -105,6 +105,48 @@ public static class FastTreeStatic
return rec.Output;
}

/// <summary>
/// FastTree <see cref="RankerContext"/>.
Copy link
Contributor

@TomFinley TomFinley Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastTree . [](start = 12, length = 37)

Is this intended to be a placeholder? I'm not certain it's terribly helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line appears to still be here. I like the additional line you've added though.


In reply to: 221019258 [](ancestors = 221019258)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duh.. let me remove that first one.


In reply to: 221379761 [](ancestors = 221379761,221019258)

/// <param name="ctx">The <see cref="RegressionContext"/>.</param>
/// <param name="label">The label column.</param>
/// <param name="features">The features colum.</param>
/// <param name="groupId">The name of the groupId column.</param>
Copy link
Contributor

@TomFinley TomFinley Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the groupId column [](start = 34, length = 30)

It's not really a name is it? The point of pigsty is you don't use names, you just pass in objects. #Closed

/// <param name="label">The label column.</param>
/// <param name="features">The features colum.</param>
/// <param name="groupId">The name of the groupId column.</param>
/// <param name="weights">The weights column.</param>
Copy link
Contributor

@TomFinley TomFinley Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weights [](start = 38, length = 7)

Maybe while you're at it, "the optional weights column" could make clear it's just fine to leave it null... though maybe people form this expectation already given that its default value is null. I don't know. #Closed

/// <param name="features">The features colum.</param>
/// <param name="groupId">The name of the groupId column.</param>
/// <param name="weights">The weights column.</param>
/// <param name="numLeaves">The number of leaves to use.</param>
Copy link
Contributor

@TomFinley TomFinley Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to use [](start = 57, length = 6)

"Leaves to use" is kind of weird. Maybe "maximum number of leaves per decision tree" would be better. I might also prefer to have number of trees above number of leaves (that is, reverse the two args). #Closed

/// <param name="weights">The weights column.</param>
/// <param name="numLeaves">The number of leaves to use.</param>
/// <param name="numTrees">Total number of decision trees to create in the ensemble.</param>
/// <param name="minDocumentsInLeafs">The minimal number of documents allowed in a leaf of a regression tree, out of the subsampled data.</param>
Copy link
Contributor

@TomFinley TomFinley Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documents [](start = 68, length = 9)

The term documents is a little unfortunate. It is a legacy of this packages roots in ranking documents for the web ranker, but no-where else in ML.NET do we call the datapoints we have "documents" as far as I am aware. #Closed


Assert.InRange(metrics.Ndcg[0], 36.5, 37);
Assert.InRange(metrics.Ndcg[1], 36.5, 37);
Assert.InRange(metrics.Ndcg[2], 36.5, 37);
Copy link
Contributor

@TomFinley TomFinley Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. That's weird that they're all so close. Are we sure about this? They're not all identical, are they? If they are, that might be a sign that the group-id is different for each line.
#Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values are:
Dcg = [ 1.5972695, 1.770648, 1.79641082 ]
Ndcg = [ 36.90476, 36.7512, 36.8729 ]


In reply to: 221056237 [](ancestors = 221056237)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm OK! A bit weird but we'll go with that.


In reply to: 221145695 [](ancestors = 221145695,221056237)

…he docs site displays the methods per class, not file.

Addressing Tom's comments .
@sfilipi sfilipi changed the title WIP - Ranker train context and FastTree ranking xtensions Ranker train context and FastTree ranking xtensions Sep 28, 2018
@@ -16,7 +16,7 @@ namespace Microsoft.ML.Trainers
/// <summary>
/// LightGbm <see cref="TrainContextBase"/> extension methods.
Copy link
Member Author

@sfilipi sfilipi Sep 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LightGbm extension metho [](start = 8, length = 55)

change #Resolved

@@ -59,6 +59,9 @@ public static class LightGbmStatics

return rec.Score;
}
}

public static partial class ClassificationTrainers {
Copy link
Member Author

@sfilipi sfilipi Sep 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

publi [](start = 4, length = 5)

doc. #Resolved

/// <summary>
/// <a href="https://en.wikipedia.org/wiki/Discounted_cumulative_gain">Discounted Cumulative gain</a>
/// is the sum of the gains, for all the instances i, normalized by the natural logarithm of the instance + 1.
/// Note that unline the Wikipedia article, ML.Net uses the natural logarithm.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unlike

Copy link
Contributor

@artidoro artidoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi sfilipi merged commit 3cdd3c8 into dotnet:master Sep 29, 2018
@sfilipi sfilipi deleted the rankerTrainContext branch November 19, 2018 17:48
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants