Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML Context to create them all #1252

Merged
merged 9 commits into from
Oct 18, 2018
Merged

Conversation

Zruty0
Copy link
Contributor

@Zruty0 Zruty0 commented Oct 13, 2018

Fixes #1098 .
Adds a couple extensions for transforms, and almost all trainers.
Adds text loading and saving, model loading.

@Zruty0 Zruty0 added the API Issues pertaining the friendly API label Oct 13, 2018
@Zruty0 Zruty0 added this to the 1018 milestone Oct 13, 2018
@Zruty0 Zruty0 self-assigned this Oct 13, 2018
@sfilipi
Copy link
Member

sfilipi commented Oct 13, 2018

using System;

copyright #Resolved


Refers to: src/Microsoft.ML.Data/DataLoadSave/DataLoadSaveCatalog.cs:1 in 26913ac. [](commit_id = 26913ac, deletion_comment = False)

/// </summary>
/// <param name="catalog">The catalog.</param>
/// <param name="args">The arguments to text reader, describing the data schema.</param>
/// <param name="dataSample">The optional data sample</param>
Copy link
Member

@sfilipi sfilipi Oct 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data sample, or dataSource?
#Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a sample


In reply to: 224948938 [](ancestors = 224948938)

/// <param name="catalog">The catalog.</param>
/// <param name="args">The arguments to text reader, describing the data schema.</param>
/// <param name="dataSample">The optional data sample</param>
/// <returns></returns>
Copy link
Member

@sfilipi sfilipi Oct 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 12, length = 19)

empty returns #Resolved

public void Save(ITransformer model, Stream stream) => model.SaveTo(Environment, stream);

/// <summary>
/// Load the model from the stream.
Copy link
Member

@sfilipi sfilipi Oct 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model [](start = 21, length = 5)

If we are saving and loading an ITransformer, should we call the class TransformerOperationsCatalog? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name should reflect the user-facing concept, rather than the implementation fact that 'trained model is a transformer'.


In reply to: 224950106 [](ancestors = 224950106)

@@ -403,7 +372,7 @@ internal MulticlassClassificationTrainers(MulticlassClassificationContext ctx)
/// the top-K accuracy, that is, the accuracy assuming we consider an example with the correct class within
/// the top-K values as being stored "correctly."</param>
/// <returns>The evaluation results for these calibrated outputs.</returns>
public MultiClassClassifierEvaluator.Result Evaluate(IDataView data, string label, string score = DefaultColumnNames.Score,
public MultiClassClassifierEvaluator.Result Evaluate(IDataView data, string label = DefaultColumnNames.Label, string score = DefaultColumnNames.Score,
Copy link
Member

@sfilipi sfilipi Oct 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string label = DefaultColumnNames.Label [](start = 77, length = 39)

would assigning defaults here make it error prone. This allows the users to trip and provide a name for the label column in the estimators, and leave it to the default name here? We have not assigned defaults in the estimators. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, all the trainer extensions also have default names for Label and Features. This is why I made this change as well.


In reply to: 224950306 [](ancestors = 224950306)

/// </summary>
/// <param name="catalog">The transform catalog</param>
/// <param name="columnName">The column name</param>
/// <param name="mode">The normalization mode</param>
Copy link
Member

@sfilipi sfilipi Oct 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The normalization mod [](start = 31, length = 21)

cref maybe #Resolved

namespace Microsoft.ML.Runtime
{
/// <summary>
/// Similar to training context, a transform context is an object serving as a 'catalog' of available transforms.
Copy link
Member

@sfilipi sfilipi Oct 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like catalog. #Resolved

/// <param name="weights">The optional example weights.</param>
/// <param name="clustersCount">The number of clusters to use for KMeans.</param>
/// <param name="advancedSettings">Algorithm advanced settings.</param>
public static KMeansPlusPlusTrainer KMeans(this ClusteringContext.ClusteringTrainers ctx,
Copy link
Member

@sfilipi sfilipi Oct 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KMeansPlusPlusTrainer [](start = 22, length = 21)

shall we add returns to all those new members, to distinguish them from the other extensions in the documentation? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I'm not sure why


In reply to: 224950512 [](ancestors = 224950512)

using Microsoft.ML.StaticPipe.Runtime;
using System;

namespace Microsoft.ML.Trainers
namespace Microsoft.ML.StaticPipe
Copy link
Member

@sfilipi sfilipi Oct 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StaticPipe [](start = 23, length = 10)

why change namespace? I believe we decided to keep the trainers inside Microsoft.ML.Trainers.
The methods that return trainers iside the new catalog classes are being placed under Microsoft.ML.

@GalOshri what do you think? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this sweeping change, because otherwise we are seeing 2 methods per trainer in the intellisense by default.

I think for now we want users to 'opt in' to the static API by using StaticPipe, and have the dynamic API available right off the bat


In reply to: 224950522 [](ancestors = 224950522)

/// <param name="conc">Concurrency level. Set to 1 to run single-threaded. Set to 0 to pick automatically.</param>
public MLContext(int? seed = null, int conc = 0)
{
_env = new LocalEnvironment(seed, conc);
Copy link
Member

@eerhardt eerhardt Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have assumed that we just rename LocalEnvironment to MLContext. Why do we have both? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let me think this through.


In reply to: 225250207 [](ancestors = 225250207)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably not bundle this change right now, but I'll write a REVIEW and a GitHib issue


In reply to: 225262172 [](ancestors = 225262172,225250207)

@eerhardt
Copy link
Member

eerhardt commented Oct 15, 2018

// Licensed to the .NET Foundation under one or more agreements.

Why is this file under the Training folder? #Resolved


Refers to: src/Microsoft.ML.Data/Training/MLContext.cs:1 in 26913ac. [](commit_id = 26913ac, deletion_comment = False)

/// <summary>
/// Data processing operations.
/// </summary>
public TransformsCatalog Transform { get; }
Copy link
Member

@eerhardt eerhardt Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the other properties are not plural, but for some reason the singular Transform sticks out to me. It feels like it should be Transforms. Even the class name TransformsCatalog has plural. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I like it the way it is. Maybe other commenters can swing the vote?


In reply to: 225254551 [](ancestors = 225254551)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I renamed it


In reply to: 225262070 [](ancestors = 225262070,225254551)

@Zruty0
Copy link
Contributor Author

Zruty0 commented Oct 15, 2018

// Licensed to the .NET Foundation under one or more agreements.

It has to be in one of the folders :) It is mostly used for composing pipelines, so I picked Training. Do you have a better idea?


In reply to: 429941509 [](ancestors = 429941509)


Refers to: src/Microsoft.ML.Data/Training/MLContext.cs:1 in 26913ac. [](commit_id = 26913ac, deletion_comment = False)

@eerhardt
Copy link
Member

// Licensed to the .NET Foundation under one or more agreements.

What's wrong with the folder: https://github.com/dotnet/machinelearning/tree/master/src/Microsoft.ML.Data ?

In general, the .NET Core team follows folders-namespace alignment, so it is easier to find files. Since this type is in the root of the namespace: Microsoft.ML, I believe it should be in the root of the folders.


In reply to: 429953480 [](ancestors = 429953480,429941509)


Refers to: src/Microsoft.ML.Data/Training/MLContext.cs:1 in 26913ac. [](commit_id = 26913ac, deletion_comment = False)

@Zruty0
Copy link
Contributor Author

Zruty0 commented Oct 15, 2018

// Licensed to the .NET Foundation under one or more agreements.

I put it in the root.


In reply to: 429967563 [](ancestors = 429967563,429953480,429941509)


Refers to: src/Microsoft.ML.Data/Training/MLContext.cs:1 in 26913ac. [](commit_id = 26913ac, deletion_comment = False)

@@ -293,11 +288,11 @@ In the file above, the last column (12th) is label that we predict, and all the
```csharp
// Create a new environment for ML.NET operations. It can be used for exception tracking and logging,
Copy link
Contributor

@artidoro artidoro Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

environment [](start = 16, length = 11)

Would be a good idea to replace with the above comments:

// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness. #Resolved

@@ -449,13 +437,13 @@ The prediction code now looks as follows:
```csharp
// Create a new environment for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
Copy link
Contributor

@artidoro artidoro Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, would be a good idea to replace with the above comments:

// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness. #Resolved

@@ -797,10 +776,10 @@ Sentiment SentimentText
```csharp
// Create a new environment for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
Copy link
Contributor

@artidoro artidoro Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be a good idea to replace with the above comments:

// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness. #Resolved

@@ -345,7 +337,7 @@ Assuming the example above was used to train the model, here's how you calculate
var testData = reader.Read(new MultiFileSource(testDataPath));
// Calculate metrics of the model on the test data.
// We are using the 'regression' context object here to perform evaluation.
var metrics = regression.Evaluate(model.Transform(testData), label: r => r.Target, score: r => r.Prediction);
var metrics = mlContext.Regression.Evaluate(model.Transform(testData), label: r => r.Target, score: r => r.Prediction);
Copy link
Contributor

@artidoro artidoro Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sentence still correct? #Resolved

/// </summary>
public sealed class MLContext : IHostEnvironment
{
// REVIEW: consider making LocalEnvironment and MLContext the same class instead of encapsulation.
Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't fully understand our plan here with LocalEnvironment and ConsoleEnvironment. How could I ever use a ConsoleEnvironment with this new MLContext class?

And if the answer is "you can't - you'd use ConsoleEnvironment directly", I think that is a pretty poor answer because then my whole API paradigm changes. I can no longer use BinaryClassificationContext, Transforms, etc. from the MLContext. #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about the answer: "you can't use ConsoleEnvironment because it is internal and only used by MAML"?


In reply to: 225986842 [](ancestors = 225986842)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be fine with me. For now, I would think it's completely fine to have our custom console printing internal to the commandline tool


In reply to: 225989180 [](ancestors = 225989180,225986842)

Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1284 #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.


In reply to: 226052093 [](ancestors = 226052093)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:


using (var stream = File.Create(modelPath))
{
// Saving and loading happens to 'dynamic' models, so the static typing is lost in the process.
model.AsDynamic.SaveTo(env, stream);
mlContext.Model.Save(model.AsDynamic, stream);
Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on making this so I don't have to say model.AsDynamic? Can we open an issue to add that overload? #Pending

Copy link
Contributor Author

@Zruty0 Zruty0 Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely: #1286


In reply to: 226105313 [](ancestors = 226105313)

new TextLoader.Column("Target", DataKind.R4, 10)
},
// Default separator is tab, but we need a comma.
s => s.Separator = ",");
Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is probably not exactly related to MLContext, but since you wrote this new test, I'll ask the question):

Do we like having s => here? Why isn't the separator just an argument in the TextReader() method? #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern we want to have is that we add the most important options as ctor options, and the rest 'advanced' inside the delegate.
The other pattern is that we want the extensions to replicate the signatures of constructors, to avoid any mapping logic in strange places.
Because of that, I'd rather change the signature of TextLoader ctor to add 'hasHeader' and 'separator', but we will need to do it in a separate PR.

Also 'separator' should not really be a string :)


In reply to: 226106086 [](ancestors = 226106086)

@@ -128,7 +128,7 @@ public Arguments()
public static IDataView Create(IHostEnvironment env, IDataView input, string name,
string source = null, OutputKind outputKind = CategoricalEstimator.Defaults.OutKind)
{
return new CategoricalEstimator(env, name, source, outputKind).Fit(input).Transform(input) as IDataView;
return new CategoricalEstimator(env, source, name, outputKind).Fit(input).Transform(input) as IDataView;
Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should consider renaming and re-ordering the parameters to this method as well. name and source don't really help say what they are. Also, if new CategoricalEstimator has it one way, and CategoricalTransform.Create has it the other way, that is going to lead to confusion. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure


In reply to: 226108453 [](ancestors = 226108453)

using System;
using System.Collections.Generic;

namespace Microsoft.ML
Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Since this is in the Microsoft.ML namespace, my preference would be for the file to not be under the Text folder. I think namespaces and folders should align. #Resolved

/// <summary>
/// FastTree <see cref="TrainContextBase"/> extension methods.
/// </summary>
public static partial class RegressionTrainers
Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already is a Microsoft.ML.RegressionTrainers class in the StandardLearners and LightGBM assemblies. We shouldn't have public types with the same namespace and name across multiple assemblies. It is only going to cause headaches for our users. #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to all the other extension classes as well.


In reply to: 226109180 [](ancestors = 226109180)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was intentionally done by Senja for some documentation reasons?


In reply to: 226109272 [](ancestors = 226109272,226109180)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me keep it as is for now, and I'll sync with her and understand why it was done. For me, it would've been much easier to group by trainer, into FastTreeTrainerCatalog or something


In reply to: 226109747 [](ancestors = 226109747,226109272,226109180)

Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me keep it as is for now

I don't know .... this is a pretty strong "don't" rule in general. If we are going to fix this very soon (like the next few days), I'm OK with it. But please log an issue. We can't ship with conflicting types across our assemblies. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK spoke to Senja, we should revert that :)


In reply to: 226113726 [](ancestors = 226113726)

/// <summary>
/// Extensions for normalizer operations.
/// </summary>
public static class NormalizerCatalog
Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the use cases for NormalizerCatalog? I don't see any usages of it in this PR.

I see it is a static class, while all the other "catalog" classes are instance members of MLContext. Should it be designed the same as say TransformsCatalog?

Also, I don't see any tests using this class. #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I get it - this provides extension methods to the TransformsCatalog. How about we name these "Extension" only classes differently then? It is confusing in my opinion that NormalizerCatalog and TransformsCatalog are named similarly, but are designed differently.


In reply to: 226110284 [](ancestors = 226110284)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a set of extensions to TransformsCatalog. I guess I will need to rename it


In reply to: 226110284 [](ancestors = 226110284)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And add tests :D


In reply to: 226110673 [](ancestors = 226110673,226110284)

@@ -1320,6 +1337,8 @@ public void Save(ModelSaveContext ctx)

public IDataView Read(IMultiStreamSource source) => new BoundLoader(this, source);

public IDataView Read(string path) => Read(new MultiFileSource(path));
Copy link
Member

@eerhardt eerhardt Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ #Closed

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Zruty0 Zruty0 merged commit 9157cea into dotnet:master Oct 18, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants