Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify API for advanced settings (RandomizedPcaTrainer) #2390

Merged
merged 11 commits into from
Feb 13, 2019

Conversation

abgoswam
Copy link
Member

@abgoswam abgoswam commented Feb 3, 2019

Towards #1798 .

This PR addresses the following algos

  • RandomizedPcaTrainer

The following changes have been made:

  • Make constructors internal .
  • Rename Arguments to Options
  • Rename Options objects as options (instead of args or advancedSettings used so far)

@abgoswam abgoswam requested review from sfilipi and artidoro February 3, 2019 17:07
@codecov
Copy link

codecov bot commented Feb 3, 2019

Codecov Report

Merging #2390 into master will increase coverage by 0.01%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master    #2390      +/-   ##
==========================================
+ Coverage   71.26%   71.27%   +0.01%     
==========================================
  Files         797      799       +2     
  Lines      141292   141377      +85     
  Branches    16118    16118              
==========================================
+ Hits       100692   100768      +76     
- Misses      36138    36147       +9     
  Partials     4462     4462
Flag Coverage Δ
#Debug 71.27% <93.75%> (+0.01%) ⬆️
#production 67.6% <91.42%> (+0.01%) ⬆️
#test 85.36% <100%> (ø) ⬆️

@@ -103,23 +103,23 @@ public class Arguments : UnsupervisedLearnerInputBaseWithWeight

}

internal RandomizedPcaTrainer(IHostEnvironment env, Arguments args)
:this(env, args, args.FeatureColumn, args.WeightColumn)
internal RandomizedPcaTrainer(IHostEnvironment env, Options options)
Copy link
Contributor

@artidoro artidoro Feb 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RandomizedPcaTrainer [](start = 17, length = 20)

It's strange... I noticed that renaming Arguments to Options did not modify anything in the mlContext catalog. #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked it up, and I don't think there is an entry for this trainer in mlContext. Can you add it?


In reply to: 253319239 [](ancestors = 253319239)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. i noticed couple more components which do not have mlcontext extension.

will add


In reply to: 253319255 [](ancestors = 253319255,253319239)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added mlcontext extension for this. Also added a test for it that exercises the Fit() and Transform() APIs.

Evaluate() API currently missing from Anomaly Detection. i will create a separate issue for that.


In reply to: 253584603 [](ancestors = 253584603,253319255,253319239)

@abgoswam abgoswam requested a review from rogancarr February 4, 2019 18:21
@@ -49,7 +49,7 @@ public sealed class RandomizedPcaTrainer : TrainerEstimatorBase<AnomalyPredictio
internal const string Summary = "This algorithm trains an approximate PCA using Randomized SVD algorithm. "
+ "This PCA can be made into Kernel PCA by using Random Fourier Features transform.";

public class Arguments : UnsupervisedLearnerInputBaseWithWeight
public class Options : UnsupervisedLearnerInputBaseWithWeight
Copy link
Member

@sfilipi sfilipi Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Options [](start = 21, length = 7)

xml docs are coming later? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap.


In reply to: 254353429 [](ancestors = 254353429)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -91,7 +91,7 @@ public class Arguments : UnsupervisedLearnerInputBaseWithWeight
/// <param name="oversampling">Oversampling parameter for randomized PCA training.</param>
/// <param name="center">If enabled, data is centered to be zero mean.</param>
/// <param name="seed">The seed for random number generation.</param>
public RandomizedPcaTrainer(IHostEnvironment env,
internal RandomizedPcaTrainer(IHostEnvironment env,
Copy link
Contributor

@rogancarr rogancarr Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we just make the class Internal/BestFriend and keep this public? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really.

we would want to expose these through mlcontext. not via constructors


In reply to: 254391018 [](ancestors = 254391018)

Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved. I hadn't seen the pattern for trainable transforms where the class is public and methods are internal. #Resolved

Copy link
Member Author

@abgoswam abgoswam Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe in ML.NET terms, this is "trainer estimator" (for anomaly detection tasks)

most other "trainer estimator"s follow the same pattern e.g. KMeansPlusPlusTrainer


In reply to: 255587702 [](ancestors = 255587702)

@@ -347,14 +347,14 @@ protected override AnomalyPredictionTransformer<PcaModelParameters> MakeTransfor
Desc = "Train an PCA Anomaly model.",
UserName = UserNameValue,
ShortName = ShortName)]
internal static CommonOutputs.AnomalyDetectionOutput TrainPcaAnomaly(IHostEnvironment env, Arguments input)
internal static CommonOutputs.AnomalyDetectionOutput TrainPcaAnomaly(IHostEnvironment env, Options input)
Copy link
Contributor

@rogancarr rogancarr Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above; these can be kept public and the whole class can be made internal/BestFriend. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Options class above should be public. Hence we cannot make entire class internal.


In reply to: 254391534 [](ancestors = 254391534)

var trainData = reader.Read(GetDataPath(TestDatasets.mnistOneClass.trainFilename));
var testData = reader.Read(GetDataPath(TestDatasets.mnistOneClass.testFilename));

var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(featureColumn);
Copy link
Member Author

@abgoswam abgoswam Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnomalyDetection [](start = 37, length = 16)

Anomaly detection currently does not support evaluation metrics

Issue #2471 #Resolved

/// <param name="score">The name of the score column in <paramref name="data"/>.</param>
/// <param name="predictedLabel">The name of the predicted label column in <paramref name="data"/>.</param>
/// <returns>The evaluation results for these outputs.</returns>
public AnomalyDetectionMetrics Evaluate(IDataView data, string label, string score, string predictedLabel)
Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaults for label, score, and predictedLabel? #Resolved

{
public sealed class AnomalyDetectionMetrics
{
public double Auc { get; }
Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summaries, Remarks, and links to relevant documentation. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added basic summaries for now.

wanted to also add the remarks from TLC website., but the explanations there were not clear esp. for the detection rate metrics.


In reply to: 255583277 [](ancestors = 255583277)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these summaries, check in with @shmoradims ; he's building a set of generic docs for things like AUC, F1, RMSE, etc.


In reply to: 255703503 [](ancestors = 255703503,255583277)

@@ -35,5 +36,25 @@ public static class PcaCatalog
/// <param name="columns">Input columns to apply PrincipalComponentAnalysis on.</param>
public static PrincipalComponentAnalysisEstimator ProjectToPrincipalComponents(this TransformsCatalog.ProjectionTransforms catalog, params PrincipalComponentAnalysisEstimator.ColumnInfo[] columns)
=> new PrincipalComponentAnalysisEstimator(CatalogUtils.GetEnvironment(catalog), columns);

public static RandomizedPcaTrainer RandomizedPca(this AnomalyDetectionCatalog.AnomalyDetectionTrainers catalog,
Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs xml docs with remarks and links to a sample. Here or add to #1209 . #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added xml docs.

sample adding can be part of overall documentation effort #1209


In reply to: 255585449 [](ancestors = 255585449)

@@ -91,7 +91,7 @@ public class Arguments : UnsupervisedLearnerInputBaseWithWeight
/// <param name="oversampling">Oversampling parameter for randomized PCA training.</param>
/// <param name="center">If enabled, data is centered to be zero mean.</param>
/// <param name="seed">The seed for random number generation.</param>
public RandomizedPcaTrainer(IHostEnvironment env,
internal RandomizedPcaTrainer(IHostEnvironment env,
Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved. I hadn't seen the pattern for trainable transforms where the class is public and methods are internal. #Resolved

// Evaluate
var metrics = ML.AnomalyDetection.Evaluate(transformedData);

Assert.Equal(0.99, metrics.Auc, 2);
Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know that these numbers are correct? #Resolved

Copy link
Member Author

@abgoswam abgoswam Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried out the same dataset in TLC with the same trainer, the numbers are close. Not exact hough

in general, in this PR i am only exposing the trainer / evaluators as they exist currently in the codebase. the PR does not have any algorithmic changes or changes in evaluation metrics themselves.


In reply to: 255589211 [](ancestors = 255589211)

Copy link
Contributor

@rogancarr rogancarr Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the big question is, what do we want to test here?

  • Do we want a baseline test to make sure future versions don't change the numerics? (e.g. AUC is always 0.99 with 2 decimals of precision?
  • Do we want a functionality test, like we have in Functional.Tests? (e.g. check that 0 <= AUC <= 1)
  • Do we want correctness tests? (e.g. We know what the answer should be and we want to make sure this matches it.)

If we do want a baseline test, can we mark it as such, and check to further decimal places?

As an aside, are there correctness tests on these metrics that we can migrate from the internal repo? If so, can you file it as an issue to be done later?)


In reply to: 255680814 [](ancestors = 255680814,255589211)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.

as per the classification above, these seem like "baseline" tests, I have increased the precision to 5 places of decimal.

as for test migration from internal repo, it seems we ported over the PcaAnomalyTest already . that should suffice for correctness.


In reply to: 255796608 [](ancestors = 255796608,255680814,255589211)

/// <param name="label">The name of the label column in <paramref name="data"/>.</param>
/// <param name="score">The name of the score column in <paramref name="data"/>.</param>
/// <param name="predictedLabel">The name of the predicted label column in <paramref name="data"/>.</param>
/// <returns>The evaluation results for these calibrated outputs.</returns>
Copy link
Contributor

@artidoro artidoro Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calibrated [](start = 54, length = 10)

What is calibrated here? Could you explain in a few more words? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was a copy-paste typo. fixed...


In reply to: 255717912 [](ancestors = 255717912)

/// </summary>
public double DrAtK { get; }
/// <summary>
/// Detection rate at fraction p false positives.
Copy link
Contributor

@artidoro artidoro Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p [](start = 39, length = 1)

What is p? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me the Detection rate metrics look suspicious. perhaps we should not expose these for V1.0 . only expose AUC


In reply to: 255719011 [](ancestors = 255719011)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed that detectionrate@p metric.


In reply to: 255725823 [](ancestors = 255725823,255719011)

@rogancarr
Copy link
Contributor

Just one last question about tests!

var metrics = ML.AnomalyDetection.Evaluate(transformedData, k: 10);

Assert.Equal(0.98558, metrics.Auc, 5);
Assert.Equal(0.90, metrics.DrAtK, 2);
Copy link
Contributor

@rogancarr rogancarr Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert.Equal(0.90, metrics.DrAtK, 2); [](start = 11, length = 38)

This one @ 5 places too, please :) #Resolved

/// RandomizedPcaTrainer test
/// </summary>
[Fact]
public void RandomizedPcaTrainer()
Copy link
Contributor

@rogancarr rogancarr Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RandomizedPcaTrainer [](start = 20, length = 20)

RandomizedPcaTrainerBaselineTest #Resolved

Copy link
Contributor

@rogancarr rogancarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments.

:shipit:

@abgoswam abgoswam merged commit fd30559 into dotnet:master Feb 13, 2019
@abgoswam
Copy link
Member Author

Thanks for the reviews!

@abgoswam abgoswam deleted the abgoswam/action_arguments_pca branch February 20, 2019 16:58
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing support for Anomaly Detection metrics.
4 participants