Modify API for advanced settings (RandomizedPcaTrainer) #2390

abgoswam · 2019-02-03T17:06:36Z

Towards #1798.

Adds an MLContext extension for anomaly detection (issue "Add AnomalyDetectionContext to the TrainContext" #1369).
Fixes "Missing support for Anomaly Detection metrics." #2471.

This PR addresses the following algorithm:

RandomizedPcaTrainer

The following changes have been made:

Make constructors internal.
Rename Arguments to Options.
Rename Options objects to options (instead of the args or advancedSettings names used so far).

codecov · 2019-02-03T18:21:01Z

Codecov Report

Merging #2390 into master will increase coverage by 0.01%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master    #2390      +/-   ##
==========================================
+ Coverage   71.26%   71.27%   +0.01%     
==========================================
  Files         797      799       +2     
  Lines      141292   141377      +85     
  Branches    16118    16118              
==========================================
+ Hits       100692   100768      +76     
- Misses      36138    36147       +9     
  Partials     4462     4462

Flag	Coverage Δ
#Debug	`71.27% <93.75%> (+0.01%)`	⬆️
#production	`67.6% <91.42%> (+0.01%)`	⬆️
#test	`85.36% <100%> (ø)`	⬆️

artidoro · 2019-02-03T21:00:56Z

src/Microsoft.ML.PCA/PcaTrainer.cs

@@ -103,23 +103,23 @@ public class Arguments : UnsupervisedLearnerInputBaseWithWeight

        }

-        internal RandomizedPcaTrainer(IHostEnvironment env, Arguments args)
-            :this(env, args, args.FeatureColumn, args.WeightColumn)
+        internal RandomizedPcaTrainer(IHostEnvironment env, Options options)


RandomizedPcaTrainer [](start = 17, length = 20)

It's strange... I noticed that renaming Arguments to Options did not modify anything in the mlContext catalog. #Resolved

I looked it up, and I don't think there is an entry for this trainer in mlContext. Can you add it?

In reply to: 253319239 [](ancestors = 253319239)

yeah. i noticed a couple more components which do not have an mlcontext extension.

will add

In reply to: 253319255 [](ancestors = 253319255,253319239)

i added mlcontext extension for this. Also added a test for it that exercises the Fit() and Transform() APIs.

The Evaluate() API is currently missing from Anomaly Detection. i will create a separate issue for that.

In reply to: 253584603 [](ancestors = 253584603,253319255,253319239)

sfilipi · 2019-02-06T16:44:55Z

src/Microsoft.ML.PCA/PcaTrainer.cs

@@ -49,7 +49,7 @@ public sealed class RandomizedPcaTrainer : TrainerEstimatorBase<AnomalyPredictio
        internal const string Summary = "This algorithm trains an approximate PCA using Randomized SVD algorithm. "
            + "This PCA can be made into Kernel PCA by using Random Fourier Features transform.";

-        public class Arguments : UnsupervisedLearnerInputBaseWithWeight
+        public class Options : UnsupervisedLearnerInputBaseWithWeight


Options [](start = 21, length = 7)

xml docs are coming later? #Pending

yeap.

In reply to: 254353429 [](ancestors = 254353429)

sfilipi

rogancarr · 2019-02-06T18:21:22Z

src/Microsoft.ML.PCA/PcaTrainer.cs

@@ -91,7 +91,7 @@ public class Arguments : UnsupervisedLearnerInputBaseWithWeight
        /// <param name="oversampling">Oversampling parameter for randomized PCA training.</param>
        /// <param name="center">If enabled, data is centered to be zero mean.</param>
        /// <param name="seed">The seed for random number generation.</param>
-        public RandomizedPcaTrainer(IHostEnvironment env,
+        internal RandomizedPcaTrainer(IHostEnvironment env,


Shouldn't we just make the class Internal/BestFriend and keep this public? #Resolved

not really.

we would want to expose these through mlcontext. not via constructors

In reply to: 254391018 [](ancestors = 254391018)

Resolved. I hadn't seen the pattern for trainable transforms where the class is public and methods are internal. #Resolved

i believe in ML.NET terms, this is a "trainer estimator" (for anomaly detection tasks)

most other "trainer estimator"s follow the same pattern e.g. KMeansPlusPlusTrainer

In reply to: 255587702 [](ancestors = 255587702)
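The pattern discussed in this thread (public trainer class, public Options class, internal constructors, creation only through a catalog extension method) can be sketched roughly as follows. This is a minimal illustration, not the ML.NET code: `SketchCatalog`, `SketchTrainer`, `SketchCatalogExtensions`, and `RandomizedPcaSketch` are hypothetical names, and the option fields and their defaults are made up for the example.

```csharp
using System;

// Hypothetical stand-in for a catalog object; not an ML.NET type.
public sealed class SketchCatalog { }

// The trainer class stays public so it can appear in public signatures.
public sealed class SketchTrainer
{
    // Public nested options type, so callers can still pass advanced settings.
    public sealed class Options
    {
        public string FeatureColumn = "Features"; // illustrative default
        public int Rank = 20;                     // illustrative default
        public int? Seed = null;
    }

    public Options Settings { get; }

    // Internal constructor: code outside this assembly cannot call
    // `new SketchTrainer(...)` and has to go through the catalog extension.
    internal SketchTrainer(Options options)
    {
        Settings = options ?? throw new ArgumentNullException(nameof(options));
    }
}

public static class SketchCatalogExtensions
{
    // Simple overload for the most common parameters.
    public static SketchTrainer RandomizedPcaSketch(
        this SketchCatalog catalog, string featureColumn = "Features", int rank = 20)
        => new SketchTrainer(new SketchTrainer.Options { FeatureColumn = featureColumn, Rank = rank });

    // Advanced overload taking the options object.
    public static SketchTrainer RandomizedPcaSketch(
        this SketchCatalog catalog, SketchTrainer.Options options)
        => new SketchTrainer(options);
}
```

With this shape the whole class cannot be made internal, because the public extension methods return it and the public Options type is nested in it; only the constructors are hidden.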

rogancarr · 2019-02-06T18:22:51Z

src/Microsoft.ML.PCA/PcaTrainer.cs

@@ -347,14 +347,14 @@ protected override AnomalyPredictionTransformer<PcaModelParameters> MakeTransfor
            Desc = "Train an PCA Anomaly model.",
            UserName = UserNameValue,
            ShortName = ShortName)]
-        internal static CommonOutputs.AnomalyDetectionOutput TrainPcaAnomaly(IHostEnvironment env, Arguments input)
+        internal static CommonOutputs.AnomalyDetectionOutput TrainPcaAnomaly(IHostEnvironment env, Options input)


Same as above; these can be kept public and the whole class can be made internal/BestFriend. #Resolved

The Options class above should be public. Hence we cannot make entire class internal.

In reply to: 254391534 [](ancestors = 254391534)

abgoswam · 2019-02-07T23:43:40Z

test/Microsoft.ML.Tests/AnomalyDetectionTests.cs

+            var trainData = reader.Read(GetDataPath(TestDatasets.mnistOneClass.trainFilename));
+            var testData = reader.Read(GetDataPath(TestDatasets.mnistOneClass.testFilename));
+
+            var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(featureColumn);


AnomalyDetection [](start = 37, length = 16)

Anomaly detection currently does not support evaluation metrics

Issue #2471 #Resolved


rogancarr · 2019-02-11T16:23:37Z

src/Microsoft.ML.Data/Evaluators/AnomalyDetectionEvaluator.cs

+        /// <param name="score">The name of the score column in <paramref name="data"/>.</param>
+        /// <param name="predictedLabel">The name of the predicted label column in <paramref name="data"/>.</param>
+        /// <returns>The evaluation results for these outputs.</returns>
+        public AnomalyDetectionMetrics Evaluate(IDataView data, string label, string score, string predictedLabel)


Defaults for label, score, and predictedLabel? #Resolved

rogancarr · 2019-02-11T16:24:57Z

src/Microsoft.ML.Data/Evaluators/Metrics/AnomalyDetectionMetrics.cs

+{
+    public sealed class AnomalyDetectionMetrics
+    {
+        public double Auc { get; }


Summaries, Remarks, and links to relevant documentation. #Resolved

added basic summaries for now.

wanted to also add the remarks from the TLC website, but the explanations there were not clear, esp. for the detection rate metrics.

In reply to: 255583277 [](ancestors = 255583277)

For these summaries, check in with @shmoradims ; he's building a set of generic docs for things like AUC, F1, RMSE, etc.

In reply to: 255703503 [](ancestors = 255703503,255583277)

rogancarr · 2019-02-11T16:29:24Z

src/Microsoft.ML.PCA/PCACatalog.cs

@@ -35,5 +36,25 @@ public static class PcaCatalog
        /// <param name="columns">Input columns to apply PrincipalComponentAnalysis on.</param>
        public static PrincipalComponentAnalysisEstimator ProjectToPrincipalComponents(this TransformsCatalog.ProjectionTransforms catalog, params PrincipalComponentAnalysisEstimator.ColumnInfo[] columns)
            => new PrincipalComponentAnalysisEstimator(CatalogUtils.GetEnvironment(catalog), columns);
+
+        public static RandomizedPcaTrainer RandomizedPca(this AnomalyDetectionCatalog.AnomalyDetectionTrainers catalog,


Needs xml docs with remarks and links to a sample. Here or add to #1209 . #Resolved

added xml docs.

sample adding can be part of overall documentation effort #1209

In reply to: 255585449 [](ancestors = 255585449)

rogancarr · 2019-02-11T16:34:00Z

src/Microsoft.ML.PCA/PcaTrainer.cs

@@ -91,7 +91,7 @@ public class Arguments : UnsupervisedLearnerInputBaseWithWeight
        /// <param name="oversampling">Oversampling parameter for randomized PCA training.</param>
        /// <param name="center">If enabled, data is centered to be zero mean.</param>
        /// <param name="seed">The seed for random number generation.</param>
-        public RandomizedPcaTrainer(IHostEnvironment env,
+        internal RandomizedPcaTrainer(IHostEnvironment env,


Resolved. I hadn't seen the pattern for trainable transforms where the class is public and methods are internal. #Resolved

rogancarr · 2019-02-11T16:37:21Z

test/Microsoft.ML.Tests/AnomalyDetectionTests.cs

+            // Evaluate
+            var metrics = ML.AnomalyDetection.Evaluate(transformedData);
+
+            Assert.Equal(0.99, metrics.Auc, 2);


How do we know that these numbers are correct? #Resolved

i tried out the same dataset in TLC with the same trainer, the numbers are close. Not exact though.

in general, in this PR i am only exposing the trainer / evaluators as they exist currently in the codebase. the PR does not have any algorithmic changes or changes in evaluation metrics themselves.

In reply to: 255589211 [](ancestors = 255589211)

I guess the big question is, what do we want to test here?

Do we want a baseline test to make sure future versions don't change the numerics? (e.g. AUC is always 0.99 with 2 decimals of precision?)

Do we want a functionality test, like we have in Functional.Tests? (e.g. check that 0 <= AUC <= 1)

Do we want correctness tests? (e.g. We know what the answer should be and we want to make sure this matches it.)

If we do want a baseline test, can we mark it as such, and check to further decimal places?

As an aside, are there correctness tests on these metrics that we can migrate from the internal repo? If so, can you file it as an issue to be done later?

In reply to: 255680814 [](ancestors = 255680814,255589211)

good point.

as per the classification above, these seem like "baseline" tests. I have increased the precision to 5 decimal places.

as for test migration from internal repo, it seems we ported over the PcaAnomalyTest already. that should suffice for correctness.

In reply to: 255796608 [](ancestors = 255796608,255680814,255589211)
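The difference between the functional and baseline checks discussed above can be illustrated with a small self-contained sketch. It is not the test in this PR: `AucSketch`, `FunctionalCheck`, and `BaselineCheck` are hypothetical helpers, the AUC is computed by a simple pairwise count rather than by the ML.NET evaluator, and the baseline comparison assumes "equal to N decimals" means that both values round to the same number.

```csharp
using System;

public static class AucSketch
{
    // Pairwise AUC: the fraction of (positive, negative) pairs in which the
    // positive example gets the higher score; ties count as half.
    public static double Auc(double[] scores, bool[] labels)
    {
        if (scores.Length != labels.Length)
            throw new ArgumentException("scores and labels must have the same length");

        double correct = 0;
        long pairs = 0;
        for (int i = 0; i < scores.Length; i++)
        {
            if (!labels[i]) continue;            // i ranges over positives
            for (int j = 0; j < scores.Length; j++)
            {
                if (labels[j]) continue;         // j ranges over negatives
                pairs++;
                if (scores[i] > scores[j]) correct += 1.0;
                else if (scores[i] == scores[j]) correct += 0.5;
            }
        }
        if (pairs == 0)
            throw new InvalidOperationException("need at least one positive and one negative");
        return correct / pairs;
    }

    // Functional check: only asserts that the metric is a valid AUC.
    public static bool FunctionalCheck(double auc) => auc >= 0.0 && auc <= 1.0;

    // Baseline check: pins the value to a recorded number at a given precision,
    // so any change in the numerics shows up as a failure.
    public static bool BaselineCheck(double auc, double baseline, int decimals)
        => Math.Round(auc, decimals) == Math.Round(baseline, decimals);
}
```

A functional check passes for any sensible model, while a baseline check at 5 decimals fails as soon as the numerics drift, which is why naming the test as a baseline test makes its intent clearer.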

artidoro · 2019-02-11T22:19:44Z

src/Microsoft.ML.Data/TrainCatalog.cs

+        /// <param name="label">The name of the label column in <paramref name="data"/>.</param>
+        /// <param name="score">The name of the score column in <paramref name="data"/>.</param>
+        /// <param name="predictedLabel">The name of the predicted label column in <paramref name="data"/>.</param>
+        /// <returns>The evaluation results for these calibrated outputs.</returns>


calibrated [](start = 54, length = 10)

What is calibrated here? Could you explain in a few more words? #Resolved

it was a copy-paste typo. fixed...

In reply to: 255717912 [](ancestors = 255717912)

artidoro · 2019-02-11T22:23:09Z

src/Microsoft.ML.Data/Evaluators/Metrics/AnomalyDetectionMetrics.cs

+        /// </summary>
+        public double DrAtK { get; }
+        /// <summary>
+        /// Detection rate at fraction p false positives.


p [](start = 39, length = 1)

What is p? #Resolved

to me the Detection rate metrics look suspicious. perhaps we should not expose these for V1.0, and only expose AUC.

In reply to: 255719011 [](ancestors = 255719011)

removed that detectionrate@p metric.

In reply to: 255725823 [](ancestors = 255725823,255719011)

rogancarr · 2019-02-12T04:20:57Z

Just one last question about tests!

rogancarr · 2019-02-13T00:49:53Z

test/Microsoft.ML.Tests/AnomalyDetectionTests.cs

+            var metrics = ML.AnomalyDetection.Evaluate(transformedData, k: 10);
+
+            Assert.Equal(0.98558, metrics.Auc, 5);
+            Assert.Equal(0.90, metrics.DrAtK, 2);


Assert.Equal(0.90, metrics.DrAtK, 2); [](start = 11, length = 38)

This one @ 5 places too, please :) #Resolved

rogancarr · 2019-02-13T00:50:45Z

test/Microsoft.ML.Tests/AnomalyDetectionTests.cs

+        /// RandomizedPcaTrainer test 
+        /// </summary>
+        [Fact]
+        public void RandomizedPcaTrainer()


RandomizedPcaTrainer [](start = 20, length = 20)

RandomizedPcaTrainerBaselineTest #Resolved

rogancarr

Two small comments.

abgoswam · 2019-02-13T23:22:58Z

Thanks for the reviews!

RandomizedPcaTrainer constructor made internal

77746d6

abgoswam requested review from sfilipi and artidoro February 3, 2019 17:07

artidoro reviewed Feb 3, 2019

abgoswam requested a review from rogancarr February 4, 2019 18:21

sfilipi reviewed Feb 6, 2019

sfilipi approved these changes Feb 6, 2019

rogancarr reviewed Feb 6, 2019

abgoswam added 3 commits February 7, 2019 17:22

Merge branch 'master' into abgoswam/action_arguments_pca

03e98b8

MLCOntext for PCA

e5b1a74

update test example

8c47da1

abgoswam commented Feb 7, 2019

abgoswam added 3 commits February 8, 2019 00:05

Merge branch 'master' into abgoswam/action_arguments_pca

398475a

added evaluation metrics for anomaly detection

8adb8b1

make tests work. it seems adding a catalog to MLContext changes some …

62f5db8

…seeds?

rogancarr reviewed Feb 11, 2019

abgoswam added 2 commits February 11, 2019 18:44

also updating baseline file for Release builds

02b8651

review comments

b82c326

artidoro reviewed Feb 11, 2019

taking care of review comments

b1526d1

rogancarr reviewed Feb 13, 2019

rogancarr approved these changes Feb 13, 2019

abgoswam mentioned this pull request Feb 13, 2019

Registering subhosts when creating new catalog entries advances the pseudo random number generator #2523

Closed

update to latest master

d90ded5

abgoswam merged commit fd30559 into dotnet:master Feb 13, 2019

abgoswam deleted the abgoswam/action_arguments_pca branch February 20, 2019 16:58

rogancarr mentioned this pull request Feb 21, 2019

Add AnomalyDetectionContext to the TrainContext #1369

Closed

ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
