Add examples for clustering #222

Ivanidzo4ka · 2018-05-23T23:30:05Z

Address #205

asthana86 · 2018-05-24T13:57:57Z

This is great! Can we also add this as an E2E sample in dotnet/machinelearning/samples, with a readme.md similar to the ones we are adding for regression, binary, and multi-class classification?

Ivanidzo4ka · 2018-05-24T17:36:28Z

I don't see a "dotnet/machinelearning/samples" repo. Can you provide a link to it?

In reply to: 391724603 [](ancestors = 391724603)

zeahmed · 2018-05-24T17:50:40Z

test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs

+
+            var pipeline = new LearningPipeline();
+            pipeline.Add(new TextLoader(dataPath).CreateFrom<NewsData>(useHeader: false));
+            pipeline.Add(new CategoricalOneHotVectorizer("Label"));


I think "Label" is not used in clustering unless the model is being evaluated against true labels. Why is CategoricalOneHotVectorizer being applied to "Label"? #Resolved

zeahmed · 2018-05-24T17:58:43Z

test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs

+            string dataPath = GetDataPath(@"external/20newsgroups.txt");
+
+            var pipeline = new LearningPipeline();
+            pipeline.Add(new TextLoader(dataPath).CreateFrom<NewsData>(useHeader: false));


Make sure to set allowQuotedStrings and supportSparse properly. The dataset that I have is NOT quoted and is not in sparse format. By default, these two are turned on in TextLoader. #Resolved

The data set I have actually does have quotes inside the mail content, but it's definitely not sparse.

In reply to: 190677490 [](ancestors = 190677490)
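As a hedged sketch of what the exchange above suggests, the loader call could state both options explicitly instead of relying on the defaults. This assumes that `allowQuotedStrings` and `supportSparse` are named parameters of `CreateFrom`, that the namespaces below match the ML.NET version in this PR, and that `NewsData` has the layout shown; the `Label` field name, the ordinals for `Subject` and `Content`, and the `LoaderSketch` class are illustrative assumptions, not taken from the diff.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Runtime.Api;

public class NewsData
{
    [Column(ordinal: "0")]
    public string Id;

    [Column(ordinal: "1", name: "Label")]
    public string Label;

    // Ordinals "2" and "3" are assumed for illustration only.
    [Column(ordinal: "2")]
    public string Subject;

    [Column(ordinal: "3")]
    public string Content;
}

// Hypothetical helper class, not part of the PR.
public static class LoaderSketch
{
    public static LearningPipeline Build(string dataPath)
    {
        var pipeline = new LearningPipeline();
        // The mail content contains quotes, so quoted strings stay enabled;
        // the file is dense, so sparse parsing is switched off.
        pipeline.Add(new TextLoader(dataPath).CreateFrom<NewsData>(
            useHeader: false,
            allowQuotedStrings: true,
            supportSparse: false));
        return pipeline;
    }
}
```

Spelling the options out keeps the test independent of the loader's defaults, which the review notes are both turned on.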

zeahmed · 2018-05-24T19:02:40Z

test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs

+            pipeline.Add(CollectionDataSource.Create(data));
+            pipeline.Add(new KMeansPlusPlusClusterer() { K = k });
+            var model = pipeline.Train<ClusteringData, ClusteringPrediction>();
+            //validate no initial centers of clusters belong to same cluster.


Just a minor comment.
//validate no initial centers of clusters belong to same cluster.
These don't seem to be initial centers, as they are not passed to the KMeans trainer as initial cluster centers, which is what "initial center" means in KMeans and other clustering algorithms.

Rather, these are just data points curated so that they appear to be cluster centers initially. #Pending

Is the current phrasing better?

In reply to: 190697440 [](ancestors = 190697440)
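To make the distinction in this thread concrete, here is a minimal, library-free sketch of Lloyd's k-means iterations in which the initial centers are an explicit input to the trainer. Everything in it (`KMeansSketch`, `Fit`, `Nearest`) is hypothetical and is not ML.NET's `KMeansPlusPlusClusterer`; it only illustrates what "initial centers" usually means, as opposed to training points that merely sit where one expects the clusters to be.

```csharp
using System;
using System.Linq;

// Hypothetical illustration; not the ML.NET trainer used in the test.
public static class KMeansSketch
{
    // Runs Lloyd's iterations starting from caller-supplied initial centers.
    public static double[][] Fit(double[][] points, double[][] initialCenters, int iterations = 10)
    {
        // Copy the centers so the caller's arrays are not mutated.
        var centers = initialCenters.Select(c => (double[])c.Clone()).ToArray();
        for (int it = 0; it < iterations; it++)
        {
            var sums = centers.Select(c => new double[c.Length]).ToArray();
            var counts = new int[centers.Length];

            // Assignment step: attach each point to its nearest center.
            foreach (var p in points)
            {
                int nearest = Nearest(centers, p);
                counts[nearest]++;
                for (int d = 0; d < p.Length; d++) sums[nearest][d] += p[d];
            }

            // Update step: move each center to the mean of its points.
            for (int c = 0; c < centers.Length; c++)
            {
                if (counts[c] == 0) continue; // leave an empty cluster's center unchanged
                for (int d = 0; d < centers[c].Length; d++)
                    centers[c][d] = sums[c][d] / counts[c];
            }
        }
        return centers;
    }

    // Index of the center closest to p by squared Euclidean distance.
    public static int Nearest(double[][] centers, double[] p)
    {
        int best = 0;
        double bestDist = double.MaxValue;
        for (int c = 0; c < centers.Length; c++)
        {
            double dist = 0;
            for (int d = 0; d < p.Length; d++)
            {
                double diff = centers[c][d] - p[d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }
}
```

In the test under review, by contrast, the curated points are passed only as training data through `CollectionDataSource.Create(data)`, and no centers are handed to the trainer, which is why the reviewer objects to calling them initial centers.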

TomFinley

Thanks @Ivanidzo4ka -- you might want to change the title of #205, since I think it is wrong (that is, the API was there, it just wasn't clear how to use it, which I hope you have now addressed).

TomFinley · 2018-05-24T17:01:35Z

test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs

+            [Column(ordinal: "0")]
+            public string Id;
+
+            [Column(ordinal: "1", name: "Label")]


Should these be using DefaultColumnNames?

TomFinley · 2018-05-24T20:58:18Z

test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs

+
+            pipeline.Add(new KMeansPlusPlusClusterer() { K = 20 });
+            var model = pipeline.Train<NewsData, ClusteringPrediction>();
+            var gunResult = model.Predict(new NewsData() { Subject = "Let's disscuss gun control", Content = @"The United States has 88.8 guns per 100 people, or about 270,000,000 guns, which is the highest total and per capita number in the world. 22% of Americans own one or more guns (35% of men and 12% of women). America's pervasive gun culture stems in part from its colonial history, revolutionary roots, frontier expansion, and the Second Amendment, which states: ""A well regulated militia,


Oh good I'm glad we didn't decide to write anything controversial here. 😄

justinormont · 2018-05-24T22:53:22Z

@asthana86

This is great! Can we also add this as an E2E sample in dotnet/machinelearning/samples, with a readme.md similar to the ones we are adding for regression, binary, and multi-class classification?

@zyw400 may have some samples for clustering which we can move to the repo.

* example
* add Clusters tests
* cleanup
* address comments
* bring clustering reference back
* rephrasing

Ivan Matantsev added 3 commits May 22, 2018 15:37

example

209f2dd

add Clusters tests

62e623e

cleanup

eab3481

Ivanidzo4ka requested review from GalOshri and codemzs May 24, 2018 00:36

zeahmed reviewed May 24, 2018

Ivan Matantsev added 3 commits May 24, 2018 11:07

merge master

c7b5597

address comments

a513b45

bring clustering reference back

330a584

zeahmed reviewed May 24, 2018

zeahmed approved these changes May 24, 2018

rephrasing

e013630

TomFinley approved these changes May 24, 2018

justinormont merged commit b1bbceb into dotnet:master May 24, 2018

TomFinley mentioned this pull request Jun 5, 2018

Add release notes for ML.NET 0.2 #301

Merged

eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018

Add examples for clustering (dotnet#222)

eabb6b0

* example
* add Clusters tests
* cleanup
* address comments
* bring clustering reference back
* rephrasing

ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
