Add examples for clustering #222
This is great! Can we also add this as an E2E sample in dotnet/machinelearning/samples, with a readme.md similar to the ones we are adding for regression, binary, and multi-class classification?
I don't see a "dotnet/machinelearning/samples" repo. Can you provide a link to it? In reply to: 391724603
var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader(dataPath).CreateFrom<NewsData>(useHeader: false));
pipeline.Add(new CategoricalOneHotVectorizer("Label"));
I think "Label" is not used in clustering unless the model is being evaluated against true labels. Why is CategoricalOneHotVectorizer being applied to "Label"? #Resolved
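To illustrate where true labels would come in, here is a minimal, self-contained sketch of scoring cluster assignments against known labels. It uses cluster purity, which is a metric chosen for illustration and not necessarily what ML.NET's own evaluator computes; `ClusterPurity` and `Compute` are hypothetical names that do not appear in this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET: scores a clustering against true labels.
public static class ClusterPurity
{
    // Returns the fraction of points whose true label matches
    // the majority label of the cluster they were assigned to.
    public static double Compute(IReadOnlyList<int> clusterIds, IReadOnlyList<string> trueLabels)
    {
        if (clusterIds.Count != trueLabels.Count)
            throw new ArgumentException("Inputs must have the same length.");
        if (clusterIds.Count == 0)
            return 0.0;

        // For each cluster, count how many of its points carry the cluster's most common label.
        int majorityTotal = Enumerable.Range(0, clusterIds.Count)
            .GroupBy(i => clusterIds[i])
            .Sum(cluster => cluster.GroupBy(i => trueLabels[i]).Max(g => g.Count()));

        return (double)majorityTotal / clusterIds.Count;
    }
}
```

The labels are only read after prediction, so the training pipeline itself would not need a "Label" column for this kind of check.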
string dataPath = GetDataPath(@"external/20newsgroups.txt");

var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader(dataPath).CreateFrom<NewsData>(useHeader: false));
Make sure to set allowQuotedStrings and supportSparse properly. The dataset that I have is NOT quoted and is not in sparse format. By default, these two are turned on in TextLoader. #Resolved
The data set I have actually does have quotes inside the mail content, but it's definitely not sparse.
In reply to: 190677490
pipeline.Add(CollectionDataSource.Create(data));
pipeline.Add(new KMeansPlusPlusClusterer() { K = k });
var model = pipeline.Train<ClusteringData, ClusteringPrediction>();
//validate no initial centers of clusters belong to same cluster.
Just a minor comment on:
//validate no initial centers of clusters belong to same cluster.
These don't seem to be initial centers, since they are not passed to the KMeans trainer as initial cluster centers, which is what "initial center" means in KMeans and other clustering algorithms. Rather, they are just data points curated so that they appear to be cluster centers initially. #Pending
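For readers unfamiliar with the distinction, the sketch below shows what explicitly supplied initial centers look like in a plain k-means (Lloyd's algorithm) loop. It is a self-contained illustration under stated assumptions, not the trainer used in this PR: `KMeansSketch`, `Fit`, and `Nearest` are hypothetical names, and the PR's pipeline passes no such centers to `KMeansPlusPlusClusterer`.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ML.NET: k-means started from caller-supplied centers.
public static class KMeansSketch
{
    // Runs a fixed number of Lloyd iterations, starting from initialCenters.
    public static double[][] Fit(double[][] points, double[][] initialCenters, int iterations = 10)
    {
        // Copy the centers so the caller's arrays are not modified.
        var centers = initialCenters.Select(c => (double[])c.Clone()).ToArray();
        for (int it = 0; it < iterations; it++)
        {
            var sums = centers.Select(c => new double[c.Length]).ToArray();
            var counts = new int[centers.Length];

            // Assignment step: attach each point to its nearest center.
            foreach (var p in points)
            {
                int nearest = Nearest(centers, p);
                counts[nearest]++;
                for (int d = 0; d < p.Length; d++)
                    sums[nearest][d] += p[d];
            }

            // Update step: move each center to the mean of its assigned points.
            for (int k = 0; k < centers.Length; k++)
            {
                if (counts[k] == 0) continue; // an empty cluster keeps its previous center
                for (int d = 0; d < centers[k].Length; d++)
                    centers[k][d] = sums[k][d] / counts[k];
            }
        }
        return centers;
    }

    // Index of the center with the smallest squared Euclidean distance to p.
    public static int Nearest(double[][] centers, double[] p)
    {
        int best = 0;
        double bestDist = double.MaxValue;
        for (int k = 0; k < centers.Length; k++)
        {
            double dist = 0;
            for (int d = 0; d < p.Length; d++)
            {
                double diff = p[d] - centers[k][d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = k; }
        }
        return best;
    }
}
```

In this sketch the second argument to `Fit` is the initial centers in the algorithmic sense. Data points that are merely spaced far apart in the training set, as in the test under review, are ordinary inputs in the first argument.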
Thanks @Ivanidzo4ka. You might want to change the title of #205, since I think it is wrong (that is, the API was there; it just wasn't clear how to use it, which I hope you have now addressed).
[Column(ordinal: "0")]
public string Id;

[Column(ordinal: "1", name: "Label")]
Should these be using DefaultColumnNames?
pipeline.Add(new KMeansPlusPlusClusterer() { K = 20 });
var model = pipeline.Train<NewsData, ClusteringPrediction>();
var gunResult = model.Predict(new NewsData() { Subject = "Let's discuss gun control", Content = @"The United States has 88.8 guns per 100 people, or about 270,000,000 guns, which is the highest total and per capita number in the world. 22% of Americans own one or more guns (35% of men and 12% of women). America's pervasive gun culture stems in part from its colonial history, revolutionary roots, frontier expansion, and the Second Amendment, which states: ""A well regulated militia,
Oh good I'm glad we didn't decide to write anything controversial here. 😄
@zyw400 may have some samples for clustering which we can move to the repo.
* example
* add Clusters tests
* cleanup
* address comments
* bring clustering reference back
* rephrasing
Address #205