This repository has been archived by the owner on Apr 19, 2019. It is now read-only.

Proposal: interface for JuliaStats packages (Statistics and Machine Learning)

Gustavo Lacerda edited this page Jul 16, 2014 · 14 revisions

StatisticalModel is at the top of the type hierarchy.

There is one universal method:

  • Every statistical model can be fit to data: fit(obj::StatisticalModel, data...). fit should accept data as either a DataFrame or a matrix.
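As a minimal sketch of what this could look like, the following defines the abstract type and a fit method for matrix input. Only StatisticalModel and fit come from the proposal; MeanModel, FittedMeanModel, and the rows-as-observations layout are assumptions made for illustration. A DataFrame method is omitted to keep the example free of external packages.

```julia
using Statistics  # standard library; provides mean

# Top of the hierarchy, as named in the proposal.
abstract type StatisticalModel end

# Hypothetical toy model: estimates the mean of each column.
struct MeanModel <: StatisticalModel end

# Hypothetical container for the result of fitting.
struct FittedMeanModel
    mu::Vector{Float64}
end

# fit for matrix input. Assumption: rows are observations, columns are variables.
function fit(::MeanModel, X::AbstractMatrix{<:Real})
    return FittedMeanModel(vec(mean(X, dims=1)))
end

fitted = fit(MeanModel(), [1.0 2.0; 3.0 4.0])
println(fitted.mu)
```

Returning a separate fitted object, rather than mutating the model, is one possible design; the proposal does not settle this.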

Now, we should make a distinction between supervised and unsupervised learning.

Supervised learning

  • Every supervised model can make "predictions" using predict(FittedModel, newX::SamplePoint). Care is needed here, because predictions need not live in the sample space: in logistic regression, for example, predictions lie in the interval [0,1], whereas the sample space is the binary set {0,1}.
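The distinction between the prediction space and the sample space can be sketched for logistic regression as follows. FittedLogistic, its fields, and the classify helper are hypothetical names, not part of the proposal, and the coefficients are assumed to have been estimated already.

```julia
# Hypothetical fitted logistic regression holding already-estimated coefficients.
struct FittedLogistic
    intercept::Float64
    coef::Vector{Float64}
end

logistic(z) = 1 / (1 + exp(-z))

# predict returns a probability in [0, 1], not a label in {0, 1}.
predict(m::FittedLogistic, x::AbstractVector{<:Real}) =
    logistic(m.intercept + sum(m.coef .* x))

# Hypothetical helper that maps a prediction back into the sample space {0, 1}.
classify(m::FittedLogistic, x::AbstractVector{<:Real}; threshold=0.5) =
    predict(m, x) >= threshold ? 1 : 0

m = FittedLogistic(0.0, [1.0, -1.0])
p = predict(m, [2.0, 2.0])       # linear predictor is 0 here
label = classify(m, [2.0, 2.0])
println((p, label))
```

Whether the mapping back to the sample space belongs in the interface at all, or is left to the caller, is an open design question.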

Unsupervised learning

Unsupervised learning comprises methods for clustering and transformation of the data, often with dimension reduction.

  • compress(FittedModel, X::SamplePoint) for unsupervised models is analogous to predict for supervised models. It returns the "compressed" value of the observation: in the case of PCA with a given k, it projects to the subspace spanned by the top k eigenvectors; in the case of clustering, it projects each observation to its cluster center. Both the input and the output live in the sample space.

  • transform (can we find a less general name?) returns the loadings of an observation. The input lives in the sample space, but the output lives in the transformed (typically lower-dimensional) space.
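To illustrate how compress and transform differ, here is a rough sketch for PCA. FittedPCA and its fields are hypothetical, and the basis is supplied by hand rather than computed from data, so this shows only the shape of the two methods, not a PCA implementation.

```julia
using LinearAlgebra  # standard library; matrix products and adjoints

# Hypothetical fitted PCA: a center and an orthonormal basis of the top-k subspace.
struct FittedPCA
    center::Vector{Float64}
    basis::Matrix{Float64}   # d × k; columns are assumed to be the top-k eigenvectors
end

# transform: sample space (dimension d) -> transformed space (dimension k).
transform(m::FittedPCA, x::AbstractVector{<:Real}) = m.basis' * (x .- m.center)

# compress: sample space -> sample space, by projecting onto the subspace.
compress(m::FittedPCA, x::AbstractVector{<:Real}) =
    m.center .+ m.basis * transform(m, x)

# Two-dimensional data, keeping only the first coordinate axis (k = 1).
m = FittedPCA([1.0, 1.0], reshape([1.0, 0.0], 2, 1))
z = transform(m, [3.0, 5.0])     # lives in the 1-dimensional transformed space
xhat = compress(m, [3.0, 5.0])   # lives in the original 2-dimensional sample space
println((z, xhat))
```

In this sketch compress is just transform followed by a map back into the sample space, which suggests the two could share most of their implementation.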

Discussions

https://github.com/JuliaStats/Roadmap.jl/issues/4
