This repository has been archived by the owner on Apr 19, 2019. It is now read-only.

Proposal: interface for JuliaStats packages (Statistics and Machine Learning)

Gustavo Lacerda edited this page Jul 16, 2014 · 14 revisions

StatisticalModel is at the top of the type hierarchy.

There is one universal method:

  • Every statistical model can be fit to data: fit(obj::StatisticalModel, data...). fit should accept data as either a DataFrame or a matrix, and it should return a new instance of StatisticalModel (or perhaps FittedModel). Alternatively, one can call fit!, which updates the statistical model in place.
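A minimal sketch of the fit / fit! distinction, assuming Julia 1.x syntax. MeanModel and its mu field are hypothetical names chosen for illustration, and only matrix input is handled here; DataFrame support is left out to keep the example dependency-free.

```julia
using Statistics  # standard library, provides mean

# Top of the proposed type hierarchy.
abstract type StatisticalModel end

# Hypothetical toy model: stores one estimated mean per column of the data.
mutable struct MeanModel <: StatisticalModel
    mu::Vector{Float64}
end

# An unfitted model; an empty vector marks "no estimate yet".
MeanModel() = MeanModel(Float64[])

# Non-mutating fit: returns a new model and leaves `obj` untouched.
fit(obj::MeanModel, X::AbstractMatrix{<:Real}) = MeanModel(vec(mean(X, dims=1)))

# Mutating fit!: updates `obj` in place and returns it.
function fit!(obj::MeanModel, X::AbstractMatrix{<:Real})
    obj.mu = vec(mean(X, dims=1))
    return obj
end

X = [1.0 2.0; 3.0 4.0]   # two observations (rows), two variables (columns)
m0 = MeanModel()
m1 = fit(m0, X)          # m0 is still unfitted; m1 holds the column means
```

Whether fit returns the same type or a separate FittedModel type is left open in the proposal; this sketch returns the same type only for brevity.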

Now, we make a distinction between supervised and unsupervised learning.

Supervised learning

  • Every supervised model can make "predictions" using predict(FittedModel, newX::SamplePoint). But we should be careful here, since predictions need not live in the sample space: in logistic regression, for example, predictions live in the interval [0,1], whereas the sample space is the binary set {0,1}.
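The logistic regression caveat can be sketched as follows, assuming Julia 1.x syntax. FittedLogistic, its fields, and predict_label are hypothetical names, a sample point is represented as a plain vector rather than a SamplePoint type, and FittedModel is assumed here to be a subtype of StatisticalModel. The coefficients are given directly rather than estimated.

```julia
abstract type StatisticalModel end
abstract type FittedModel <: StatisticalModel end   # assumption: subtype of StatisticalModel

# Hypothetical fitted logistic regression with known coefficients.
struct FittedLogistic <: FittedModel
    intercept::Float64
    coefs::Vector{Float64}
end

logistic(z) = 1 / (1 + exp(-z))

# predict returns a probability in [0,1], not an element of the sample space {0,1}.
predict(m::FittedLogistic, x::AbstractVector{<:Real}) =
    logistic(m.intercept + sum(m.coefs .* x))

# A separate (hypothetical) function maps the probability back into {0,1}.
predict_label(m::FittedLogistic, x::AbstractVector{<:Real}; threshold=0.5) =
    predict(m, x) >= threshold ? 1 : 0

m = FittedLogistic(0.0, [1.0, -1.0])
p = predict(m, [2.0, 2.0])         # linear predictor is 0, so p is 0.5
y = predict_label(m, [3.0, 0.0])   # linear predictor is 3, so the label is 1
```

Keeping the thresholding step in its own function is one way to make explicit which space a return value lives in; the proposal itself does not settle this.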

Unsupervised learning

Unsupervised learning comprises methods for clustering and transformation of the data, often with dimension reduction.

  • compress(FittedModel, X::SamplePoint) for unsupervised models is analogous to predict for supervised models. It returns the "compressed" value of the observation: for PCA with a given k, it projects the observation onto the subspace spanned by the top k eigenvectors; for clustering, it maps each observation to its cluster center. Both the input and the output live in the sample space.

  • transform (can we find a less general name?) returns the loadings. The input lives in the sample space, but the output lives in the transformed (typically lower-dimensional) space.
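The difference between compress and transform can be sketched as below, assuming Julia 1.x syntax (1.1 or later for eachcol). FittedPCA, FittedClusters, and their fields are hypothetical names, sample points are plain vectors, and the fitted quantities (center, directions, cluster centers) are supplied directly rather than estimated.

```julia
abstract type StatisticalModel end
abstract type FittedModel <: StatisticalModel end   # assumption: subtype of StatisticalModel

# Hypothetical fitted PCA: a center plus k orthonormal directions (columns of W).
struct FittedPCA <: FittedModel
    center::Vector{Float64}
    W::Matrix{Float64}        # p × k, columns assumed orthonormal
end

# transform: sample space (length p) -> transformed space (length k).
transform(m::FittedPCA, x::AbstractVector{<:Real}) = m.W' * (x .- m.center)

# compress: sample space -> sample space, projecting onto the k-dimensional subspace.
compress(m::FittedPCA, x::AbstractVector{<:Real}) = m.center .+ m.W * transform(m, x)

# Hypothetical fitted clustering: one cluster center per column.
struct FittedClusters <: FittedModel
    centers::Matrix{Float64}  # p × k
end

# compress for clustering: return the nearest center (squared Euclidean distance).
function compress(m::FittedClusters, x::AbstractVector{<:Real})
    d = [sum(abs2, x .- c) for c in eachcol(m.centers)]
    return m.centers[:, argmin(d)]
end

pca = FittedPCA([0.0, 0.0], reshape([1.0, 0.0], 2, 1))   # keep only the first axis
z = transform(pca, [3.0, 4.0])     # length 1: lives in the transformed space
xhat = compress(pca, [3.0, 4.0])   # length 2: lives in the sample space

clusters = FittedClusters([0.0 10.0; 0.0 10.0])
c = compress(clusters, [9.0, 8.0]) # nearest center, a point in the sample space
```

In this sketch compress for PCA is simply transform followed by mapping back into the sample space, which is one way to read the relationship between the two methods.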

Discussions

https://github.com/JuliaStats/Roadmap.jl/issues/4
