Proposal: interface for JuliaStats packages (Statistics and Machine Learning)
`StatisticalModel` is at the top of the type hierarchy.
There is one universal method:
- Every statistical model can be fit to data. This takes data as input: `fit(obj::StatisticalModel, data...)`. `fit` should be able to handle data in the form of a `DataFrame` or a matrix.
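To make the shape of `fit` concrete, here is a minimal sketch in current Julia syntax. Only `StatisticalModel` and `fit` come from the proposal; `MeanModel` and `FittedMeanModel` are hypothetical names invented for illustration, and the matrix layout (rows are observations) is an assumption. A `DataFrame` method could presumably convert to a matrix and call this one, but it is left out to keep the sketch free of dependencies.

```julia
# Hypothetical sketch: only `StatisticalModel` and `fit` are named in the proposal.
abstract type StatisticalModel end

# `MeanModel` is an illustrative model whose only parameters are the column means.
struct MeanModel <: StatisticalModel end

# Illustrative container for the result of fitting a `MeanModel`.
struct FittedMeanModel
    means::Vector{Float64}
end

# Matrix method: rows are assumed to be observations, columns variables.
function fit(::MeanModel, X::AbstractMatrix{<:Real})
    n = size(X, 1)
    return FittedMeanModel(vec(sum(X, dims=1)) ./ n)
end

fitted = fit(MeanModel(), [1.0 2.0; 3.0 4.0])
```

Returning a separate fitted object is one possible design; the proposal's later use of `FittedModel` suggests something along these lines, but does not pin it down.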
Now, we should make a distinction between supervised and unsupervised learning.
- Every supervised model can make "predictions" using `predict(FittedModel, newX::SamplePoint)`. But we should be careful here, since predictions may not live in the sample space: in the case of logistic regression, for example, predictions live on the interval [0,1], but the sample space is the binary set {0,1}.
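The logistic regression caveat can be sketched as follows. `FittedLogistic`, `logistic`, and `classify` are hypothetical names, and the coefficients are made up; only `predict` comes from the proposal. The point is that `predict` returns a probability, and getting back into the sample space {0,1} takes a separate step.

```julia
# Hypothetical fitted logistic model; the coefficients below are made up.
struct FittedLogistic
    intercept::Float64
    coefs::Vector{Float64}
end

logistic(z) = 1 / (1 + exp(-z))

# Returns a probability in [0, 1], not a label in {0, 1}.
predict(m::FittedLogistic, newX::AbstractVector{<:Real}) =
    logistic(m.intercept + sum(m.coefs .* newX))

# Mapping the prediction back into the sample space needs an extra step
# (here a threshold, which is an assumption and not part of the proposal).
classify(m::FittedLogistic, newX; threshold = 0.5) =
    predict(m, newX) >= threshold ? 1 : 0

m = FittedLogistic(0.0, [1.0, -1.0])
p = predict(m, [2.0, 2.0])       # linear predictor is 0, so p is 0.5
label = classify(m, [2.0, 2.0])  # an element of the sample space {0, 1}
```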
Unsupervised learning comprises methods for clustering and transformation of the data, often with dimension reduction.
- `compress(FittedModel, X::SamplePoint)` for unsupervised models is analogous to `predict` for supervised models. It returns the "compressed" value of the observation: in the case of PCA with a given k, it projects to the subspace spanned by the top k eigenvectors; in the case of clustering, it projects each observation to its cluster center. Both the input and the output live in the sample space.
- `transform` (can we find a less general name?) returns the loadings. The input lives in the sample space, but the output lives in the transformed (typically lower-dimensional) space.
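The difference between the two is easiest to see for PCA. The sketch below assumes a hypothetical `FittedPCA` type that stores the data center and an orthonormal basis of the top k eigenvectors; the field names and the hand-picked basis are illustrative only, and no actual eigendecomposition is performed.

```julia
using LinearAlgebra  # standard library; provides matrix products and adjoints

# Hypothetical fitted PCA: `center` is the data mean, and the columns of
# `components` are the top-k eigenvectors (assumed orthonormal).
struct FittedPCA
    center::Vector{Float64}
    components::Matrix{Float64}   # d rows, k columns
end

# Sample space (length d) -> transformed space (length k).
transform(m::FittedPCA, x::AbstractVector{<:Real}) =
    m.components' * (x .- m.center)

# Sample space -> sample space: project onto the subspace spanned by the components.
compress(m::FittedPCA, x::AbstractVector{<:Real}) =
    m.center .+ m.components * transform(m, x)

# Hand-picked basis for illustration: keep the first two coordinates of R^3.
pca = FittedPCA([0.0, 0.0, 0.0], [1.0 0.0; 0.0 1.0; 0.0 0.0])
x = [3.0, 4.0, 5.0]
z = transform(pca, x)   # lives in the 2-dimensional transformed space
y = compress(pca, x)    # lives in the original 3-dimensional sample space
```

Here `compress` is just `transform` followed by a map back into the sample space, which may be a useful way to think about how the two methods relate.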