Consider making bhmm core compatible with sklearn #42

Open
franknoe opened this issue May 24, 2015 · 9 comments

Comments

@franknoe
Contributor

Following #40:

"could optimally be made compatible with scikit-learn, which seems to be the modern way new machine learning codes are written."

  • It seems all we need to do to make the HMM estimators compatible with sklearn is to implement the BaseEstimator interface:
    http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator
    Perhaps we could also implement the ClassifierMixin behavior (score) in order to do cross-validation.
  • Also I guess it would make sense to have the fit(X,..) function, but I'm not sure where in sklearn that function is actually defined, i.e. is there any base class defining it, or is it just a duck-typing convention?

Both of these would be easy to do, but I don't see why we would need a dependency on sklearn for that. I would like to avoid dependencies on heavy packages unless we actually make substantial use of their functionality (currently we do use sklearn's Gaussian Mixture Model, but I guess this is a temporary solution).

@kylebeauchamp, is a dependency on sklearn necessary for an effective implementation of the BaseEstimator interface (i.e. does sklearn explicitly check issubclass anywhere in its algorithms), or is duck typing sufficient?
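For concreteness, a minimal sketch of what the duck-typed convention amounts to (class name, parameter names, and method bodies are invented for illustration and are not the actual bhmm API):

class SketchHMMEstimator(object):
    """Illustrative only -- sketches the sklearn estimator conventions, not real bhmm code."""

    def __init__(self, nstates=2):
        # sklearn convention: __init__ only stores parameters, no estimation happens here
        self.nstates = nstates

    def get_params(self, deep=True):
        # BaseEstimator would generate this automatically from the __init__ signature
        return {'nstates': self.nstates}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # estimate the model from the data X and return self (sklearn convention);
        # fitted quantities conventionally get a trailing underscore
        self.fitted_ = True
        return self

    def score(self, X, y=None):
        # e.g. the log-likelihood of X under the fitted model, which is what
        # cross-validation would compare across folds
        return 0.0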

@franknoe
Contributor Author

Small addition: Even if explicit subclassing is necessary in order to use our estimators with sklearn algorithms, it would be simple to do that in another package that uses sklearn. Something like (just a sketch, I haven't tried that code):

from bhmm import MLHMM
from sklearn.base import BaseEstimator

class myMLHMM(MLHMM, BaseEstimator):
    pass

hmm = myMLHMM()
# do some fancy sklearn stuff with hmm object here

That way we could avoid an explicit dependency on sklearn in bhmm and still provide all of this functionality.
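For example, the "fancy sklearn stuff" could then include cloning and parameter introspection, assuming MLHMM's __init__ follows the sklearn convention of only storing its keyword arguments (again just a sketch, not tested):

from sklearn.base import clone

hmm = myMLHMM()
hmm_copy = clone(hmm)       # clone() relies on the get_params() that BaseEstimator provides
print(hmm.get_params())     # the same introspection GridSearchCV and friends use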

@marscher
Member

marscher commented Jul 6, 2015

The current release (0.3.0) declares scikit-learn as a dependency (both in setup.py and the conda recipe); however, it is never used in the code. Is this intended?

@franknoe
Contributor Author

franknoe commented Jul 6, 2015

I think we use sklearn's Gaussian Mixture Model estimator for the initialization of Gaussian HMMs. This was meant to be a temporary solution. I think we can in principle make bhmm completely sklearn-compatible without any dependency on sklearn, as I have started to do for pyEMMA.


@franknoe
Contributor Author

franknoe commented Jul 6, 2015

Addition: sklearn is used in l. 37-39 of this file:
https://github.com/bhmm/bhmm/blob/master/bhmm/init/gaussian.py
I realize that we would inherit the sklearn dependency in pyemma if pyemma depends on bhmm. Perhaps we could just make a copy of the sklearn GMM estimation code for now, since it is easy to isolate? sklearn is BSD-licensed, so there should be no problem with that.
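For reference, the pattern in question is roughly the following (a sketch only: the file above used sklearn's older mixture.GMM class, the current GaussianMixture API is shown here, the function name is invented, and 1-D observation trajectories are assumed):

import numpy as np
from sklearn.mixture import GaussianMixture

def init_gaussian_output_model(observations, nstates):
    # observations: list of 1-D trajectories; stack them into one sample matrix
    X = np.concatenate(observations).reshape(-1, 1)
    gmm = GaussianMixture(n_components=nstates).fit(X)
    # the fitted weights, means and covariances would seed the Gaussian HMM output model
    return gmm.weights_, gmm.means_, gmm.covariances_

Note that the sklearn GMM itself initializes its means with k-means by default, which is the additional sklearn code path mentioned below.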

@marscher
Member

marscher commented Jul 6, 2015 via email

Sounds reasonable, if it is easy to extract.

@franknoe
Contributor Author

franknoe commented Jul 6, 2015

OK, please have a look


@marscher
Member

marscher commented Jul 6, 2015 via email

It also uses KMeans for initializing the means during EM, so we would need to extract that as well. The downside of extracting is that we would not easily receive updates/bugfixes for that code. For those two reasons I would advise against extracting.

@franknoe
Contributor Author

franknoe commented Jul 6, 2015

I'll have a look. We certainly don't need k-means initialization at the moment for our bhmm purposes. I'd be happy to take on dependencies on packages that we make significant use of, but that is not the case at the moment.

I definitely want to avoid unnecessary dependencies for pyemma. There's no headache for the Anaconda install, but with pip there are tons of possible problems and every dependency adds to them.


@franknoe
Contributor Author

franknoe commented Jul 6, 2015

We can instead include Moritz's code; it's faster anyway.
In principle I think the initialization step is sufficient.

