Consider making bhmm core compatible with sklearn #42

Open
franknoe opened this issue May 24, 2015 · 9 comments

Comments

@franknoe
Contributor

Following #40:

"could optimally be made compatible with scikit-learn, which seems to be the modern way new machine learning codes are written."

  • It seems all we need to do to make the HMM estimators compatible with sklearn is to implement the BaseEstimator interface:
    http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator
    Perhaps we could also implement the ClassifierMixin behavior (score) in order to do cross-validation.
  • Also I guess it would make sense to have the fit(X,..) function, but I'm not sure where in sklearn that function is actually defined, i.e. is there any base class defining it, or is it just a duck-typing convention?

Both of these would be easy to do, but I don't see why we would need a dependency on sklearn for that. I would like to avoid dependencies on heavy packages unless we actually make substantial use of their functionality (currently we do use sklearn's Gaussian Mixture Model, but I guess this is a temporary solution).

@kylebeauchamp, is a dependency on sklearn necessary for an effective implementation of the BaseEstimator interface (i.e. does sklearn explicitly check issubclass anywhere in its algorithms), or is duck typing sufficient?
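For concreteness, a minimal sketch of what the duck-typed convention amounts to (class name, parameter names, and method bodies are invented for illustration and are not the actual bhmm API):

class SketchHMMEstimator(object):
    """Illustrative only -- sketches the sklearn estimator conventions, not real bhmm code."""

    def __init__(self, nstates=2):
        # sklearn convention: __init__ only stores parameters, no estimation happens here
        self.nstates = nstates

    def get_params(self, deep=True):
        # BaseEstimator would generate this automatically from the __init__ signature
        return {'nstates': self.nstates}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # estimate the model from the data X and return self (sklearn convention);
        # fitted quantities conventionally get a trailing underscore
        self.fitted_ = True
        return self

    def score(self, X, y=None):
        # e.g. the log-likelihood of X under the fitted model, which is what
        # cross-validation would compare across folds
        return 0.0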

@franknoe
Contributor Author

Small addition: Even if explicit subclassing is necessary in order to use our estimators with sklearn algorithms, it would be simple to do that in another package that uses sklearn. Something like (just a sketch, I haven't tried that code):

from bhmm import MLHMM
from sklearn.base import BaseEstimator

class myMLHMM(MLHMM, BaseEstimator):
    pass

hmm = myMLHMM()
# do some fancy sklearn stuff with hmm object here

That way we could avoid an explicit dependency on sklearn in bhmm and still provide all of this functionality.
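For example, the "fancy sklearn stuff" could then include cloning and parameter introspection, assuming MLHMM's __init__ follows the sklearn convention of only storing its keyword arguments (again just a sketch, not tested):

from sklearn.base import clone

hmm = myMLHMM()
hmm_copy = clone(hmm)       # clone() relies on the get_params() that BaseEstimator provides
print(hmm.get_params())     # the same introspection GridSearchCV and friends use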

@marscher
Member

marscher commented Jul 6, 2015

The current release (0.3.0) declares scikit-learn as a dependency (both in setup.py and the conda recipe); however, it is never used in the code. Is this intended?

@franknoe
Contributor Author

franknoe commented Jul 6, 2015

I think we use sklearn's Gaussian Mixture Model estimator for the initialization of Gaussian HMMs. This was meant to be a temporary solution. I think we can in principle make bhmm completely sklearn-compatible without any dependency on sklearn, as I have started to do for pyEMMA.


@franknoe
Contributor Author

franknoe commented Jul 6, 2015

Addition: sklearn is used in l. 37-39 of this file:
https://github.com/bhmm/bhmm/blob/master/bhmm/init/gaussian.py
I realize that we would inherit the sklearn dependency in pyemma if pyemma depends on bhmm. Perhaps we could just make a copy of the sklearn GMM estimation code for now, since it is easy to isolate? sklearn is BSD-licensed, so there should be no problem with that.
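For reference, the pattern in question is roughly the following (a sketch only: the file above used sklearn's older mixture.GMM class, the current GaussianMixture API is shown here, the function name is invented, and 1-D observation trajectories are assumed):

import numpy as np
from sklearn.mixture import GaussianMixture

def init_gaussian_output_model(observations, nstates):
    # observations: list of 1-D trajectories; stack them into one sample matrix
    X = np.concatenate(observations).reshape(-1, 1)
    gmm = GaussianMixture(n_components=nstates).fit(X)
    # the fitted weights, means and covariances would seed the Gaussian HMM output model
    return gmm.weights_, gmm.means_, gmm.covariances_

Note that the sklearn GMM itself initializes its means with k-means by default, which is the additional sklearn code path mentioned below.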

@marscher
Member

marscher commented Jul 6, 2015 via email

Sounds reasonable, if it is easy to extract.

@franknoe
Contributor Author

franknoe commented Jul 6, 2015

OK, please have a look


@marscher
Member

marscher commented Jul 6, 2015 via email

It also uses KMeans for initializing the means during EM, so we would need to extract that as well. The downside of extracting is that we would not easily receive updates/bugfixes for that code. For those two reasons I would advise against extracting.

@franknoe
Contributor Author

franknoe commented Jul 6, 2015

I'll have a look. We certainly don't need k-means initialization at the moment for our bhmm purposes. I'd be happy to take on dependencies on packages that we make significant use of, but that is not the case at the moment.

I definitely want to avoid unnecessary dependencies for pyemma. There's no headache for the Anaconda install, but with pip there are tons of possible problems and every dependency adds to them.


@franknoe
Contributor Author

franknoe commented Jul 6, 2015

We can instead include Moritz's code; it's faster anyway.
In principle I think the initialization step is sufficient.

