This repository has been archived by the owner on Oct 14, 2018. It is now read-only.

Incompatibility With Keras Scikit-Learn Wrapper #69

Closed
AlexSchuy opened this issue Jan 6, 2018 · 6 comments

Comments

@AlexSchuy

The Keras neural-network package has a scikit-learn wrapper that works with scikit-learn's RandomizedSearchCV and GridSearchCV classes. However, it fails with the dask-searchcv equivalents, so dask-searchcv seems to impose additional requirements beyond the scikit-learn estimator interface. Would it be possible to list these requirements so that other projects can be adapted to work with dask-searchcv?

@mrocklin
Member

mrocklin commented Jan 6, 2018 via email

@AlexSchuy
Author

AlexSchuy commented Jan 6, 2018

The following code, which uses dask_searchcv.GridSearchCV, crashes. If you comment out the dask_searchcv import and uncomment the sklearn import, it runs.

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from dask_searchcv import GridSearchCV
# from sklearn.model_selection import GridSearchCV

def simple_nn(hidden_neurons):
    # Single hidden layer; input_dim=30 matches the breast cancer feature count.
    model = Sequential()
    model.add(Dense(hidden_neurons, activation='relu', input_dim=30))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])
    return model

param_grid = {'hidden_neurons': [100, 200, 300]}
cv = GridSearchCV(KerasClassifier(build_fn=simple_nn), param_grid)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
cv.fit(X_train, y_train)
score = cv.score(X_test, y_test)
print('score = {} on test set with params={}.'.format(score, cv.best_params_))

@mrocklin
Member

mrocklin commented Jan 6, 2018 via email

@AlexSchuy
Author

File "kerasexample.py", line 21, in <module>
cv.fit(X_train, y_train)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask_searchcv/model_selection.py", line 867, in fit
out = scheduler(dsk, keys, num_workers=n_jobs)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask/threaded.py", line 75, in get
pack_exception=pack_exception, **kwargs)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask/local.py", line 521, in get_async
raise_exception(exc, tb)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask/local.py", line 290, in execute_task
result = _execute_task(task, data)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask/local.py", line 271, in _execute_task
return func(*args2)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask_searchcv/methods.py", line 280, in fit_and_score
fields, params, fit_params)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask_searchcv/methods.py", line 216, in fit
est.fit(X, y, **fit_params)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/wrappers/scikit_learn.py", line 203, in fit
return super(KerasClassifier, self).fit(x, y, **kwargs)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/wrappers/scikit_learn.py", line 147, in fit
history = self.model.fit(x, y, **fit_args)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/models.py", line 960, in fit
validation_steps=validation_steps)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1657, in fit
validation_steps=validation_steps)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1213, in _fit_loop
outs = f(ins_batch)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2357, in __call__
**self.session_kwargs)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/phys/users/schuya/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1067, in _run
+ e.args[0])
TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("dense_1_input:0", shape=(?, 30), dtype=float32) is not an element of this graph.

@mrocklin
Member

mrocklin commented Jan 6, 2018

I believe that TensorFlow may have an issue where it doesn't like running in multiple Python threads. cc @bnaul who has dealt with this before. You might also want to do a web search on TensorFlow, Python, and Threads.

@mrocklin
Member

mrocklin commented Jan 6, 2018

To answer your original question: to use Dask with the multi-threaded scheduler, your code must be safe to run in multiple threads (most code is, just not TensorFlow). To use Dask with the multiprocessing or distributed schedulers, your code must be serializable (most code is). You can always fall back to the single-threaded scheduler to see how things work out. To try this, run the following line:

dask.set_options(get=dask.local.get_sync)
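
The two requirements above (thread safety for the threaded scheduler, serializability for the multiprocessing and distributed schedulers) can be roughly pre-checked before handing an estimator to a search. The sketch below uses only the standard library. The helper names `is_picklable` and `runs_in_threads` are hypothetical and not part of Dask or dask-searchcv. Plain `pickle` is used as a stand-in and is stricter than the cloudpickle-based serialization Dask can fall back on, and a clean threaded run does not prove thread safety; it only catches failures that show up immediately, like the one in the traceback above.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


def is_picklable(obj):
    """Return True if obj survives pickle.dumps (a rough serializability check)."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


def runs_in_threads(func, args_list, n_threads=4):
    """Call func once per argument tuple on a thread pool.

    Returns True only if no call raised. This is a smoke test,
    not a guarantee of thread safety.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(func, *args) for args in args_list]
        # Future.exception() waits for the call and returns None on success.
        errors = [f.exception() for f in futures]
    return all(e is None for e in errors)


if __name__ == "__main__":
    print(is_picklable(len))                       # built-in, pickled by reference
    print(is_picklable(lambda x: x))               # lambdas are not picklable
    print(runs_in_threads(pow, [(2, 3), (3, 2)]))  # pure function, no shared state
```

In practice one would pass something like a function that clones and fits the estimator on a small sample as `func`. Note also that in later Dask releases (0.18 and up) the single-threaded scheduler is selected with `dask.config.set(scheduler='synchronous')` rather than `dask.set_options`.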
