How to implement the non-static CNN in (Kim, 2014) using Keras #1515
Comments
Hello @Imorton-zd, I implemented the (Kim, 2014) model recently. Here is my code (imports added for completeness; `config` holds my network constants and `self._load_embedding()` returns the pre-trained embedding matrix, both defined elsewhere in my class):

```python
# Keras 0.3.x Graph API; layer/constraint names may differ in later versions.
from keras.models import Graph
from keras.layers.core import Dense, Dropout, Reshape
from keras.layers.embeddings import Embedding
from keras.layers.convolutional import Convolution1D, MaxPooling1D
from keras.constraints import MaxNorm

graph = Graph()
graph.add_input(name='input', input_shape=(config.sent_len,), dtype='int')

# Non-static channel: initialized from pre-trained vectors, updated during training.
graph.add_node(
    Embedding(config.vocab_size, config.vec_dim, input_length=config.sent_len,
              weights=[self._load_embedding()]),
    name='nonstatic_emb', input='input'
)

# One convolution + max-over-time pooling branch per filter window size.
conv_layer_outputs = []
for idx, window_size in enumerate(config.conv_filter_hs):
    conv_name = 'conv_nonstatic_%d' % idx
    pool_name = 'pool_nonstatic_%d' % idx
    graph.add_node(
        Convolution1D(config.conv_features, window_size, activation='relu',
                      W_constraint=MaxNorm(3), b_constraint=MaxNorm(3)),
        name=conv_name, input='nonstatic_emb'
    )
    graph.add_node(
        MaxPooling1D(pool_length=config.sent_len - window_size + 1),
        name=pool_name, input=conv_name
    )
    conv_layer_outputs.append(pool_name)

# MLP: concatenate the pooled features, then fully connected layers with dropout.
graph.add_node(
    Reshape((config.conv_features * len(config.conv_filter_hs),)),
    inputs=conv_layer_outputs,
    name='reshape'
)
for idx, mlp_h_dim in enumerate(config.mlp_hidden_units):
    graph.add_node(
        Dense(
            mlp_h_dim,
            activation='linear' if idx != len(config.mlp_hidden_units) - 1 else 'softmax',
            W_constraint=MaxNorm(3),
            b_constraint=MaxNorm(3)
        ),
        name='fc_%d' % idx,
        input='reshape' if idx == 0 else 'fc_%d_dropout' % (idx - 1)
    )
    if idx != len(config.mlp_hidden_units) - 1:
        graph.add_node(
            Dropout(config.dropout_rate),
            name='fc_%d_dropout' % idx,
            input='fc_%d' % idx
        )
graph.add_output(name='output', input='fc_%d' % (len(config.mlp_hidden_units) - 1))
```

Hope it helps.
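A note on the `pool_length = config.sent_len - window_size + 1` choice above: a valid-mode convolution over a length-`sent_len` sentence produces `sent_len - window_size + 1` positions, so pooling over all of them is max-over-time pooling, collapsing each feature map to a single value. A minimal NumPy sketch of that arithmetic (the names and sizes here are illustrative, not from the code above):

```python
import numpy as np

def conv1d_valid(x, w):
    """Valid-mode 1-D convolution of a sentence with a single filter.
    x: (sent_len, vec_dim) embedded sentence; w: (window_size, vec_dim) filter.
    Returns one activation per window position."""
    window_size = w.shape[0]
    n_positions = x.shape[0] - window_size + 1  # = sent_len - window_size + 1
    return np.array([np.sum(x[i:i + window_size] * w) for i in range(n_positions)])

sent_len, vec_dim, window_size = 7, 4, 3
x = np.random.randn(sent_len, vec_dim)     # one embedded sentence
w = np.random.randn(window_size, vec_dim)  # one convolution filter

feature_map = conv1d_valid(x, w)
assert feature_map.shape == (sent_len - window_size + 1,)  # 5 positions

# Max-over-time pooling: pooling across every position yields one scalar per
# filter, so each branch contributes exactly conv_features values to the MLP.
pooled = feature_map.max()
```

Because each branch pools down to a length-1 sequence of `conv_features` channels, concatenating the branches and applying `Reshape` gives the flat vector of size `conv_features * len(conv_filter_hs)` that feeds the MLP.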
@chen070757 Thanks for your reply and for sharing your code! But I have some questions about the code. First, what are the parameter settings in
@ymcui Thanks for your reply! As far as I understand, the input of the non-static CNN is the word embeddings pre-trained with word2vec, and those embeddings are then continually updated via backpropagation. However, in Keras it seems I must either use an embedding layer, which cannot do this kind of semi-supervised learning, or feed the pre-trained embeddings into the network directly as a static input; neither achieves the non-static idea.
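For what it's worth, "non-static" only means that the pre-trained word2vec matrix is used to *initialize* a trainable lookup table whose rows are then moved by backpropagation (which is what `Embedding(..., weights=[...])` does in the code above, since embedding weights are trainable by default). A tiny NumPy sketch of the idea, with hypothetical sizes and a hand-made gradient:

```python
import numpy as np

vocab_size, vec_dim = 5, 3
pretrained = np.random.randn(vocab_size, vec_dim)  # stand-in for word2vec rows

# Non-static channel: start from the pre-trained vectors, then let
# backpropagation update them.
embedding = pretrained.copy()

word_ids = np.array([1, 3])          # words appearing in the current batch
grad = np.random.randn(2, vec_dim)   # gradient of the loss w.r.t. the looked-up rows
lr = 0.1
embedding[word_ids] -= lr * grad     # only rows for words in the batch move

# A static channel would simply keep `pretrained` frozen.
changed = not np.allclose(embedding[1], pretrained[1])
```

So the supervised signal from the labels is exactly what updates the embeddings; no separate semi-supervised mechanism is needed.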
config is a self-defined class that contains some constants of my network.
@chen070757 Thank you very much. I still have some questions; I hope I'm not disturbing you.
The Keras API changed recently. My code is based on Keras 0.3.2 and is not compatible with the latest Keras. For further information about
@chen070757 Many thanks! The last two questions:
Validation data is used to help you estimate whether the model is underfitting or overfitting. It has no effect on the training process itself.
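To illustrate: the validation loss is only monitored, typically to decide when to stop training. A small sketch of that monitoring logic, with illustrative loss values (not numbers from this thread):

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the epoch with the best validation loss, stopping the scan once
    the loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, v in enumerate(val_losses):
        if v < best:
            best, best_epoch, waited = v, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has turned up: likely overfitting
    return best_epoch

# Training loss keeps falling while validation loss turns up after epoch 2,
# the classic overfitting signature:
train_losses = [0.9, 0.6, 0.4, 0.3, 0.2]   # keeps improving (illustrative)
val_losses = [0.8, 0.7, 0.65, 0.7, 0.75]   # best at epoch 2, then worsens
best = early_stop_epoch(val_losses)        # → 2
```

The gradients are computed only on the training batches; the validation set never contributes to a weight update.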
@chen070757 Did you get (almost) the same performance as the original implementation or the Torch implementation on any dataset? I've tried to implement Kim's CNN (see #1994, though it used the old Graph API) but failed to reach 81% accuracy on the MR (movie review) dataset; I got about 79%~80%. This has bothered me for a long time, and I don't know whether the gap comes from some minor difference in the model architecture or from details of the experimental setup. It would be very helpful if you could share your performance numbers.
@chen070757 Thanks a lot for your great comments. I have one more question: is there any way to retrieve the back-propagated word vectors? Or, more generally, is it possible to back-propagate into the word embeddings using only a labeled dataset? I need to observe the changes in the word vectors and their similarities. Regards,
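The updated vectors live in the embedding layer's weight matrix, so after training you can read them back (in Keras, via the layer's `get_weights()`; index 0 is the embedding matrix) and compare them against the pre-trained matrix. A small NumPy sketch of the comparison step, with hypothetical 2-D vectors standing in for the before/after matrices:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# `before` stands for the pre-trained word2vec rows, `after` for the rows
# read back from the trained embedding layer (both hypothetical here).
before = np.array([[1.0, 0.0], [0.0, 1.0]])
after = np.array([[1.0, 0.1], [0.0, 2.0]])

# Per-word drift: cosine similarity between a word's vector before and
# after training. Pure rescaling leaves the similarity at 1.0.
drift = [cosine(before[i], after[i]) for i in range(len(before))]
```

The same `cosine` helper can then rank nearest neighbours of a word before and after training to see how its neighbourhood moved.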
In the paper (Kim, 2014), non-static and static CNNs were proposed. Has anyone implemented these methods? I would appreciate it very much if you shared your Keras code.