Example of How to Construct 1D Convolutional Net on Text #233
You actually should probably use a 2D convolution, depending on what you're trying to do. If you have word vectors of size wv_sz and sentences padded to nb_tokens tokens, you can do something like:

model.add(Convolution2D(nb_feature_maps, 1, n_gram, wv_sz))
model.add(MaxPooling2D(poolsize=(nb_tokens - n_gram + 1, 1)))
model.add(Flatten())
model.add(WhateverLayer(nb_feature_maps, nb_outputs))
model.add(...)

If you want to add diversity, you can also do something like:

ngram_filters = [3, 4, 5, 6, 7, 8]
conv_filters = []
for n_gram in ngram_filters:
    conv_filters.append(Sequential())
    conv_filters[-1].add(Convolution2D(nb_feature_maps, 1, n_gram, wv_sz))
    conv_filters[-1].add(MaxPooling2D(poolsize=(nb_tokens - n_gram + 1, 1)))
    conv_filters[-1].add(Flatten())

model = Sequential()
model.add(Merge(conv_filters, mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(nb_feature_maps * len(ngram_filters), 1))
model.add(Activation('sigmoid'))
2D convolution only really makes sense under the assumption that the input is spatially continuous over both dimensions (like pictures). A sentence is continuous over time, but not over the wv_sz dimension (unless you are using a kind of word/character embedding that is dense and continuous). Thinking about it, 2D conv while using a kind of dense and continuous character embedding sounds like the best way to process text. But anyway, if your character embedding is sparse and arbitrary, 1D conv makes sense.
Thanks. @fchollet could you provide a quick example of how to do the 1D convolution on some of your textual data?
@simonhughes22 It's not something I've done before, but I can look into it (I'm a bit busy, so no promises). Is there a paper in particular that you are trying to reproduce?
Sure, that would be awesome. Reproducing this excellent Zhang and LeCun paper would be great: http://arxiv.org/abs/1502.01710

Or doing the same with words if characters are too slow. My dataset is not large, but an LSTM vastly out-performed more vanilla classification methods, and I am hoping that using a convolutional network on the same task would be an interesting comparison, and may work well.
I believe they are using 2D convolutions, and interestingly they are doing it over a sparse discontinuous character embedding. You can do better. Here is my suggestion: 2D convolutions over a continuous dense character embedding space learned jointly with the main task. Makes much more sense than the braille-like input Zhang and LeCun are using.

# input: 2D tensor of integer indices of characters (eg. 1-57).
# input tensor has shape (samples, maxlen)
model = Sequential()
model.add(Embedding(max_features, 256)) # embed into dense 3D float tensor (samples, maxlen, 256)
model.add(Reshape(1, maxlen, 256)) # reshape into 4D tensor (samples, 1, maxlen, 256)
# VGG-like convolution stack
model.add(Convolution2D(32, 3, 3, 3, border_mode='full'))
model.add(Activation('relu'))
model.add(Convolution2D(32, 32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
# then finish up with Dense layers

Warning: untested, etc.
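For what it's worth, here is a minimal sketch of the preprocessing that would produce that (samples, maxlen) tensor of character indices; the alphabet, maxlen, and helper name are illustrative assumptions, not from this thread:

import numpy as np

alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'
char_to_index = {c: i + 1 for i, c in enumerate(alphabet)}  # reserve 0 for padding / unknown
maxlen = 100

def encode(text):
    ids = [char_to_index.get(c, 0) for c in text.lower()[:maxlen]]
    return ids + [0] * (maxlen - len(ids))  # right-pad with zeros to a fixed length

X = np.array([encode(t) for t in ['some example text', 'another document']])
# X now has shape (samples, maxlen) and can be fed to the Embedding layer above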
Awesome thanks I will try it out
My 2D example with word/character vectors was geared towards the continuous embeddings -- in the style of this paper if anyone's interested.
Cool thanks @lukedeo
@lukedeo cool, makes sense!
@lukedeo I'd like to try your example of different sized filters but can't figure out the correct output dimension size; your example above doesn't seem to work. My input dimensions are (number of sequences, 1, max number of tokens (padded to max len), word vector size). I am using a one-hot encoded vector to make it simpler, but that shouldn't matter. I get the following error, using the model described by you above, when calling:

results = model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=epochs, validation_split=0.0, show_accuracy=True, verbose=1)

Process finished with exit code 1

Did you do anything else to transform the model or input? 16 is my mini-batch size I believe.
@simonhughes22 That error suggests one of the layers is getting a 2 dimensional input where it was expecting 4-D. Convolution layers expect 4-D input so you may need to reshape something somewhere. An example of using a CNN for text classification is below. Note that I have already concatenated the embeddings of the words as a preprocessing step. In particular:
import numpy as np
import theano

# Convolution layers expect a 4-D input so we reshape our 2-D input
nb_samples = X_train.shape[0]
nb_features = X_train.shape[1]
newshape = (nb_samples, 1, nb_features, 1)
X_train = np.reshape(X_train, newshape).astype(theano.config.floatX)
# We set some hyperparameters
BATCH_SIZE = 16
FIELD_SIZE = 5 * 300
STRIDE = 300
N_FILTERS = 200
# We fit the model
model = Sequential()
model.add(Convolution2D(nb_filter=N_FILTERS, stack_size=1, nb_row=FIELD_SIZE,
nb_col=1, subsample=(STRIDE, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(((nb_features - FIELD_SIZE) / STRIDE) + 1, 1)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(N_FILTERS, nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta')
print 'fitting model'
model.fit(X_train, Y_train, nb_epoch=10, batch_size=BATCH_SIZE, verbose=1,
          show_accuracy=True, validation_split=0.1)
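To make the shapes concrete, a small worked example of the pooling arithmetic used above (the 50-word / 300-dimension numbers are illustrative assumptions):

# Worked shape arithmetic for the model above, assuming (for illustration) 50-word
# documents with each word replaced by a concatenated 300-d vector.
nb_words = 50
wv_dim = 300
nb_features = nb_words * wv_dim              # 15000 values per flattened document
FIELD_SIZE = 5 * wv_dim                      # each filter spans 5 consecutive word vectors
STRIDE = wv_dim                              # shift by exactly one word per application
n_positions = ((nb_features - FIELD_SIZE) // STRIDE) + 1
print(n_positions)                           # 46, so pooling over (46, 1) leaves one value per filter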
@ameasure it was actually a theano bug with the concatenation feature, as @fchollet kindly pointed out under a different issue. I got bleeding-edge theano and it solved that issue. Thank you for your example. I got @lukedeo's example to work with the newer theano; that's actually very powerful. @ameasure wouldn't you be better off seeding the embedding layer with your pre-trained vectors and allowing it to fine-tune them to your task? This is how I ended up constructing my model:

nb_feature_maps = 32
embedding_size = 64
#ngram_filters = [3, 5, 7, 9]
ngram_filters = [2, 4, 6, 8]
conv_filters = []
for n_gram in ngram_filters:
    sequential = Sequential()
    conv_filters.append(sequential)
    sequential.add(Embedding(max_features, embedding_size))
    sequential.add(Reshape(1, maxlen, embedding_size))
    sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
    sequential.add(Activation("relu"))
    sequential.add(MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1)))
    sequential.add(Flatten())
model = Sequential()
model.add(Merge(conv_filters, mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(nb_feature_maps * len(conv_filters), 1))
model.add(Activation("sigmoid")) I also have a model that uses a GRU and JZS1 (seems to work better) and an embedding layer, that gives comparable performance. I did try merging a recurrent network and a CNN like above but I got an error so it doesn't seem to like that. I'd be interested if anyone manages to figure out how to do that. |
@simonhughes22 glad you got it working and thanks for sharing your code. I hope to try out @lukedeo's approach soon but I want to figure out how to modify it to fine-tune pre-trained word vectors first. Kim's work (http://arxiv.org/pdf/1408.5882v2.pdf) suggests fine-tuning an existing embedding is better than starting from scratch. One thing that doesn't make sense to me, however, is convolving along the embedding dimension. That's not what other researchers have done and I can't think of any reason why the embedding vectors would be spatially related. Have you compared their approach to a purely 1-dimensional convolution along only the n-grams?
@ameasure it's only a 2D convolution because the vectors are stacked vertically and you're convolving across the full depth, so it's just doing a convolution over multiple entire word vectors. Convolving parts of vectors wouldn't make sense, as the ordering of the vector elements is arbitrary. @lukedeo references a paper above if you want to know more about it. So it's just taking convolutions of different sizes for different n-grams and concatenating them together into one beastie of a model. For my data a single convolution using embeddings (which are essentially an additional convolution over words) works as well so far, as does a GRU for the most part, although its performance is less predictable.

At some point I'll try using the GloVe or word2vec vectors as seeds for the embedding layer. What might be a good idea is to take pre-trained vectors as inputs and merge that with randomly seeded vectors, and prevent the pre-trained vectors from being updated by feeding them in directly as inputs. I think that's doable with the merge layer, as you essentially feed it two datasets: one could be vectors, the other could be ids as input to an embedding layer. They do that in some of the Stanford papers building off the GloVe work, where they have pre-trained vectors with an extra part of the vector that is updateable and randomly seeded, and supposedly that worked better.
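On seeding the embedding layer with pre-trained vectors: a hedged, untested sketch of one way to do it, assuming the Embedding layer accepts an initial weights argument and that embedding_matrix is a (max_features, embedding_size) numpy array of e.g. word2vec/GloVe vectors indexed by word id (both names are assumptions, not from this thread):

# embedding_matrix: assumed (max_features, embedding_size) array, row i = vector for word id i
sequential.add(Embedding(max_features, embedding_size, weights=[embedding_matrix]))
# the embedding is then fine-tuned along with the rest of the model, as in Kim (2014)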
@simonhughes22 @fchollet @lukedeo I'm trying to implement the CNN text classifiers with embeddings and multiple convolution sizes suggested in your posts above but I keep getting the following error:
My code is here and I'm using the bleeding-edge versions of Theano and Keras. Any idea what's causing this strange error?
See my notes on your Gist. The input format for the xs is incorrect. It expects a separate 2D array for each merged sequence model, concatenated into a list.
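Concretely, that means something like the following (a small sketch reusing the variable names from the merged model above; batch size and epoch count are arbitrary):

# one identical 2D (samples, maxlen) id matrix per merged Sequential branch
X_list = [X_train for _ in ngram_filters]
model.fit(X_list, y_train, batch_size=32, nb_epoch=5,
          validation_split=0.1, show_accuracy=True)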
Also, think about how your embeddings are working, @ameasure. Right now, you have an embedding for every n-gram filter size, meaning no shared word vectors. I'd recommend using the Graph model so that all the filter sizes share a single embedding, along these lines:

ngram_filters = [2, 3, 4]
graph = Graph()
graph.add_input(name='data')
graph.add_node(Embedding(vocab_size + 1, embedding_size),
name='embedding', input='data')
for n_gram in ngram_filters:
    sequential = containers.Sequential()
    sequential.add(Reshape(1, maxlen, embedding_size))
    sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
    sequential.add(Activation('relu'))
    sequential.add(MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1)))
    sequential.add(Flatten())
    graph.add_node(sequential, name='unit_' + str(n_gram), input='embedding')

graph.add_node(Dropout(0.5), name='dropout', inputs=['unit_' + str(n) for n in ngram_filters])
fc = containers.Sequential()
fc.add(Dense(nb_feature_maps * len(ngram_filters), 15))
fc.add(Activation('sigmoid'))
fc.add(Dense(15, nb_classes))
fc.add(Activation('softmax'))
graph.add_node(fc, name='fully_connected', input='dropout')
graph.add_output(name='output', input='fully_connected')
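Compiling and fitting such a Graph model would then look roughly like this (an untested sketch following the same old Graph API as above; the optimizer, batch size, and epoch count are arbitrary):

graph.compile(optimizer='adadelta', loss={'output': 'categorical_crossentropy'})
graph.fit({'data': X_train, 'output': Y_train}, batch_size=32, nb_epoch=5)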
@lukedeo @simonhughes22 @fchollet thank you! That fixed the issue and corrected my understanding of what's going on. I am now getting absolutely fantastic results on my dataset, by the way; thank you for the help and the wonderful library!
@ameasure glad to hear it!
Hey guys, great library @fchollet. I have created a feed-forward neural network using Keras (I think I can call it a deep NN since I have 3 hidden layers; sorry if my lingo is off). I am not hitting the accuracy I want on my test set, so I want to see if a CNN can help.

I have text that I send my NN: I send it 100 characters (I normalize the characters by dividing their ASCII integer value by 255), it does its work and comes out with an output of 20. I have a classification problem. So @ameasure, I am looking at your code and I am a little confused: what would field size be in my case, or what is it in your case and how did you get the numbers? What about stride?

@fchollet I would think that using a 1D CNN makes sense for text (since in my case it would be a 100x1 sized array), but you guys talk about 2D being optimal; why is that? Looking at the API for 1D: Convolution1D(input_dim, nb_filter, filter_length, ...). Looking at the API for 2D: Convolution2D(nb_filter, stack_size, nb_row, nb_col, ...). I have 50 samples to train on and 20 to test on.

Also, there is a subsample_length (1D) and subsample (2D) option in the CNN layers, and I have read that subsampling is similar to pooling. If I added the subsample option to my CNN layer, would I skip pooling?

Sorry about all the questions, I've spent hours looking for examples and trying to understand CNNs. I have a good concept of what they are; what's confusing me is the parameters and what I need to set them at. There is also the 1D vs 2D question.

Thanks for any help :)
@manwinder123 Take a look at the notes from the Stanford CNN class here: http://cs231n.github.io/convolutional-networks/; it will introduce you to the lingo.

The receptive field is the size of the input sequences we're going to feed through our filters in the convolutional layer. In my case it's 5 * 300 because each of my words has been replaced with a 300-dimension vector, and I want the filters to be applied to every contiguous 5-word sequence in my input. Presumably these filters learn to identify 5-word sequences that are useful for my classification task. Stride is how far we shift the filters after each application to the input. A stride of 300 means we shift the filter over by one full word vector before applying it to the input again.

Regarding the 1D vs. 2D convolutions, it turns out it's all the same; the important thing is that you're shifting your filters across your input in a reasonable manner. Performance is not the same, however: when I adopted the approach used by @fchollet, @simonhughes22, and @lukedeo, which basically converts a 1D convolution into a 2D convolution, I got huge performance improvements. Presumably the underlying implementation is optimized for 2-dimensional convolutions.
@manwinder123 I think you're making a mistake taking the ASCII values as the inputs. What that is doing is taking things that are discrete, characters, and converting them to a continuous quantity. That implies that adjacent letters in the alphabet have very similar meanings, and letters further apart do not. Instead, what you want to do is either use an embedding layer (pass it a list of ids, one per character, counting from 1 upwards (if 0-padded, else start at 0), with no gaps in the ids), or a one-hot encoding (a 255-element vector of zeros, with a one in the index of the ASCII value).

Secondly, I would stick with a stride of 1,1 and do a convolution over the characters. In fact I'd go one further and recommend you use words and not characters with the encoding method as described: assign a unique id to each word, replace the words with ids, do some zero padding, and pass that to an embedding layer and then a convolutional layer as in the code examples above. Once that's working, you can experiment with merging convolutions of different sizes. The idea then is to have 2D convolutions over the embeddings, where the word embeddings are stacked vertically, so you set nb_row to the number of words you want to convolve over and nb_col to the embedding size, i.e. the matrix width. Hope that makes sense. I guarantee that will work much better. I've also had as good results using the GRU and LSTM layers combined with embedding layers, although those run much slower, as theano's scan function (which they rely on) is quite slow.
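A minimal sketch of the word-id encoding described above (the toy documents, vocabulary, and maxlen are illustrative assumptions; ids start at 1 so that 0 can be used for padding):

import numpy as np

docs = [['the', 'cat', 'sat'], ['a', 'dog', 'barked', 'loudly']]
vocab = {w: i + 1 for i, w in enumerate(sorted({w for d in docs for w in d}))}
maxlen = 6

def to_ids(tokens):
    ids = [vocab[w] for w in tokens][:maxlen]
    return [0] * (maxlen - len(ids)) + ids   # left-pad with zeros

X = np.array([to_ids(d) for d in docs])      # shape (2, 6), ready for an Embedding layer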
@ameasure Thanks for the link, it cleared up a lot of the questions I had. @simonhughes22 thanks for the tip, I'll try it out; hopefully everything goes well :) appreciate the support 💃
See a great example of Convolution1D applied to text classification from fchollet, now in keras: https://github.com/fchollet/keras/blob/master/examples/imdb_cnn.py
Thanks for the link. So I tried using 1D but it was extremely slow. My input dim size was the size of my input, so if my file had 5000 characters, I set input_dim to 5000. I had set the number of filters to 1 (I thought it would help speed it up), but it was way too slow. This was a few weeks ago; I haven't tried recently though.
5000 characters is way too much to train an LSTM or some other form of recurrent model on. They can learn long-distance relationships, but not that long. I'd either switch to a word model (although that's still likely too large), or use a sliding-window approach of some sort.
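One simple version of the sliding-window idea (a hedged sketch; the window and step sizes are arbitrary): split each long sequence of character or word ids into overlapping windows and treat each window as its own training sample, e.g. with the document's label.

def sliding_windows(ids, window=200, step=100):
    # split a long id sequence into overlapping fixed-length windows
    if len(ids) <= window:
        return [ids]
    return [ids[i:i + window] for i in range(0, len(ids) - window + 1, step)]

# a 5000-character document then becomes 49 windows of 200 characters each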
I know I'm digging up old code here, but I'm really intrigued by this approach. However, I get errors when trying to use the structure suggested by @fchollet:

max_features = 1000
maxlen = 10
model = Sequential()
model.add(Embedding(max_features, 256)) # embed into dense 3D float tensor (samples, maxlen, 256)
model.add(Reshape((1, maxlen, 256))) # reshape into 4D tensor (samples, 1, maxlen, 256)
# VGG-like convolution stack
model.add(Convolution2D(32, 3, 3, 3, border_mode='full'))
model.add(Activation('relu'))
model.add(Convolution2D(32, 32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1))

Returns:

In [164]: model = Sequential()
In [165]: model.add(Embedding(max_features, 256)) # embed into dense 3D float tensor (samples, maxlen, 256)
In [166]: model.add(Reshape((1, maxlen, 256))) # reshape into 4D tensor (samples, 1, maxlen, 256)
In [167]: # VGG-like convolution stack
In [168]: model.add(Convolution2D(32, 3, 3, 3))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.5/site-packages/numpy/core/fromnumeric.py in prod(a, axis, dtype, out, keepdims)
2481 try:
-> 2482 prod = a.prod
2483 except AttributeError:
AttributeError: 'tuple' object has no attribute 'prod'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-168-9b06d1cea8a5> in <module>()
----> 1 model.add(Convolution2D(32, 3, 3, 3))
/usr/local/lib/python3.5/site-packages/keras/layers/containers.py in add(self, layer)
68 self.layers.append(layer)
69 if len(self.layers) > 1:
---> 70 self.layers[-1].set_previous(self.layers[-2])
71 if not hasattr(self.layers[0], 'input'):
72 self.set_input()
/usr/local/lib/python3.5/site-packages/keras/layers/core.py in set_previous(self, layer)
96 assert self.nb_input == layer.nb_output == 1, 'Cannot connect layers: input count and output count should be 1.'
97 if hasattr(self, 'input_ndim'):
---> 98 assert self.input_ndim == len(layer.output_shape), ('Incompatible shapes: layer expected input with ndim=' +
99 str(self.input_ndim) +
100 ' but previous layer has output_shape ' +
/usr/local/lib/python3.5/site-packages/keras/layers/core.py in output_shape(self)
764 @property
765 def output_shape(self):
--> 766 return (self.input_shape[0],) + self._fix_unknown_dimension(self.input_shape[1:], self.dims)
767
768 def get_output(self, train=False):
/usr/local/lib/python3.5/site-packages/keras/layers/core.py in _fix_unknown_dimension(self, input_shape, output_shape)
752 known *= dim
753
--> 754 original = np.prod(input_shape, dtype=int)
755 if unknown is not None:
756 if known == 0 or original % known != 0:
/usr/local/lib/python3.5/site-packages/numpy/core/fromnumeric.py in prod(a, axis, dtype, out, keepdims)
2483 except AttributeError:
2484 return _methods._prod(a, axis=axis, dtype=dtype,
-> 2485 out=out, keepdims=keepdims)
2486 return prod(axis=axis, dtype=dtype, out=out)
2487 else:
/usr/local/lib/python3.5/site-packages/numpy/core/_methods.py in _prod(a, axis, dtype, out, keepdims)
33
34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):
---> 35 return umr_prod(a, axis, dtype, out, keepdims)
36
37 def _any(a, axis=None, dtype=None, out=None, keepdims=False):
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

What am I doing wrong? (Note that I added another pair of parentheses to the Reshape layer, and took out the border_mode='full' argument.)
Yeah, are there any examples of using a CNN with simple 1D inputs?
I put together an example using one-hot inputs here: https://gist.github.com/ameasure/985c87bb8b34ac30269f

One-hot text inputs work surprisingly well, especially for LSTMs.
@simonhughes22 Hi, regarding the code you posted: was your input shape (to the Embedding layer) still (nb_samples, 1, max_len, vector_size)?
@eshijia I think the input shape is
Any chance of updating this for current 1D convolutions and API? Thanks!
I have a dataset of 393,021 rows with 41 features, classified into 23 classes. I have used the keras imdb_cnn.py example on my dataset but I am only able to get 52% accuracy. Could you please advise on how to increase the accuracy for my dataset?
I have a similar case - validation accuracy remains at about the baseline.
Hi all, I have an issue with basically the same task: minibatches of fixed-length sequences of one-hots --> sequences of embeddings --> 1D convolution (chopping 3-grams),
and I get this error with the
Could you please tell me what the problem is and how I can control the shape of each layer in this case?

UPD: turns out the Embedding layer works fine with just one-hot indices, and I passed it one-hot vectors, which messed up the shapes.
Hi @simonhughes22, could you try to explain how your model would change if one wants to use documents as input, represented as a stack of sentence matrices, where each sentence matrix is the word vectors stacked as you pointed out? One training sample would in this case be a 3D tensor.
Hi all,

lstm_1 = LSTM(256, return_sequences=False)(l_pool4)  # output size (None, 280, 256)
l_decoder_1 = LSTM(256, return_sequences=True)(l_in_rep)  # output size (None, 280, 256)
fc_layer_1 = Dense(68, activation='relu')(l_decoder_2)
I'd like to use keras to build a 1D convolutional net with pooling layers on some textual input, but I can't figure out the right input format and the right number of incoming connections above the flatten layer. Would you be able to provide a simple example using one of the data sets?
Awesome work by the way, great library.