
Example of How to Construct 1D Convolutional Net on Text #233

Closed
simonhughes22 opened this issue Jun 16, 2015 · 40 comments

Comments

@simonhughes22

I'd like to use keras to build a 1D convolutional net with pooling layers on some textual input, but I can't figure out the right input format and the right number of incoming connections above the flatten layer. Would you be able to provide a simple example using one of the data sets?

Awesome work by the way, great library.

@lukedeo
Contributor

lukedeo commented Jun 16, 2015

You actually should probably use a 2D convolution, depending on what you're trying to do. If you have word vectors of size wv_sz, and you truncate/pad each sentence to have nb_tokens tokens, you can form a "sentence image" of size (nb_tokens, wv_sz). You can then choose 1 or more n-gram sizes to use as a filter. As long as you make sure your X has shape (nb_examples, 1, nb_tokens, wv_sz), you can use something like

model.add(Convolution2D(nb_feature_maps, 1, n_gram, wv_sz))
model.add(MaxPooling2D(poolsize=(nb_tokens - n_gram + 1, 1)))
model.add(Flatten())
model.add(WhateverLayer(nb_feature_maps, nb_outputs))
model.add(...)

If you want to add diversity, you can also do something like a Merge on a list of conv-pool sub-models. Here is an example of something I've successfully used on a sentiment classifier:

ngram_filters = [3, 4, 5, 6, 7, 8]
conv_filters = []

for n_gram in ngram_filters:
    conv_filters.append(Sequential())
    conv_filters[-1].add(Convolution2D(nb_feature_maps, 1, n_gram, wv_sz))
    conv_filters[-1].add(MaxPooling2D(poolsize=(nb_tokens - n_gram + 1, 1)))
    conv_filters[-1].add(Flatten())

model = Sequential()
model.add(Merge(conv_filters, mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(nb_feature_maps * len(ngram_filters), 1))
model.add(Activation('sigmoid'))

@fchollet
Collaborator

2D convolution only really makes sense under the assumption that the input is spatially continuous over both dimensions (like pictures). A sentence is continuous over time, but not over the wv_sz dimension (unless you are using a kind of word/character embedding that is dense and continuous).

Thinking about it, 2D conv while using a kind of dense and continuous character embedding sounds like the best way to process text. But anyway, if your character embedding is sparse and arbitrary, 1D conv makes sense.
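
If you do go the 1D route, it would look roughly like this. Untested sketch: max_features, maxlen and the layer sizes are placeholders, and it assumes the Convolution1D(input_dim, nb_filter, filter_length) signature.

from keras.models import Sequential
from keras.layers.core import Dense, Activation, Flatten
from keras.layers.embeddings import Embedding
from keras.layers.convolutional import Convolution1D, MaxPooling1D

model = Sequential()
model.add(Embedding(max_features, 128))                      # (samples, maxlen, 128)
model.add(Convolution1D(input_dim=128, nb_filter=64, filter_length=3,
                        border_mode='valid', activation='relu'))  # convolve over 3-grams in time
model.add(MaxPooling1D(pool_length=2))
model.add(Flatten())
model.add(Dense(64 * ((maxlen - 3 + 1) // 2), 1))            # old Dense(input_dim, output_dim) style
model.add(Activation('sigmoid'))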

@simonhughes22
Author

Thanks. @fchollet could you provide a quick example of how to do the 1D convolution on some of your textual data?

@fchollet
Collaborator

@simonhughes22 It's not something I've done before, but I can look into it (I'm a bit busy, so no promises). Is there a paper in particular that you are trying to reproduce?

@simonhughes22
Author

Sure that would be awesome. Reproducing this excellent Zhang and LeCun paper would be great: http://arxiv.org/abs/1502.01710

@simonhughes22
Author

or doing the same with words if characters are too slow. My dataset is not large, but an LSTM vastly outperformed more vanilla classification methods, and I am hoping a convolutional network on the same task would be an interesting comparison, and may work well.

@fchollet
Collaborator

I believe they are using 2D convolutions, and interestingly they are doing it over a sparse discontinuous character embedding. You can do better.

Here is my suggestion: 2D convolutions over a continuous dense character embedding space learned jointly with the main task. Makes much more sense than the braille-like input Zhang and LeCun are using.

# input: 2D tensor of integer indices of characters (eg. 1-57). 
# input tensor has shape (samples, maxlen)
model = Sequential()
model.add(Embedding(max_features, 256)) # embed into dense 3D float tensor (samples, maxlen, 256)
model.add(Reshape(1, maxlen, 256)) # reshape into 4D tensor (samples, 1, maxlen, 256)
# VGG-like convolution stack
model.add(Convolution2D(32, 3, 3, 3, border_mode='full')) 
model.add(Activation('relu'))
model.add(Convolution2D(32, 32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
# then finish up with Dense layers

Warning: untested, etc.

@simonhughes22
Author

Awesome thanks I will try it out

@lukedeo
Contributor

lukedeo commented Jun 16, 2015

My 2D example with word/character vectors was geared towards the continuous embeddings -- in the style of this paper if anyone's interested.

@simonhughes22
Author

Cool thanks @lukedeo

@fchollet
Collaborator

@lukedeo cool, makes sense!

@simonhughes22
Author

@lukedeo I'd like to try your example of different-sized filters but can't figure out the correct output dimension size; your example above doesn't seem to work. My input dimensions are

(number of sequences, 1, max number of tokens (padded to max len), word vector size). I am using a one-hot encoded vector to make it simpler, but that shouldn't matter. I get the following error, using the model you described above:

results = model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=epochs, validation_split=0.0, show_accuracy=True, verbose=1)
File "build/bdist.macosx-10.6-x86_64/egg/keras/models.py", line 204, in fit
File "/Users/simon.hughes/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/compile/function_module.py", line 513, in call
allow_downcast=s.allow_downcast)
File "/Users/simon.hughes/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/tensor/type.py", line 169, in filter
data.shape))
TypeError: ('Bad input argument to theano function with name "build/bdist.macosx-10.6-x86_64/egg/keras/models.py:104" at index 1(0-based)', 'Wrong number of dimensions: expected 4, got 2 with shape (16, 1).')

Process finished with exit code 1

Did you do anything else to transform the model or input? 16 is my mini-batch size I believe.

@ameasure
Contributor

ameasure commented Jul 1, 2015

@simonhughes22 That error suggests one of the layers is getting a 2-D input where it was expecting 4-D. Convolution layers expect 4-D input, so you may need to reshape something somewhere.

An example of using a CNN for text classification is below. Note that I have already concatenated the embeddings of the words as a preprocessing step. In particular:

  • For each text input I break it up into its constituent words, retrieve pretrained 300-dimensional word embeddings for these, and then concatenate them all, zero-padding as needed to get a vector of a fixed length (30000).
  • I stack these into a (nb_samples, 30000) array which I call X_train
import numpy as np
import theano

# Convolution layers expect a 4-D input so we reshape our 2-D input
nb_samples = X_train.shape[0]
nb_features = X_train.shape[1]
newshape = (nb_samples, 1, nb_features, 1)
X_train = np.reshape(X_train, newshape).astype(theano.config.floatX)

# We set some hyperparameters
BATCH_SIZE = 16
FIELD_SIZE = 5 * 300
STRIDE = 300
N_FILTERS = 200

# We fit the model
model = Sequential()
model.add(Convolution2D(nb_filter=N_FILTERS, stack_size=1, nb_row=FIELD_SIZE, 
                        nb_col=1, subsample=(STRIDE, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(((nb_features - FIELD_SIZE) / STRIDE) + 1, 1)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(N_FILTERS, nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adadelta')
print 'fitting model'
model.fit(X_train, Y_train, nb_epoch=10, batch_size=BATCH_SIZE, verbose=1, 
          show_accuracy=True, validation_split=0.1)

@simonhughes22
Author

@ameasure it was actually a theano bug with the concatenation feature, as @fchollet kindly pointed out under a different issue. I got bleeding-edge theano and it solved that issue. Thank you for your example. I got @lukedeo's example to work with the newer theano; it's actually very powerful. @ameasure wouldn't you be better off seeding the embedding layer with your pre-trained vectors and allowing it to fine-tune them to your task? This is how I ended up constructing my model:

nb_feature_maps = 32
embedding_size = 64

#ngram_filters = [3, 5, 7, 9]
ngram_filters = [2, 4, 6, 8]
conv_filters = []

for n_gram in ngram_filters:
    sequential = Sequential()
    conv_filters.append(sequential)

    sequential.add(Embedding(max_features, embedding_size))
    sequential.add(Reshape(1, maxlen, embedding_size))
    sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
    sequential.add(Activation("relu"))
    sequential.add(MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1)))
    sequential.add(Flatten())

model = Sequential()
model.add(Merge(conv_filters, mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(nb_feature_maps * len(conv_filters), 1))
model.add(Activation("sigmoid"))

I also have a model that uses a GRU and JZS1 (the latter seems to work better) together with an embedding layer, and it gives comparable performance. I did try merging a recurrent network and a CNN like the one above, but I got an error, so it doesn't seem to like that. I'd be interested if anyone manages to figure out how to do that.

@ameasure
Contributor

ameasure commented Jul 2, 2015

@simonhughes22 glad you got it working and thanks for sharing your code. I hope to try out @lukedeo's approach soon, but I want to figure out how to modify it to fine-tune pre-trained word vectors first. Kim's work (http://arxiv.org/pdf/1408.5882v2.pdf) suggests fine-tuning an existing embedding is better than starting from scratch.

One thing that doesn't make sense to me, however, is convolving along the embedding dimension. That's not what other researchers have done, and I can't think of any reason why the embedding vectors would be spatially related. Have you compared their approach to a purely 1-dimensional convolution along only the n-grams?

@simonhughes22
Author

@ameasure it's only a 2D convolution in the sense that the vectors are stacked vertically and you're convolving across the full depth, so it's just doing a convolution over multiple entire word vectors. Convolving parts of vectors wouldn't make sense, as the ordering of the vector elements is arbitrary. @lukedeo references a paper above if you want to know more about it. So it's just taking convolutions of different sizes for different n-grams and concatenating them together into one beastie of a model. For my data a single convolution using embeddings (which are essentially an additional convolution over words) works as well so far, as does a GRU for the most part, although its performance is less predictable.

At some point I'll try using the GloVe or word2vec vectors as seeds for the embedding layer. What might be a good idea is to take pre-trained vectors as inputs and merge them with randomly seeded vectors, and prevent the pre-trained vectors from being updated by feeding them in directly as inputs. I think that's doable with the Merge layer, as you essentially feed it two datasets: one could be vectors, the other could be ids as input to an embedding layer. They do that in some of the Stanford papers building off the GloVe work, where they have pre-trained vectors with an extra part of the vector that is updateable and randomly seeded, and supposedly that worked better.
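
For the seeding part, a rough sketch (untested; word_index (word -> id) and word2vec (word -> vector) are placeholders for whatever vocabulary and pre-trained vectors you have, and it assumes the Embedding layer accepts an initial weights argument, otherwise call set_weights after building the model):

import numpy as np

wv_sz = 300
W = np.random.uniform(-0.05, 0.05, (max_features, wv_sz))  # random seeds for words without a pre-trained vector
for word, idx in word_index.items():
    if word in word2vec and idx < max_features:
        W[idx] = word2vec[word]

embedding = Embedding(max_features, wv_sz, weights=[W])  # fine-tuned during training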

@ameasure
Contributor

ameasure commented Jul 8, 2015

@simonhughes22 @fchollet @lukedeo I'm trying to implement the CNN text classifiers with embeddings and multiple convolution sizes suggested in your posts above but I keep getting the following error:

Loading data...
8982 train sequences
2246 test sequences
46 classes
X_train shape: (8982L, 50L)
X_test shape: (2246L, 50L)
Convert class vector to binary class matrix (for use with categorical_crossentropy)
Y_train shape: (8982L, 46L)
Y_test shape: (2246L, 46L)
Train on 8083 samples, validate on 899 samples
Epoch 0
Traceback (most recent call last):

  File "<ipython-input-11-60c586a4ee9a>", line 1, in <module>
    runfile('C:/Users/ameasure/Desktop/Programming Projects/cnn/reuters_multi_cnn.py', wdir='C:/Users/ameasure/Desktop/Programming Projects/cnn')

  File "C:\Users\ameasure\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)

  File "C:\Users\ameasure\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/ameasure/Desktop/Programming Projects/cnn/reuters_multi_cnn.py", line 67, in <module>
    model.fit(X=X_train, y=Y_train, batch_size=batch_size, nb_epoch=200, verbose=1, show_accuracy=True, validation_split=0.1)

  File "build\bdist.win-amd64\egg\keras\models.py", line 371, in fit
    validation_split=validation_split, val_f=val_f, val_ins=val_ins, shuffle=shuffle, metrics=metrics)

  File "build\bdist.win-amd64\egg\keras\models.py", line 135, in _fit
    outs = f(*ins_batch)

  File "C:\Users\ameasure\Anaconda\lib\site-packages\theano-0.7.0-py2.7.egg\theano\compile\function_module.py", line 593, in __call__
    self.inv_finder[c]))

TypeError: Missing required input: y

My code is here and I'm using the bleeding edge versions of Theano and Keras. Any idea what's causing this strange error?

@simonhughes22
Author

See my notes on your Gist. The input format for the xs is incorrect. It expects a separate 2D array for each Merged sequence model, concatenated into a list.
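
In other words, something like this (untested; assumes the Merge-of-Sequentials model above, where each branch has its own Embedding, so every branch gets the same integer id matrix):

X_ids = X_train                              # shape (nb_samples, maxlen), integer word ids
X_list = [X_ids for _ in ngram_filters]      # one copy per merged conv branch
model.fit(X_list, y_train, batch_size=32, nb_epoch=10,
          validation_split=0.1, show_accuracy=True)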

@lukedeo
Contributor

lukedeo commented Jul 8, 2015

Also, think about how your embeddings are working @ameasure. Right now, you have an embedding for every n-gram filter size, meaning no shared word vectors. I'd recommend using the Graph class and something like this. NOTE this is untested and uncompiled.

ngram_filters = [2, 3, 4]

graph = Graph()

graph.add_input(name='data')

graph.add_node(Embedding(vocab_size + 1, embedding_size), 
               name='embedding', input='data')

for n_gram in ngram_filters:
    sequential = containers.Sequential()
    sequential.add(Reshape(1, maxlen, embedding_size))
    sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
    sequential.add(Activation('relu'))
    sequential.add(MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1)))
    sequential.add(Flatten())

    graph.add_node(sequential, name='unit_' + str(n_gram))

graph.add_node(Dropout(0.5), name='dropout', inputs=['unit_' + str(n) for n in ngram_filters])

fc = containers.Sequential()
fc.add(Dense(nb_feature_maps * len(ngram_filters), 15))
fc.add(Activation('sigmoid'))
fc.add(Dense(15, nb_classes))
fc.add(Activation('softmax'))

graph.add_node(fc, name='fully_connected', input='dropout')
graph.add_output(name='output', input='fully_connected')

@ameasure
Contributor

ameasure commented Jul 9, 2015

@lukedeo @simonhughes22 @fchollet thank you! That fixed the issue and corrected my understanding of what's going on. I am now getting absolutely fantastic results on my dataset by the way, thank you for the help and the wonderful library!

@lukedeo
Contributor

lukedeo commented Jul 9, 2015

@ameasure glad to hear it!

@manwinder123

hey guys, great library @fchollet

So I have created a feed-forward neural network using Keras; I think I can call it a deep NN, since I have 3 hidden layers (sorry if my lingo is off). I am not hitting the accuracy I want on my test set, so I want to see if a CNN can help. I have text that I send to my NN: I send it 100 characters (I normalize the characters by dividing their ASCII integer value by 255), it does its work, and it comes out with an output of 20. I have a classification problem.

So @ameasure, I am looking at your code and I am a little confused. What would the field size be in my case, or what is it in your case and how did you get the numbers? What about stride?

@fchollet I would think that using a 1D CNN makes sense for text (since in my case it would be a 100x1 sized array), but you guys talk about 2D being optimal. Why is that?

Looking at the API for 1D: Convolution1D(input_dim, nb_filter, filter_length, ...)
I think my settings would be:
*input_dim = 100 (guessing that this is the input size, but you say that this is the # of channels; in pictures there are 3, RGB, so would I set this to 1?)
*nb_filter = 25 (I'm not really sure about this one; on the Keras website you write "(dimensionality of the output)", so I guess this is sort of like how many outputs the CNN layer will have)
*filter_length = completely lost on this one

Looking at the API for 2D: Convolution2D(nb_filter, stack_size, nb_row, nb_col, ...)
I think my settings would be:
*nb_filter = 25 (see 1D above)
*stack_size = 1 (I've seen 3 in some of your examples and I assume that it's 3 because the pictures have red, green and blue color channels; my input is text, so I think it has only 1 channel?)
*nb_row = 1 (my data doesn't have multiple rows, it's just one row with 100 columns)
*nb_col = 100 (I have 100 characters, so I guess that's 100 columns)

I have 50 samples to train on and 20 to test on

Also, there is a subsample_length (1D) and subsample (2D) option in the CNN layers; I have read that subsampling is similar to pooling. If I added the subsample option to my CNN layer, would I skip pooling?

Sorry about all the questions; I've spent hours looking for examples and trying to understand CNNs. I have a good concept of what they are; what's confusing me is the parameters and what I need to set them to. There is also the 1D vs 2D question.

# Convolution layers expect a 4-D input so we reshape our 2-D input
nb_samples = X_train.shape[0]
nb_features = X_train.shape[1]
newshape = (nb_samples, 1, nb_features, 1)
X_train = np.reshape(X_train, newshape).astype(theano.config.floatX)

# We set some hyperparameters
BATCH_SIZE = 16
FIELD_SIZE = 5 * 300
STRIDE = 300
N_FILTERS = 200

# We fit the model
model = Sequential()
model.add(Convolution2D(nb_filter=N_FILTERS, stack_size=1, nb_row=FIELD_SIZE, 
                        nb_col=1, subsample=(STRIDE, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(((nb_features - FIELD_SIZE) / STRIDE) + 1, 1)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(N_FILTERS, nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adadelta')
print 'fitting model'
model.fit(X_train, Y_train, nb_epoch=10, batch_size=BATCH_SIZE, verbose=1, 
          show_accuracy=True, validation_split=0.1)

thanks for any help :)

@ameasure
Contributor

@manwinder123 Take a look at the notes from the Stanford CNN class here: http://cs231n.github.io/convolutional-networks/, it will introduce you to the lingo. The receptive field is the size of the input sequences we're going to feed through our filters in the convolutional layer. In my case it's 5 * 300 because each of my words has been replaced with a 300 dimension vector, and I want the filters to be applied to every contiguous 5 word sequence in my input. Presumably these filters learn to identify 5 word sequences that are useful for my classification task.

Stride is how far we shift the filters after each application to the input. A stride of 300 means we shift the filter over by one full word vector before applying it to the input again.

Regarding the 1D vs. 2D convolutions, it turns out it's all the same. The important thing is that you're shifting your filters across your input in a reasonable manner. Performance is not the same, however: when I adopted the approach used by @fchollet, @simonhughes22 and @lukedeo, which basically converts a 1D convolution into a 2D convolution, I got huge performance improvements. Presumably the underlying implementation is optimized for 2-dimensional convolutions.
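
To make the bookkeeping concrete, a quick sanity check with the numbers from my snippet above (plain Python, just shape arithmetic):

wv_sz = 300
n_words = 100                      # 30000 / 300
FIELD_SIZE = 5 * wv_sz             # each filter spans 5 consecutive word vectors
STRIDE = wv_sz                     # shift by exactly one word per application
nb_features = n_words * wv_sz      # 30000

n_positions = (nb_features - FIELD_SIZE) // STRIDE + 1
print(n_positions)                 # 96, i.e. one filter application per 5-word window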

@simonhughes22
Author

@manwinder123 I think you're making a mistake taking the ASCII values as the inputs. What that is doing is taking things that are discrete, characters, and converting them to a continuous quantity. This implies that adjacent letters in the alphabet have very similar meanings, and letters further apart do not. Instead, what you want to do is either use an embedding layer (pass it a list of ids, one per character, counting from 1 upwards (if 0-padded, else start at 0), with no gaps in the ids), or a one-hot encoding (a 255-element vector of zeros, with a one at the index of the ASCII value). Secondly, I would stick with a stride of 1,1 and do a convolution over the characters. In fact I'd go one further and recommend you use words rather than characters, with the encoding method described. Assign a unique id to each word, replace the words with ids, do some zero padding and pass them to an embedding layer and then a convolutional layer, as in the code examples above. Once that's working, you can experiment with merging convolutions of different sizes. The idea then is to have 2D convolutions over the embeddings, where the word embeddings are stacked vertically, so you are setting nb_col to the embedding size, i.e. the matrix width, and nb_row to the number of words you want to convolve over.

Hope that makes sense. I guarantee that will work much better. I've also had as good results using the GRU and LSTM layers combined with embedding layers, although those run much slower as theano's scan function (which they rely on) is quite slow.
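
If it helps, here is a rough sketch of the word-id preprocessing I mean (plain numpy, untested; texts is a placeholder for your list of raw strings):

import numpy as np

word_to_id = {}
sequences = []
for text in texts:
    ids = []
    for word in text.lower().split():
        if word not in word_to_id:
            word_to_id[word] = len(word_to_id) + 1   # reserve 0 for padding
        ids.append(word_to_id[word])
    sequences.append(ids)

maxlen = max(len(s) for s in sequences)
X = np.zeros((len(sequences), maxlen), dtype='int32')  # zero-padded id matrix
for i, ids in enumerate(sequences):
    X[i, :len(ids)] = ids
# X can now go into Embedding(len(word_to_id) + 1, embedding_size) followed by
# the Reshape / Convolution2D stack shown earlier in this thread.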

@manwinder123

@ameasure Thanks for the link, it clear up a lot of the questions i had

@simonhughes22 thanks for the tip, i'll try it out. hopefully everything goes well :)

appreciate the support 💃

@xpact

xpact commented Aug 4, 2015

See great example of Convolutional1D applied to text classification from fchollet now in keras: https://github.com/fchollet/keras/blob/master/examples/imdb_cnn.py

@manwinder123

Thanks for the link.

So I tried using 1d but it was extremely slow. My input dim size was the size of my input. So if my file had 5000 characters, I set input dim to 5000. I had set the nb of filters to 1 (I thought it would help speed it up). But it was way too slow, this was a few weeks ago. I haven't tried recently though

@simonhughes22
Author

5000 characters is way too much to train an LSTM or some other form of recurrent model on. They can learn long-distance relationships, but not that long. I'd either switch to a word model (although that's still likely too large), or use a sliding-window approach of some sort.
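
A rough sketch of the sliding-window idea (untested; window and step are placeholder sizes):

import numpy as np

def sliding_windows(char_ids, window=200, step=50):
    # split one long 1-D array of character ids into overlapping fixed-length chunks
    windows = []
    for start in range(0, max(1, len(char_ids) - window + 1), step):
        chunk = char_ids[start:start + window]
        if len(chunk) < window:                          # zero-pad a short final chunk
            chunk = np.pad(chunk, (0, window - len(chunk)), mode='constant')
        windows.append(chunk)
    return np.array(windows)                             # shape (n_windows, window)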

@zachmayer
Contributor

I know I'm digging up old code here, but I'm really intrigued by this approach. However, I get errors when trying to use the structure suggested by @fchollet:

max_features = 1000
maxlen = 10

model = Sequential()
model.add(Embedding(max_features, 256)) # embed into dense 3D float tensor (samples, maxlen, 256)
model.add(Reshape((1, maxlen, 256))) # reshape into 4D tensor (samples, 1, maxlen, 256)
# VGG-like convolution stack
model.add(Convolution2D(32, 3, 3, 3, border_mode='full')) 
model.add(Activation('relu'))
model.add(Convolution2D(32, 32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1))

Returns:

In [164]: model = Sequential()

In [165]: model.add(Embedding(max_features, 256)) # embed into dense 3D float tensor (samples, maxlen, 256)

In [166]: model.add(Reshape((1, maxlen, 256))) # reshape into 4D tensor (samples, 1, maxlen, 256)

In [167]: # VGG-like convolution stack

In [168]: model.add(Convolution2D(32, 3, 3, 3)) 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.5/site-packages/numpy/core/fromnumeric.py in prod(a, axis, dtype, out, keepdims)
   2481         try:
-> 2482             prod = a.prod
   2483         except AttributeError:

AttributeError: 'tuple' object has no attribute 'prod'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-168-9b06d1cea8a5> in <module>()
----> 1 model.add(Convolution2D(32, 3, 3, 3))

/usr/local/lib/python3.5/site-packages/keras/layers/containers.py in add(self, layer)
     68         self.layers.append(layer)
     69         if len(self.layers) > 1:
---> 70             self.layers[-1].set_previous(self.layers[-2])
     71             if not hasattr(self.layers[0], 'input'):
     72                 self.set_input()

/usr/local/lib/python3.5/site-packages/keras/layers/core.py in set_previous(self, layer)
     96         assert self.nb_input == layer.nb_output == 1, 'Cannot connect layers: input count and output count should be 1.'
     97         if hasattr(self, 'input_ndim'):
---> 98             assert self.input_ndim == len(layer.output_shape), ('Incompatible shapes: layer expected input with ndim=' +
     99                                                                 str(self.input_ndim) +
    100                                                                 ' but previous layer has output_shape ' +

/usr/local/lib/python3.5/site-packages/keras/layers/core.py in output_shape(self)
    764     @property
    765     def output_shape(self):
--> 766         return (self.input_shape[0],) + self._fix_unknown_dimension(self.input_shape[1:], self.dims)
    767 
    768     def get_output(self, train=False):

/usr/local/lib/python3.5/site-packages/keras/layers/core.py in _fix_unknown_dimension(self, input_shape, output_shape)
    752                 known *= dim
    753 
--> 754         original = np.prod(input_shape, dtype=int)
    755         if unknown is not None:
    756             if known == 0 or original % known != 0:

/usr/local/lib/python3.5/site-packages/numpy/core/fromnumeric.py in prod(a, axis, dtype, out, keepdims)
   2483         except AttributeError:
   2484             return _methods._prod(a, axis=axis, dtype=dtype,
-> 2485                                   out=out, keepdims=keepdims)
   2486         return prod(axis=axis, dtype=dtype, out=out)
   2487     else:

/usr/local/lib/python3.5/site-packages/numpy/core/_methods.py in _prod(a, axis, dtype, out, keepdims)
     33 
     34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):
---> 35     return umr_prod(a, axis, dtype, out, keepdims)
     36 
     37 def _any(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

What am I doing wrong?

(Note that I added another pair of parentheses to the reshape layer, and took out the border_mode='full' in the first convolution.)

@wbars

wbars commented Mar 25, 2016

Yeah, are there any examples of using a CNN with simple 1D inputs? (X_train would be an n_samples x n_features matrix, I guess.) And I want to just organize the first CNN layer to have n_samples inputs?

@ameasure
Contributor

I put together an example using one-hot inputs here: https://gist.github.com/ameasure/985c87bb8b34ac30269f

One-hot text inputs work surprisingly well, especially for LSTMs.

@eshijia

eshijia commented Apr 1, 2016

@simonhughes22 hi, regarding the code below:

for n_gram in ngram_filters:
    sequential = Sequential()
    conv_filters.append(sequential)

    sequential.add(Embedding(max_features, embedding_size))
    sequential.add(Reshape(1, maxlen, embedding_size))
    sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
    sequential.add(Activation("relu"))
    sequential.add(MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1)))
    sequential.add(Flatten())

model = Sequential()
model.add(Merge(conv_filters, mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(nb_feature_maps * len(conv_filters), 1))
model.add(Activation("sigmoid"))

Was your input shape (to the Embedding layer) still (nb_samples, 1, max_len, vector_size)?

@brianlow

@eshijia I think the input shape is (nb_samples, maxlen). Where nb_samples is the number of sentences and maxlen is the number of words per sentence. An example might be:

[ 
  [1, 2, 3],    # Bob eyed Alice
  [3, 4, 1]     # Alice uppercut Bob
]

@ddofer

ddofer commented Jul 16, 2016

Any chance of updating this for current 1D convolutions and API?
(I'm running into issues when trying to translate this..)

Thanks!

@vinayakumarr

I have a dataset of 393021 rows with 41 features, classified into 23 classes. I have used the Keras imdb_cnn.py example for my dataset but I am only able to get 52 percent accuracy. Could you please tell us how to increase the accuracy for my dataset?

@ddofer

ddofer commented Jul 25, 2016

I have a similar case: validation accuracy remains at about the baseline distribution, despite the use of various optimizers, dropout, filter sizes, and +- pooling.


@ishalyminov

ishalyminov commented Aug 26, 2016

Hi all,

I have an issue with basically the same task: minibatches of fixed-length sequences of one-hots --> sequences of embeddings --> 1D convolution (chopping 3-grams).
And I'm using the Keras functional API. Here's what I got:

input_layer = Input(name='input', shape=(sequence_length, one_hot_size))
emb = Embedding(one_hot_size, embedding_size, name='embedding')(input_layer)
conv = Convolution1D(64, 3, name='conv')(emb)

and I get this error with the emb layer:

Exception: Input 0 is incompatible with layer conv: expected ndim=3, found ndim=4

Could you please tell me what the problem is and how I can control the shape of each layer in this case?

UPD: turns out the Embedding layer works fine with just the integer indices, and I passed it one-hot vectors, which messed up the shapes.
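
For anyone hitting the same thing, the working version of that snippet looks roughly like this (untested sketch; same placeholder names as above, with integer indices as input rather than one-hot vectors):

from keras.layers import Input, Embedding, Convolution1D
from keras.models import Model

input_layer = Input(shape=(sequence_length,), dtype='int32', name='input')    # integer indices
emb = Embedding(one_hot_size, embedding_size, name='embedding')(input_layer)  # (batch, seq, emb)
conv = Convolution1D(64, 3, name='conv')(emb)                                 # (batch, seq - 2, 64)
model = Model(input=input_layer, output=conv)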

@rfalba

rfalba commented Feb 24, 2017

hi @simonhughes22, could you try to explain how your model would change if one wants to use documents as input, represented as a matrix of sentence matrices, where each sentence matrix is the word vectors stacked as you pointed out? One training sample would in this case be a 3D tensor.

@dupsys

dupsys commented May 11, 2017

hi all,
I am doing my best to implement this Soroush paper (https://arxiv.org/pdf/1607.07514.pdf) as embedding and representation.
Here is the snippet of my code for vectorization:
maxSequenceLength = 1 + max([len(x.split(" ")) for x in captures_text])
inputChars = np.zeros((len(captures_text), maxSequenceLength))
nextChars = np.zeros((len(captures_text), maxSequenceLength))
print('Prepare the dataset for input and output pairs encoding as integers....')
for i in range(0, len(captures_text), 3):
    inputChars[i, 0] = char_2_id['S']
    try:
        nextChars[i, 0] = char_2_id[captures_text[i][0]]
    except IndexError:
        pass
    for j in range(1, maxSequenceLength):
        if j < len(captures_text[i]) + 1:
            inputChars[i, j] = char_2_id[captures_text[i][j - 1]]
            if j < len(captures_text[i]):
                nextChars[i, j] = char_2_id[captures_text[i][j]]
            else:
                nextChars[i, j] = char_2_id['E']
        else:
            inputChars[i, j] = char_2_id['E']
            nextChars[i, j] = char_2_id['E']
and I build the model as:
inputs = Input(shape=(180,))
#inputs_f = Flatten()(inputs)
embedded_layer = Embedding(200, embedding_dim, input_length=180)(inputs)
l_conv1 = Convolution1D(nb_filters, filter_length=filter_sizes[0],
                        border_mode='valid', activation='relu')(embedded_layer)
l_pool1 = MaxPooling1D(pool_length=pooling_size)(l_conv1)
l_conv2 = Convolution1D(nb_filters, filter_length=filter_sizes[0],
                        border_mode='valid', activation='relu')(l_pool1)
l_pool2 = MaxPooling1D(pool_length=pooling_size)(l_conv2)
l_conv3 = Convolution1D(nb_filters, filter_length=filter_sizes[0],
                        border_mode='valid', activation='relu')(l_pool2)
l_conv4 = Convolution1D(nb_filters, filter_length=filter_sizes[0],
                        border_mode='valid', activation='relu')(l_conv3)
l_pool4 = MaxPooling1D(pool_length=2)(l_conv4)

lstm_1 = LSTM(256, return_sequences=False)(l_pool4)
l_in_rep = RepeatVector(180)(lstm_1)

# output size: none, 280, 256

l_decoder_1 = LSTM(256, return_sequences=True)(l_in_rep)
l_decoder_2 = LSTM(256, return_sequences=True)(l_decoder_1)

# output size: none, 280, 256

fc_layer_1 = Dense(68, activation='relu')(l_decoder_2)
drop1_out = Dropout(0.5)(fc_layer_1)
flat = Flatten()(drop1_out)
out = Dense(180, activation='softmax')(flat)
model = Model(inputs, out)
# Compilation
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
##############
On training the model, I get very high loss values with low accuracy. Please tell me if I am doing something wrong; any suggestion to fix it will be appreciated. For example:
i = 0
Epoch 1/5
{'acc': array(0.0, dtype=float32), 'loss': array(320.0159606933594, dtype=float32), 'batch': 0, 'size': 32}
1/20000 [..............................] - ETA: 46478s - loss: 320.0160 - acc: @0.0000e+00
