IndexError loading word2vec model from previous version #1173

Closed
funnydevnull opened this issue Feb 27, 2017 · 6 comments

Comments

@funnydevnull

I'm seeing an issue in 1.0.0 that did not occur in 0.13.4.1. I'm simply loading the pre-trained gensim models from this site:

https://zenodo.org/record/162792

When I then try to look up a word, it appears to be in the vocab, but something odd happens when accessing syn0. Strangely, model.vocab[word] returns an index that is smaller than the first dimension of syn0, yet I still get an IndexError (I admit I did not try to track it down in the code). Here is some sample code and the exception:

# check the word 'bon' is in the model
'bon' in model:  True
# extract the index using model.vocab['bon']
index 618
# check the size of syn0
model size:  (475475, 500)
# check that we can access syn0 via model.syn0[0].size
a vector:  (500,)
# check that we can access that vector via model.syn0[618].size
a further vector:  (500,)
# yet still we get an error when we try model['bon']
Get vectors for word 'bon': 
...
    print "Get vectors for word 'bon': ",w2v_mdl['bon']
  File "/usr/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 1188, in __getitem__
    return self.wv.__getitem__(words)
  File "/usr/local/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 567, in __getitem__
    return self.word_vec(words)
  File "/usr/local/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 271, in word_vec
    return self.syn0[self.vocab[word].index]
IndexError: list index out of range

It should be easy to reproduce this error by just trying any french word (e.g. 'bon') for the model on that site.
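A small aside on the traceback above: the text "list index out of range" is the message that Python's built-in list raises, which hints that the object being indexed in word_vec may be a plain list rather than an array. The sketch below only demonstrates where that message comes from; the index 618 is taken from the report, and the empty list is an assumption used for illustration.

```python
# Indexing past the end of a built-in list produces exactly this message.
# The empty list is a stand-in; it is not read from any real model.
try:
    [][618]
except IndexError as exc:
    message = str(exc)

print(message)
```

If the same message shows up when calling model[word], checking type(model.wv.syn0) is a reasonable first step.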

@gojomo
Collaborator

gojomo commented Feb 28, 2017

To get a better sense of what's going on, could you please report the values of...

model.wv.vocab['bon'].index

and

model.wv.syn0.shape
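The two values requested above can be collected in one step. This is a minimal sketch, not part of gensim: describe_vectors is a hypothetical helper, and it assumes only the attribute names that appear in this thread (wv, vocab, syn0, index). It uses getattr for shape so that it still reports something useful if syn0 turns out not to be an array.

```python
from types import SimpleNamespace


def describe_vectors(model, word):
    """Summarize where `word` points and what kind of container holds the vectors."""
    wv = model.wv
    index = wv.vocab[word].index
    syn0 = wv.syn0
    return {
        "index": index,
        "syn0_type": type(syn0).__name__,
        # An array exposes .shape; a plain list does not, so this falls back to None.
        "syn0_shape": getattr(syn0, "shape", None),
        "syn0_len": len(syn0),
        "index_in_range": 0 <= index < len(syn0),
    }


# Stand-in object (an assumption, not a real model) whose syn0 is an empty list,
# chosen only to exercise the non-array branch. With a real model, pass the
# loaded Word2Vec instance instead.
stand_in = SimpleNamespace(
    wv=SimpleNamespace(vocab={"bon": SimpleNamespace(index=618)}, syn0=[])
)
report = describe_vectors(stand_in, "bon")
print(report)
```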

@mathrb

mathrb commented Feb 28, 2017

I got the same issue (since the 1.0 update) with another dictionary: testing whether a word exists works and returns True, but retrieving its vector raises the IndexError.

@tmylk
Contributor

tmylk commented Mar 2, 2017

@mathrb Could you please report the values of model.wv.vocab['bon'].index and model.wv.syn0.shape?

@jayantj
Contributor

jayantj commented Mar 3, 2017

I just looked into this issue; the problem seems to be related to loading pre-KeyedVectors models. When the numpy arrays are stored separately, the vectors are not loaded into the syn0 attribute of the KeyedVectors instance, and are instead loaded into the syn0 attribute of the Word2Vec instance.

Not sure why this is happening; there was even a unit test specifically for this. Checking and fixing ASAP.
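The failure mode described above can be imitated with plain Python objects. This is only a toy sketch under stated assumptions: Vectors and Model are hypothetical stand-ins, not gensim classes, the vocab maps a word straight to an integer index, and the "legacy load" is simulated by assigning the data to the outer object. It shows why direct access to model.syn0 can succeed while model[word] still fails.

```python
class Vectors:
    """Hypothetical stand-in for a KeyedVectors-like container."""

    def __init__(self):
        self.vocab = {}
        self.syn0 = []  # stays empty unless the loader fills it

    def word_vec(self, word):
        return self.syn0[self.vocab[word]]


class Model:
    """Hypothetical stand-in for a model that delegates lookups to `wv`."""

    def __init__(self):
        self.wv = Vectors()

    def __getitem__(self, word):
        return self.wv.word_vec(word)


model = Model()
model.wv.vocab["bon"] = 2
# Simulate a legacy load that restores the vectors onto the outer object only.
model.syn0 = [[0.0] * 4 for _ in range(3)]

direct = model.syn0[2]  # works: the outer attribute holds the data
try:
    model["bon"]  # fails: the inner container is still an empty list
    lookup_error = None
except IndexError as exc:
    lookup_error = str(exc)

print(len(direct), lookup_error)
```

In this toy setup the word is "in the vocab" and the outer syn0 is large enough, yet the lookup goes through the inner, empty container, which matches the symptoms in the original report.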

@mathrb

mathrb commented Mar 3, 2017

Hello
model.wv.vocab['bon'].index = 109
model.wv.syn0.shape gives me this:
Traceback (most recent call last):
  File "", line 1, in <module>
AttributeError: 'list' object has no attribute 'shape'

@jayantj
Contributor

jayantj commented Mar 3, 2017

Pushed a fix for this in #1179
