
Fixes issues when loading word2vec and doc2vec models saved with old Gensim versions. Fix #2000, #1977, #2012

Merged 13 commits on Apr 12, 2018
Binary file not shown.
16 changes: 16 additions & 0 deletions gensim/test/test_doc2vec.py
@@ -127,6 +127,22 @@ def testLoadOldModel(self):
        model = doc2vec.Doc2Vec.load(datapath(model_file))
        self.model_sanity(model)

        # Test loading doc2vec models from all previous versions
        old_versions = [
            '0.12.0', '0.12.1', '0.12.2', '0.12.3', '0.12.4',
            '0.13.0', '0.13.1', '0.13.2', '0.13.3', '0.13.4',
            '1.0.0', '1.0.1', '2.0.0', '2.1.0', '2.2.0', '2.3.0',
            '3.0.0', '3.1.0', '3.2.0', '3.3.0', '3.4.0'
        ]

        saved_models_dir = datapath('old_d2v_models')
@menshikh-iv (Contributor), Apr 9, 2018

Better to use datapath('old_d2v_models/d2v_{}.mdl') here and format it later.
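
The suggestion amounts to building the path template once and filling in the version inside the loop. A minimal sketch, assuming `datapath` simply joins a file name onto the test-data directory; the stand-in `datapath` and `TEST_DATA_DIR` below are hypothetical and exist only so the snippet runs on its own, and the version list is shortened:

```python
import os

# Hypothetical stand-in for gensim.test.utils.datapath (assumption: it joins
# the given name onto a fixed test-data directory).
TEST_DATA_DIR = os.path.join('gensim', 'test', 'test_data')


def datapath(fname):
    return os.path.join(TEST_DATA_DIR, fname)


old_versions = ['0.12.0', '3.4.0']  # shortened for illustration

# Build the template once, then fill in the version per iteration.
model_template = datapath('old_d2v_models/d2v_{}.mdl')
model_paths = [model_template.format(version) for version in old_versions]
```

This removes the separate `saved_models_dir` variable and the `os.path.join` call from the loop body.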

        for old_version in old_versions:
            model = doc2vec.Doc2Vec.load(os.path.join(saved_models_dir, 'd2v_{}.mdl'.format(old_version)))
            self.assertTrue(len(model.wv.vocab) == 3)
            self.assertTrue(model.wv.vectors.shape == (3, 4))
            self.assertTrue(model.docvecs.vectors_docs.shape == (2, 4))
            self.assertTrue(model.docvecs.count == 2)

@menshikh-iv (Contributor), Apr 9, 2018

Can you add save + load + infer_vector here (to be 100% sure that the model persists correctly)? Make sure you use the /tmp directory; check gensim.test.utils, where you'll find the needed functions (same for w2v).

Also, please try to update the model (as for w2v).
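
The requested round trip could look roughly like the helper below. This is a sketch, not the test that was merged: `roundtrip_and_infer` is a hypothetical name, the temporary directory comes from the standard library's `tempfile` rather than from whichever `gensim.test.utils` helper the reviewer has in mind, and the helper assumes the model exposes `save`, a `load` classmethod, `infer_vector`, and `vector_size`, as `Doc2Vec` does.

```python
import os
import shutil
import tempfile


def roundtrip_and_infer(model, words):
    """Save `model` to a temp dir, reload it, and infer a vector with the copy.

    Hypothetical helper: assumes a Doc2Vec-like object with `save(path)`,
    a `load(path)` classmethod, `infer_vector(words)` and `vector_size`.
    """
    tmp_dir = tempfile.mkdtemp()  # keeps test output out of the repository tree
    try:
        path = os.path.join(tmp_dir, 'roundtrip.mdl')
        model.save(path)
        reloaded = type(model).load(path)
        vector = reloaded.infer_vector(words)
        # Inference may not be deterministic, so only the length is checked.
        assert len(vector) == reloaded.vector_size
        return reloaded, vector
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Inside the version loop, the helper would be called on each loaded model with any short list of tokens.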

    def test_unicode_in_doctag(self):
        """Test storing document vectors of a model with unicode titles."""
        model = doc2vec.Doc2Vec(DocsLeeCorpus(unicode_tags=True), min_count=1)
14 changes: 14 additions & 0 deletions gensim/test/test_word2vec.py
@@ -789,6 +789,20 @@ def testLoadOldModel(self):
        self.assertEqual(model.max_final_vocab, None)
        self.assertEqual(model.vocabulary.max_final_vocab, None)

        # Test loading word2vec models from all previous versions
        old_versions = [
            '0.12.0', '0.12.1', '0.12.2', '0.12.3', '0.12.4',
            '0.13.0', '0.13.1', '0.13.2', '0.13.3', '0.13.4',
            '1.0.0', '1.0.1', '2.0.0', '2.1.0', '2.2.0', '2.3.0',
            '3.0.0', '3.1.0', '3.2.0', '3.3.0', '3.4.0'
        ]

        saved_models_dir = datapath('old_w2v_models')
        for old_version in old_versions:
            model = word2vec.Word2Vec.load(os.path.join(saved_models_dir, 'w2v_{}.mdl'.format(old_version)))
            self.assertTrue(len(model.wv.vocab) == 3)
Contributor review comment:

Add a most_similar check and a model update here (and similarly for d2v).

            self.assertTrue(model.wv.vectors.shape == (3, 4))
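
The extra checks requested in the review comment on this loop (query with most_similar, then update the model) might be sketched as a small helper. `smoke_check_word2vec` is a hypothetical name, and the method names and keyword arguments (`wv.most_similar`, `build_vocab(..., update=True)`, `train(..., total_examples=..., epochs=...)`, `corpus_count`, `epochs`) follow the gensim 3.x-era `Word2Vec` API as assumptions; they are not taken from the merged test.

```python
def smoke_check_word2vec(model, query_word, new_sentences):
    """Query a loaded model, then continue training it on extra sentences.

    Hypothetical helper; the attribute and method names are assumptions
    based on the gensim 3.x-era Word2Vec API.
    """
    # 1. The loaded vectors should be queryable.
    neighbours = model.wv.most_similar(positive=[query_word], topn=2)
    assert all(len(pair) == 2 for pair in neighbours)  # (word, similarity) pairs

    # 2. The loaded model should accept a vocabulary update and more training.
    model.build_vocab(new_sentences, update=True)
    model.train(new_sentences, total_examples=model.corpus_count, epochs=model.epochs)
    return neighbours
```

Calling this for every entry in `old_versions` would exercise the attributes that older saved models are most likely to be missing after loading.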

    @log_capture()
    def testBuildVocabWarning(self, l):
        """Test if warning is raised on non-ideal input to a word2vec model"""