Can not open and score a Word2vec model generated in v3.8.0 in v4.2.0 #3413

Closed
Wats0ns opened this issue Dec 14, 2022 · 8 comments · Fixed by #3415
Labels: bug, difficulty easy, good first issue, impact HIGH, reach LOW

@Wats0ns

Wats0ns commented Dec 14, 2022

Problem description

Hello,

I'm trying to score a sentence in a script using gensim v4.2.0, with a model trained in v3.8.0. However, I get the error AttributeError: 'Word2Vec' object has no attribute 'syn1'.

Steps/code/corpus to reproduce

Link to the model file: https://filetransfer.io/data-package/DIMOegMO#link

from gensim.models import Word2Vec
from gensim.utils import SaveLoad

sv = SaveLoad()
model = sv.load('test.mdl')
model.score(['test'], total_sentences=1)
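
To narrow down an error like this, it can help to check which attributes the loaded object actually carries before calling anything on it. The sketch below is a minimal, generic helper; `attribute_report` and `FakeModel` are hypothetical names made up for illustration and are not part of gensim. The attribute name `syn1` comes from the error message above, and `syn1neg` is included only as a second example name.

```python
def attribute_report(obj, names):
    """Map each attribute name to whether `obj` currently has it."""
    return {name: hasattr(obj, name) for name in names}


class FakeModel:
    """Hypothetical stand-in for a loaded model; NOT a gensim class."""

    def __init__(self):
        # Only one of the two attributes is present on this toy object.
        self.syn1neg = [0.0]


if __name__ == "__main__":
    # Shows which of the listed attributes exist on the object.
    print(attribute_report(FakeModel(), ["syn1", "syn1neg"]))
```

With gensim installed, the same helper could presumably be pointed at the object returned by `Word2Vec.load('test.mdl')` instead of the toy class.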

Versions

>>> import platform; print(platform.platform())
Linux-5.4.0-135-generic-x86_64-with-Ubuntu-18.04-bionic
>>> import sys; print("Python", sys.version)
Python 3.6.9 (default, Nov 25 2022, 14:10:45) 
[GCC 8.4.0]
>>> import struct; print("Bits", 8 * struct.calcsize("P"))
Bits 64
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.18.3
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.4.1
>>> import gensim; print("gensim", gensim.__version__)
gensim 4.2.0
>>> from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
FAST_VERSION 1

Thanks a lot!

@piskvorky
Owner

piskvorky commented Dec 14, 2022

The way to load (and save) Gensim models is using this pattern: https://radimrehurek.com/gensim/models/word2vec.html#usage-examples

It looks like you're doing something else – could that be the source of the error?

@Wats0ns
Author

Wats0ns commented Dec 15, 2022

@piskvorky
I'm currently saving the models as described in the documentation. Loading them instead with

model = Word2Vec.load("word2vec.model")

didn't change anything.
The model is built with the following parameters:

model = Word2Vec(
    window=6,
    workers=6,
    sg=0,
    hs=1,
    seed=21,
    vector_size=300,
    min_count=4,
    batch_words=1000
)
model.build_vocab(cleaned_content, progress_per=10000)
logging.root.level = logging.ERROR  # Temporarily reduce logging verbosity
model.train(
    cleaned_content,
    total_examples=model.corpus_count,
    epochs=60,
    report_delay=1
)

@piskvorky
Owner

I see. In that case, it should work. I'll try to check later, thanks for reporting.

@Wats0ns
Author

Wats0ns commented Dec 15, 2022

@piskvorky
Thanks a lot. To make this easier to debug, here is a fully working example script that reproduces the bug:

docker run -v $PWD:/app -w /app python:3.8-slim bash -c "pip install gensim==3.8.0; python -c \"from gensim.models import Word2Vec; model = Word2Vec(window=6,workers=6,sg=0,hs=1,seed=21,size=300,min_count=1,batch_words=1000); cleaned_content=['test']; model.build_vocab(cleaned_content, progress_per=10000); model.train(cleaned_content,total_examples=model.corpus_count,epochs=60,report_delay=1); model.save('test.mdl')\""

docker run -v $PWD:/app -w /app python:3.8-slim bash -c "pip install gensim==4.2.0; python -c \"from gensim.models import Word2Vec; model = Word2Vec.load('test.mdl'); model.score(['test'])\""

Please let me know if I can help you further.

@piskvorky
Owner

piskvorky commented Dec 15, 2022

I found this dodgy code in Gensim that's probably the culprit:

https://github.com/RaRe-Technologies/gensim/blob/45d35eee1e7d0c9eb69ae76a99b2ed7cc35a1c0b/gensim/models/word2vec.py#L1989-L1991

This is part of the code that loads (converts) old models. I'll have to check the commit history to see what this is about (CC @gojomo), but could you try removing that block and see if that helps?
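
For readers unfamiliar with this kind of load-time conversion, here is a toy sketch of how an upgrade step for old pickles can leave an object without an attribute that a later method needs. Everything in it (`ToyModel`, `LegacyTrainables`, `upgrade_buggy`, `upgrade_fixed`) is hypothetical and written for illustration only; it is not the content of the linked lines and not gensim's implementation. It only shows one plausible failure mode that produces the same kind of AttributeError.

```python
class LegacyTrainables:
    """Hypothetical sub-object where an old format kept its weights."""

    def __init__(self, syn1):
        self.syn1 = syn1


class ToyModel:
    """Hypothetical model; NOT gensim's Word2Vec."""

    def score(self):
        # Needs `syn1` directly on the model, as the new format expects.
        return sum(self.syn1)


def make_legacy_model():
    """Build a model laid out the way the (hypothetical) old format stored it."""
    model = ToyModel()
    model.trainables = LegacyTrainables([1, 2, 3])
    return model


def upgrade_fixed(model):
    """Move legacy attributes onto the model itself, then drop the container."""
    if hasattr(model, "trainables"):
        for name in ("syn1",):
            if hasattr(model.trainables, name):
                setattr(model, name, getattr(model.trainables, name))
        del model.trainables


def upgrade_buggy(model):
    """Same migration, followed by a leftover cleanup that deletes `syn1` again."""
    upgrade_fixed(model)
    if hasattr(model, "syn1"):
        del model.syn1


if __name__ == "__main__":
    good = make_legacy_model()
    upgrade_fixed(good)
    print(good.score())

    bad = make_legacy_model()
    upgrade_buggy(bad)
    try:
        bad.score()
    except AttributeError as err:
        print("AttributeError:", err)
```

In this sketch, removing the stray cleanup step is all that separates the working upgrade from the broken one, which is the same shape of fix suggested above.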

@piskvorky
Owner

piskvorky commented Dec 15, 2022

I traced the bug down to https://github.com/RaRe-Technologies/gensim/pull/2698/files

Looks like a botched migration; @Wats0ns can you verify that removing the 3 lines above fixes the issue, and then submit a PR?

We're about to do a new release this week so we could still squeeze that in. Thanks.

@piskvorky piskvorky added bug Issue described a bug difficulty easy Easy issue: required small fix good first issue Issue for new contributors (not required gensim understanding + very simple) impact HIGH Show-stopper for affected users reach LOW Affects only niche use-case users labels Dec 15, 2022
@piskvorky piskvorky added this to the Next release milestone Dec 15, 2022
@piskvorky
Owner

@Wats0ns please confirm the fix.

@Wats0ns
Author

Wats0ns commented Dec 16, 2022

@piskvorky I confirm that it works now with your fix. Thanks a lot for your fast responses and fix, very impressed!
