Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug in WmdSimilarity #3269

Closed
tvrbanec opened this issue Nov 21, 2021 · 6 comments · Fixed by #3282
Closed

Possible bug in WmdSimilarity #3269

tvrbanec opened this issue Nov 21, 2021 · 6 comments · Fixed by #3282
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix

Comments

@tvrbanec
Copy link

tvrbanec commented Nov 21, 2021

Problem description

I wanted to check if WmdSimilarity is any better (it should not be) from multiprocessing wmdistance, but it did not work at all.
So, I made a little example and the result was exactly the same: AttributeError: 'WmdSimilarity' object has no attribute 'w2v_model'

Steps/code/corpus to reproduce

from gensim.models import Word2Vec
from gensim.similarities import WmdSimilarity

a = "Trump’s decision not to join the Trans-Pacific Partnership (TPP) came as little surprise. During his election campaign he railed against international trade deals, blaming them for job losses and focusing anger in the industrial heartland. Obama had argued that this deal would provide an effective counterweight to China in the region."
b = "Trump's decision to withdraw the US from TPP is also a first step in the administration's efforts to amass a governing coalition to push the new President's agenda, one that includes the blue-collar workers who defected from Democrats and flocked to Trump's candidacy in November."
c = "Although the Trans-Pacific Partnership had not been approved by Congress, Mr. Trump’s decision to withdraw not only doomed former President Barack Obama’s signature trade achievement, but also carried broad geopolitical implications in a fast-growing region. The deal, which was to link a dozen nations from Canada and Chile to Australia and Japan in a complex web of trade rules, was sold as a way to permanently tie the United States to East Asia and create an economic bulwark against a rising China."

def w2v(corpus):
    w2v_model = Word2Vec(corpus, alpha=0.025, vector_size=50,
         window=5, min_count=1, workers=2, epochs=50, sg = 0)
    return w2v_model

corpus = "\n".join([a,b,c])
model = w2v(corpus)
print(model, type(model))
print(WmdSimilarity([a,b,c], model))

RESULTS:

Word2Vec(vocab=47, vector_size=50, alpha=0.025) <class 'gensim.models.word2vec.Word2Vec'>
Traceback (most recent call last):
  File "/home/tedo/diagnose/proba.py", line 18, in <module>
    print(WmdSimilarity([a,b,c], model))
  File "/usr/local/lib/python3.9/dist-packages/gensim/similarities/docsim.py", line 1099, in __str__
    return "%s<%i docs, %i features>" % (self.__class__.__name__, len(self), self.w2v_model.wv.syn0.shape[1])
AttributeError: 'WmdSimilarity' object has no attribute 'w2v_model'

Versions

Please provide the output of:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import struct; print("Bits", 8 * struct.calcsize("P"))
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import gensim; print("gensim", gensim.__version__)
from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)

Linux-5.10.0-9-amd64-x86_64-with-glibc2.31
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110]
Bits 64
NumPy 1.21.2
SciPy 1.7.1
gensim 4.1.1
FAST_VERSION 1

@piskvorky piskvorky added bug Issue described a bug difficulty easy Easy issue: required small fix labels Nov 21, 2021
@piskvorky
Copy link
Owner

Thanks for reporting Tedo!

Seems like an omission during the WmdDistance refactoring in #2698, where self.w2v_model was renamed to self.wv. And __str__ is not covered by CI tests so this wasn't caught automatically.

@mpenkov
Copy link
Collaborator

mpenkov commented Dec 4, 2021

@tvrbanec Thank you for reporting this. Are you interested in making a PR to fix the problem?

@tvrbanec
Copy link
Author

tvrbanec commented Dec 4, 2021

@mpenkov I'd like to, but I do not know how to use Github. It has not come to my agenda yet. :(

@DingQK
Copy link
Contributor

DingQK commented Dec 25, 2021

@mpenkov Hi! I am a new user of gensim and I am eager to contribute to the community. Do you still need people to fix this problem?

My intended solution is to fix the self.str() function by changing self.w2v_model.wv.syn0.shape[1] into self.wv.wv.syn0.shape[1], something like this. I will submit a PR solving this problem.

Thank you so much!

@mpenkov
Copy link
Collaborator

mpenkov commented Dec 25, 2021

Sure, @DingQK , just make sure you have tests that demonstrate the correctness of your solution

@DingQK
Copy link
Contributor

DingQK commented Dec 25, 2021

Sure, @DingQK , just make sure you have tests that demonstrate the correctness of your solution

Hi @mpenkov !

I have submitted my PR and attached my test results within it. Thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants