Updating vocab of a FastText model results in change in dtype of model.wv.syn0_vocab #1759

Closed
manneshiva opened this issue Dec 4, 2017 · 0 comments

manneshiva commented Dec 4, 2017

Description

Updating the vocabulary unintentionally changes the dtype of model.wv.syn0_vocab from float32 to float64. The cause is that numpy.random.uniform returns a float64 array; when it is vstacked with the existing float32 array, NumPy promotes the result to float64. This also produces unpredictable segmentation faults in the Cython implementation (see #1742).
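The dtype promotion can be reproduced with NumPy alone. The following is a minimal sketch, not gensim's actual code: the array names, shapes, and the uniform bounds are illustrative assumptions. It shows that stacking a float32 array with the output of a uniform draw yields float64, and that casting the new rows to float32 before stacking keeps the dtype stable.

```python
import numpy as np

# Stand-in for the existing float32 vocabulary matrix (shape is an assumption).
existing = np.zeros((3, 10), dtype=np.float32)

# uniform() returns float64 and has no dtype argument.
rng = np.random.RandomState(0)
new_rows = rng.uniform(-0.1, 0.1, (2, 10))

# Stacking float32 with float64 promotes the result to float64.
stacked = np.vstack([existing, new_rows])

# Casting the new rows first keeps the result float32.
fixed = np.vstack([existing, new_rows.astype(np.float32)])

print(new_rows.dtype, stacked.dtype, fixed.dtype)
```

Casting before the stack is one possible fix; whether it matches the eventual patch in gensim is not confirmed here.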

Steps/Code/Corpus to Reproduce

from gensim.models.fasttext import FastText as FT_gensim
from gensim.test.utils import common_texts as sentences

new_sentences = [
    ['computer', 'artificial', 'intelligence'],
    ['artificial', 'trees'],
    ['human', 'intelligence'],
    ['artificial', 'graph'],
    ['intelligence'],
    ['artificial', 'intelligence', 'system']
]

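model = FT_gensim(size=10, min_count=1)
model.build_vocab(sentences)
print(model.wv.syn0_vocab.dtype)  # float32

# Updating the vocabulary with new sentences changes the dtype
model.build_vocab(new_sentences, update=True)
print(model.wv.syn0_vocab.dtype)  # float64, expected float32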

Expected Results

float32
float32

Actual Results

float32
float64

Versions

Linux-4.10.0-40-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
('NumPy', '1.13.3')
('SciPy', '1.0.0')
('gensim', '3.1.0')
('FAST_VERSION', 1)
