Updating vocab of a `FastText` model results in change in `dtype` of `model.wv.syn0_vocab` #1759

manneshiva · 2017-12-04T20:26:00Z

Description

Updating the vocabulary unintentionally changes the dtype of model.wv.syn0_vocab from float32 to float64. The cause is that numpy.random.uniform returns a float64 array; when that array is vstacked with the existing float32 array, the result is upcast to float64. This also produces unpredictable segmentation faults in the Cython implementation (see #1742).
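The dtype promotion described above can be reproduced with NumPy alone. This is a minimal sketch, not the gensim code itself: the array names, shapes, and value range are made up for illustration, and casting with astype is only one possible way to keep float32, not necessarily what the fix in #1760 does.

```python
import numpy as np

# Hypothetical stand-in for the existing float32 weight matrix.
existing = np.zeros((3, 10), dtype=np.float32)

# numpy.random.uniform returns float64 unless the result is cast.
new_rows = np.random.uniform(-0.1, 0.1, (2, 10))

# Stacking float32 with float64 promotes the result to float64.
stacked = np.vstack([existing, new_rows])
print(stacked.dtype)

# Casting the new rows before stacking keeps the result in float32.
stacked32 = np.vstack([existing, new_rows.astype(np.float32)])
print(stacked32.dtype)
```

The first print shows float64 and the second shows float32, matching the "Actual Results" and "Expected Results" of this report respectively.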

Steps/Code/Corpus to Reproduce

from gensim.models.fasttext import FastText as FT_gensim
from gensim.test.utils import common_texts as sentences

new_sentences = [
    ['computer', 'artificial', 'intelligence'],
    ['artificial', 'trees'],
    ['human', 'intelligence'],
    ['artificial', 'graph'],
    ['intelligence'],
    ['artificial', 'intelligence', 'system']
]

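model = FT_gensim(size=10, min_count=1)
model.build_vocab(sentences)
print(model.wv.syn0_vocab.dtype)  # dtype after the initial vocab build

# Update the vocabulary with sentences that contain new words
model.build_vocab(new_sentences, update=True)
print(model.wv.syn0_vocab.dtype)  # dtype after the vocab update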

Expected Results

float32
float32

Actual Results

float32
float64

Versions

Linux-4.10.0-40-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
('NumPy', '1.13.3')
('SciPy', '1.0.0')
('gensim', '3.1.0')
('FAST_VERSION', 1)

manneshiva mentioned this issue Dec 4, 2017

Fixes change in dtype of model.wv.syn0_vocab on updating vocab of a FastText model #1760

Merged

menshikh-iv closed this as completed in ea1f3cf Dec 5, 2017
