FastText wrapper returns inconsistent dtypes #1637

mcobzarenco · 2017-10-19T14:08:43Z

Description

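gensim.models.wrappers.FastText returns word vectors with inconsistent dtypes: float32 for in-vocabulary words, but float64 for words missing from the vocabulary.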

Steps/Code/Corpus to Reproduce

from gensim.models.wrappers import FastText
embeds = FastText.load_fasttext_format(...)

For an existing word:

embeds['the'].dtype == dtype('float32')

For an "imputed" word (one missing from the vocabulary, whose embedding is computed as the sum of the embeddings of its character n-grams):

embeds['ttttt'].dtype == dtype('float64')
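To make the imputation step concrete, here is a minimal sketch of how character n-grams might be extracted from an out-of-vocabulary word. It assumes fastText's convention of wrapping the word in `<` and `>` boundary markers and using n-gram lengths 3 to 6. `char_ngrams` is a hypothetical helper, not the wrapper's own function, and it keeps duplicate n-grams for simplicity.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word` (hypothetical helper).

    Assumption: fastText-style boundary markers '<' and '>' are added,
    and every n-gram of length min_n..max_n is collected, duplicates included.
    """
    extended = "<" + word + ">"
    ngrams = []
    # Longest n-gram cannot exceed the length of the marked-up word.
    for n in range(min_n, min(len(extended), max_n) + 1):
        for i in range(len(extended) - n + 1):
            ngrams.append(extended[i:i + n])
    return ngrams


print(char_ngrams("ab"))
```

Here `char_ngrams("ab")` returns `['<ab', 'ab>', '<ab>']`; the imputed vector is then built from the embedding rows of whichever of these n-grams the model knows.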

The problem lies in models/wrappers/fasttext.py::FastTextKeyedVectors.word_vec. For a missing word, the zero vector is initialised as a 64-bit float array, and the 32-bit n-gram embeddings are then added to it, so the result stays float64.
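The mechanism can be reproduced with plain NumPy. This is a minimal sketch, not the wrapper's code: `ngram_vectors` is a hypothetical stand-in for the model's float32 n-gram matrix, and `impute_buggy` / `impute_fixed` are illustrative names. It shows that `np.zeros` defaults to float64 and that in-place `+=` keeps the accumulator's dtype; allocating the accumulator with the embeddings' dtype is one way to keep the result float32 (the actual change merged in #1638 may differ in detail).

```python
import numpy as np

# Hypothetical stand-in for the model's n-gram embedding matrix (float32).
ngram_vectors = np.random.RandomState(0).randn(14, 5).astype(np.float32)


def impute_buggy(rows):
    # np.zeros defaults to float64, so the accumulator is 64-bit...
    vec = np.zeros(rows.shape[1])
    for row in rows:
        # ...and in-place addition casts each float32 row into it.
        vec += row
    return vec


def impute_fixed(rows):
    # Allocate the accumulator with the same dtype as the embeddings.
    vec = np.zeros(rows.shape[1], dtype=rows.dtype)
    for row in rows:
        vec += row
    return vec


print(impute_buggy(ngram_vectors).dtype)
print(impute_fixed(ngram_vectors).dtype)
```

The first call prints `float64` and the second `float32`; the summed values agree up to float32 rounding.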

Versions

Linux-4.4.0-97-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609]
NumPy 1.13.3
SciPy 0.19.1
gensim 3.0.1
FAST_VERSION 1


piskvorky · 2017-10-19T14:26:16Z

Nice catch @mcobzarenco ! Thanks.

mcobzarenco mentioned this issue Oct 19, 2017

Ensures FastText returns consistent dtypes #1638

Merged

menshikh-iv added the labels "bug" (Issue described a bug) and "difficulty easy" (Easy issue: required small fix) on Oct 19, 2017

menshikh-iv closed this as completed in #1638 Oct 24, 2017

menshikh-iv pushed a commit that referenced this issue Oct 24, 2017

Fix FastText inconsistent dtype. Fix #1637 (#1638)

7f23a2c

horpto pushed a commit to horpto/gensim that referenced this issue Oct 28, 2017

Fix FastText inconsistent dtype. Fix piskvorky#1637 (piskvorky#1638)

91549e5
