Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

matutils.unitvec returns vector with different dtype from input vector #1722

Closed
jayantj opened this issue Nov 17, 2017 · 1 comment
Closed
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix good first issue Issue for new contributors (not required gensim understanding + very simple)

Comments

@jayantj
Copy link
Contributor

jayantj commented Nov 17, 2017

Description

matutils.unitvec can return a unitvector of a different dtype from the input vector if the input dtype isn't np.float.

Steps/Code/Corpus to Reproduce

from gensim import matutils
import numpy as np

input_vector = np.random.uniform(size=(100,)).astype(np.float32)
print(input_vector.dtype)
>>> float32

unit_vector = matutils.unitvec(input_vector) 
print(unit_vector.dtype)
>>> float64

This seems like mostly an internally used method, so it's probably not super-important but it does lead to some surprises while developing.

@jayantj jayantj changed the title matutils.unitvec return vector with different dtype from input vector matutils.unitvec returns vector with different dtype from input vector Nov 17, 2017
@menshikh-iv menshikh-iv added bug Issue described a bug difficulty easy Easy issue: required small fix good first issue Issue for new contributors (not required gensim understanding + very simple) labels Nov 20, 2017
@piskvorky
Copy link
Owner

Not a part of the method's contract, but I'd consider it a surprising behaviour too. Let's make the return type consistent with the input type.

accraze added a commit to accraze/gensim that referenced this issue Dec 5, 2017
matutils.unitvec currently returns a unitvector of a different dtype from the input vector if the input dtype isn't np.float.

we should make the return type consistent with the input type.

fixes piskvorky#1722
accraze added a commit to accraze/gensim that referenced this issue Dec 12, 2017
matutils.unitvec currently returns a unitvector of a different dtype from the input vector if the input dtype isn't np.float.

we should make the return type consistent with the input type.

fixes piskvorky#1722
accraze added a commit to accraze/gensim that referenced this issue Dec 15, 2017
matutils.unitvec currently returns a unitvector of a different dtype from the input vector if the input dtype isn't np.float.

we should make the return type consistent with the input type.

fixes piskvorky#1722
darindf pushed a commit to darindf/gensim that referenced this issue Apr 23, 2018
…iskvorky#1722 (piskvorky#1992)

* matutils.unitvec bug

As requested, I have edited the fix to ignore dtype size. I use np.issubtype to check input type and handle appropriately before return to ensure non-integer output.

* matutils.unitvec fix tests

Tests to ensure float output for both float and integer inputs.

* unitvec equal input and return types

* Update and rename test_unitvec to test_unitvec.py

* Update matutils.py

* Update matutils.py

* Update test_unitvec.py

* Update and rename gensim/test_unitvec.py to gensim/test/test_matutils.py

* Update matutils.py

* Update test_matutils.py

* Update test_matutils.py

* Update following review

Removed leading spaces, which is the source of the PEP8/travis errors. Sorry, only just learnt from you what these actually are :)
Also updated the code to include 'if return_norm' statement from the sparse array case. (I can't remember why I actually removed this in the first place.)

* Update: attempt to solve Travis errors

* Update test_matutils.py

* Update matutils.py

* Update matutils.py

* Update test_matutils.py

* Addressing travis errors

* Remove unnecessary dtype assignment

* return_norm statements for array instance case

* Update test_matutils.py

* Reduce line repetition

* Reduce repeated lines

* Update test_matutils.py

* Remove some redundant code from unitvec

This is what I have done based on Jayanti's suggestion of redundant code. Let me know if I have misunderstood.

* UnitvecTestCase update

Simplified tha manual_unitvec method and created a separate test for each `arrtype, dtype` pair, as suggested.

* Small typo fix

* Trailing white-space fix for Travis

* Improve code quality and remove no-op
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix good first issue Issue for new contributors (not required gensim understanding + very simple)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants