Fix dtype of `matutils.unitvec`. Fix #1722 #1761

accraze · 2017-12-05T04:49:50Z

matutils.unitvec currently returns a unit vector whose dtype differs from that of the input vector if the input dtype isn't np.float.

We should make the return type consistent with the input type.

Fixes #1722
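
For illustration, the reported symptom can be reproduced without gensim. The sketch below is a simplified stand-in, not the actual matutils.unitvec source: the function name `unitvec_hardcoded_float` and the unconditional float64 cast are assumptions, modelled on the hardcoded dtype that comes up later in this thread.

```python
import numpy as np


def unitvec_hardcoded_float(vec):
    """Hypothetical stand-in: L2-normalise after an unconditional float64 cast."""
    vec = np.asarray(vec, dtype=np.float64)  # assumed hardcoded dtype
    veclen = np.linalg.norm(vec)
    if veclen > 0.0:
        return vec / veclen
    return vec


v32 = np.array([3.0, 4.0], dtype=np.float32)
out = unitvec_hardcoded_float(v32)
print(v32.dtype, "->", out.dtype)  # the output dtype no longer matches the input
```

Under these assumptions, a float32 input comes back as float64, which is the kind of mismatch this PR tries to remove.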

menshikh-iv · 2017-12-05T06:35:30Z

Hi @accraze, please add tests for this fix (you can use the example from #1722 as a base for your test). Also, please check the sparse case too.

jayantj · 2017-12-05T19:58:39Z

gensim/matutils.py

@@ -426,7 +426,7 @@ def unitvec(vec, norm='l2'):
        if norm == 'l2':
            veclen = blas_nrm2(vec)
        if veclen > 0.0:
-            return blas_scal(1.0 / veclen, vec)
+            return blas_scal(1.0 / veclen, vec).astype(vec.dtype)


Hi @accraze, thanks for the fix! A better solution here might be to modify the hardcoded dtype in line 423 above. It simplifies the logic and also ensures that the dtype is consistent for all-zero vectors too (a rather trivial and probably uncommon case, of course).

accraze · 2017-12-12

Hey @jayantj, I looked into this; however, blas_scal returns an array of type float (see line 398). Not sure if there is a better way to handle this...

pushpankar · 2017-12-28

@jayantj What else needs to be done? blas_scal is not used anywhere else. So, should I define blas_scal before line 429 and remove the hardcoded float from its definition?

menshikh-iv · 2017-12-11T10:32:39Z

Ping @accraze, are you planning to finish this PR?

accraze · 2017-12-12T04:17:30Z

@menshikh-iv yes will get it finished this week

menshikh-iv · 2017-12-13T09:48:47Z

@accraze the tests fail because of a problem with blas_scal at https://github.com/RaRe-Technologies/gensim/pull/1761/files#diff-346353d71f16d5fe11b7c3efcfef9b4eR429: this call changes the dtype.

menshikh-iv · 2017-12-18T14:34:55Z

ping @accraze

menshikh-iv · 2017-12-25T13:52:03Z

Ping @accraze, are you planning to finish this PR?

accraze · 2017-12-29T19:23:37Z

@menshikh-iv I'm still looking for a solution; it seems that the test_normmodel tests fail with my changes. If we pass an array with dtype int64, it must be cast to float64, otherwise we get back an array of all zeros.
Example:

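import numpy as np
from gensim import matutils

ndarray_matrix = np.array([
    [1, 0, 2],
    [0, 0, 3],
    [4, 5, 6]
])

normalized = matutils.unitvec(ndarray_matrix)

print(normalized)
# Output with the .astype(vec.dtype) change from this PR applied:
# [[0 0 0]
#  [0 0 0]
#  [0 0 0]]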

@pushpankar I've tried moving blas_scal before line 429 and removing the hardcoded float from its definition; however, that did not work either.
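
The all-zeros result follows from integer truncation: after normalisation every entry has magnitude at most 1, and casting back to an integer dtype truncates toward zero. A minimal NumPy-only sketch of that effect, in which `np.linalg.norm` is used as a stand-in for the BLAS calls in the diff (an assumption, not the gensim code path):

```python
import numpy as np

ndarray_matrix = np.array([
    [1, 0, 2],
    [0, 0, 3],
    [4, 5, 6],
])

# Scale by the L2 norm of all entries (stand-in for the BLAS calls in the diff).
scaled = ndarray_matrix / np.linalg.norm(ndarray_matrix)

# Casting back to the integer input dtype truncates every entry toward zero.
cast_back = scaled.astype(ndarray_matrix.dtype)

print(scaled.dtype)
print(cast_back)
```

Here the division already yields a floating-point array, so the information is lost only in the final cast.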

menshikh-iv · 2018-01-08T06:21:14Z

Thanks for the description, @accraze.
@jayantj do you have any ideas about this? Casting before the return is probably the best solution for this case.

jayantj · 2018-01-15T19:31:34Z

gensim/matutils.py

        if norm == 'l1':
            veclen = np.sum(np.abs(vec))
        if norm == 'l2':
            veclen = blas_nrm2(vec)
        if veclen > 0.0:
-            return blas_scal(1.0 / veclen, vec)
+            return blas_scal(1.0 / veclen, vec).astype(vec.dtype)


It looks like the cast at the end, astype(vec.dtype), is what causes the returned matrix to be an int matrix of all zeros. It is impossible to keep the types of the input array and the returned array consistent for an int input array (since a normalized vector can almost never be an int array). A solution could be to check if the input array is int-like, and handle it accordingly?
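
One way the int-like check could look is sketched below. This is only an illustration of the idea, not the change made in this PR or in any follow-up: the helper name `unitvec_keep_float_dtype` and the choice of float64 for non-float input are assumptions.

```python
import numpy as np


def unitvec_keep_float_dtype(vec):
    """Hypothetical helper: L2-normalise, keeping float dtypes and promoting ints."""
    vec = np.asarray(vec)
    if np.issubdtype(vec.dtype, np.floating):
        out_dtype = vec.dtype      # float32 stays float32, float64 stays float64
    else:
        out_dtype = np.float64     # assumed default for int-like input
    work = vec.astype(np.float64)  # compute in double precision
    veclen = np.linalg.norm(work)
    if veclen > 0.0:
        work = work / veclen
    return work.astype(out_dtype)  # zero vectors also get the chosen dtype


print(unitvec_keep_float_dtype(np.array([3, 4])).dtype)
print(unitvec_keep_float_dtype(np.array([3.0, 4.0], dtype=np.float32)).dtype)
```

With this approach the output dtype matches the input only when the input is already a float type; integer input is promoted instead of truncated.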

o-P-o · 2018-01-16

I have issued a new pull request addressing this.

menshikh-iv · 2018-02-05T08:17:26Z

Hey @accraze, how is it going? Are you going to finish this PR?

Similar PR: #1866

menshikh-iv · 2018-02-14T08:24:57Z

Looks abandoned, so I'm closing this PR.

jayantj reviewed Dec 5, 2017

accraze force-pushed the patch-1 branch from 52dc61b to 90b02cd on December 12, 2017 18:24

returning correct dtype from unitvec

7682b4c

matutils.unitvec currently returns a unitvector of a different dtype from the input vector if the input dtype isn't np.float. we should make the return type consistent with the input type. fixes piskvorky#1722

accraze force-pushed the patch-1 branch from 90b02cd to 7682b4c on December 15, 2017 01:11

fix flake8 error

6769787

accraze force-pushed the patch-1 branch from bac1882 to 6769787 on December 29, 2017 19:04

jayantj reviewed Jan 15, 2018

menshikh-iv changed the title from "Returning correct dtype from matutils.unitvec" to "Fix dtype of matutils.unitvec. Fix #1722" on Feb 1, 2018

menshikh-iv closed this Feb 14, 2018

accraze commented Dec 5, 2017

menshikh-iv commented Dec 5, 2017

jayantj Dec 5, 2017

accraze Dec 12, 2017

pushpankar Dec 28, 2017 (edited)

menshikh-iv commented Dec 11, 2017

accraze commented Dec 12, 2017

menshikh-iv commented Dec 13, 2017

menshikh-iv commented Dec 18, 2017

menshikh-iv commented Dec 25, 2017

accraze commented Dec 29, 2017 (edited)

menshikh-iv commented Jan 8, 2018

jayantj Jan 15, 2018

o-P-o Jan 16, 2018

menshikh-iv commented Feb 5, 2018

menshikh-iv commented Feb 14, 2018
