Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dtype of matutils.unitvec. Fix #1722 #1761

Closed
wants to merge 2 commits into from

Conversation

accraze
Copy link
Contributor

@accraze accraze commented Dec 5, 2017

matutils.unitvec currently returns a unitvector of a different dtype from the input vector if the input dtype isn't np.float

we should make the return type consistent with the input type.

fixes #1722

@menshikh-iv
Copy link
Contributor

Hi @accraze, please add tests for this fix (you can use example from #1722 as base for your test), also, check case with sparse too.

@@ -426,7 +426,7 @@ def unitvec(vec, norm='l2'):
if norm == 'l2':
veclen = blas_nrm2(vec)
if veclen > 0.0:
return blas_scal(1.0 / veclen, vec)
return blas_scal(1.0 / veclen, vec).astype(vec.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @accraze , thanks for the fix! A better solution here might be to modify the hardcoded dtype in line 423 above, it simplifies the logic, and also ensures that the dtype is consistent for vectors with all zeros too (a rather trivial and probably uncommon case, of course)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jayantj, I looked into this, however blas_scal returns an array of type float (see line 398). Not sure if there is a better way to handle this...

Copy link
Contributor

@pushpankar pushpankar Dec 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jayantj What else needs to be done? blas_scal is not being used anywhere else. So, should I define blas_scal before line 429 and remove hardcoded float from its the definition?

@menshikh-iv
Copy link
Contributor

Ping @accraze, are you planning to finish this PR?

@accraze
Copy link
Contributor Author

accraze commented Dec 12, 2017

@menshikh-iv yes will get it finished this week

@menshikh-iv
Copy link
Contributor

@accraze tests failed because next problem in blas_scal from https://github.com/RaRe-Technologies/gensim/pull/1761/files#diff-346353d71f16d5fe11b7c3efcfef9b4eR429, this call change a type

matutils.unitvec currently returns a unitvector of a different dtype from the input vector if the input dtype isn't np.float.

we should make the return type consistent with the input type.

fixes piskvorky#1722
@menshikh-iv
Copy link
Contributor

ping @accraze

@menshikh-iv
Copy link
Contributor

Ping @accraze, are you planning to finish this PR?

@accraze
Copy link
Contributor Author

accraze commented Dec 29, 2017

@menshikh-iv I'm still looking for a solution, it seems that the test_normmodel tests are failing with my changes. If we pass an array with dtype int64, then it must be cast to float64 otherwise we will get back an array with all zeros.
Example:

ndarray_matrix = np.array([
    [1, 0, 2],
    [0, 0, 3],
    [4, 5, 6]
])

normalized = matutils.unitvec(ndarray_matrix)

print(normalized)
[[0 0 0]
 [0 0 0]
 [0 0 0]]

@pushpankar I've tried moving blas_scal before line 429 and remove hardcoded float from its the definition, however that did not work either.

@menshikh-iv
Copy link
Contributor

Thanks for description @accraze
@jayantj have you any ideas about it? Probably cast before return should be the best solution for this case.

if norm == 'l1':
veclen = np.sum(np.abs(vec))
if norm == 'l2':
veclen = blas_nrm2(vec)
if veclen > 0.0:
return blas_scal(1.0 / veclen, vec)
return blas_scal(1.0 / veclen, vec).astype(vec.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this cast at the end astype(vec.dtype) is responsible for causing the returned matrix to be an int-matrix with all zeros. It is impossible to maintain consistency in types between the input array and the returned array for an int input array (since a normalized vector can almost never be an int array). A solution could be to check if the input array is int-like, and handle it accordingly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have issued a new pull request addressing this.

@menshikh-iv menshikh-iv changed the title Returning correct dtype from matutils.unitvec Fix dtype of matutils.unitvec. Fix #1722 Feb 1, 2018
@menshikh-iv
Copy link
Contributor

Hey @accraze, how is going? Are you going to finish PR?

similar PR #1866

@menshikh-iv
Copy link
Contributor

Looks abandoned, I close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

matutils.unitvec returns vector with different dtype from input vector
5 participants