Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix return dtype for matutils.unitvec according to input dtype. Fix #1722 #1992

Merged
merged 30 commits into from
Apr 16, 2018
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
82b8d17
matutils.unitvec bug
o-P-o Jan 31, 2018
834b042
matutils.unitvec fix tests
o-P-o Jan 31, 2018
cf463b2
unitvec equal input and return types
o-P-o Mar 21, 2018
5656835
Update and rename test_unitvec to test_unitvec.py
o-P-o Mar 21, 2018
406ed66
Update matutils.py
o-P-o Mar 21, 2018
e71afcd
Update matutils.py
o-P-o Mar 22, 2018
fe36408
Update test_unitvec.py
o-P-o Mar 22, 2018
50d011c
Update and rename gensim/test_unitvec.py to gensim/test/test_matutils.py
o-P-o Mar 24, 2018
f1a40ac
Merge branch 'develop' into develop
o-P-o Mar 26, 2018
ead451f
Update matutils.py
o-P-o Mar 26, 2018
ae03291
Update test_matutils.py
o-P-o Mar 26, 2018
cab90a8
Update test_matutils.py
o-P-o Mar 26, 2018
141833d
Update following review
o-P-o Mar 27, 2018
218fe42
Update: attempt to solve Travis errors
o-P-o Mar 27, 2018
5fd1004
Update test_matutils.py
o-P-o Mar 27, 2018
80628c0
Update matutils.py
o-P-o Mar 28, 2018
768226b
Update matutils.py
o-P-o Mar 28, 2018
438f763
Update test_matutils.py
o-P-o Mar 28, 2018
f73076a
Addressing travis errors
o-P-o Mar 28, 2018
cd50529
Remove unnecessary dtype assignment
o-P-o Mar 28, 2018
11b0dde
return_norm statements for array instance case
o-P-o Mar 28, 2018
2e86529
Update test_matutils.py
o-P-o Mar 29, 2018
30d6284
Reduce line repetition
o-P-o Mar 29, 2018
d9cfb0d
Reduce repeated lines
o-P-o Mar 29, 2018
2e8eca1
Update test_matutils.py
o-P-o Mar 29, 2018
6beec75
Remove some redundant code from unitvec
o-P-o Apr 12, 2018
365a722
UnitvecTestCase update
o-P-o Apr 12, 2018
9327f68
Small typo fix
o-P-o Apr 12, 2018
a42cf38
Trailing white-space fix for Travis
o-P-o Apr 12, 2018
1078bdd
Improve code quality and remove no-op
o-P-o Apr 13, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 13 additions & 7 deletions gensim/matutils.py
Original file line number Diff line number Diff line change
Expand Up @@ -667,45 +667,51 @@ def ret_log_normalize_vec(vec, axis=1):

def unitvec(vec, norm='l2'):
"""Scale a vector to unit length.

Parameters
----------
vec : {numpy.ndarray, scipy.sparse, list of (int, float)}
Input vector in any format
norm : {'l1', 'l2'}, optional
Normalization that will be used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please return empty lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Returns
-------
{numpy.ndarray, scipy.sparse, list of (int, float)}
Normalized vector in same format as `vec`.

Notes
-----
Zero-vector will be unchanged.

"""
if norm not in ('l1', 'l2'):
raise ValueError("'%s' is not a supported norm. Currently supported norms are 'l1' and 'l2'." % norm)

if scipy.sparse.issparse(vec):
vec = vec.tocsr()
if norm == 'l1':
veclen = np.sum(np.abs(vec.data))
if norm == 'l2':
veclen = np.sqrt(np.sum(vec.data ** 2))
if veclen > 0.0:
return vec / veclen
if np.issubdtype(vec.dtype, np.int) == True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to == True

vec = vec.astype(np.float)
return vec / veclen
else:
vec /= veclen
return vec.astype(vec.dtype)
else:
return vec

if isinstance(vec, np.ndarray):
vec = np.asarray(vec, dtype=float)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this line is needed especially for blas_* calls, please return it

vec = np.asarray(vec, dtype=vec.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this have no sense (because later you'll cast it again)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - this pretty much seems like a no-op, effectively. Does vec = np.asarray(vec, dtype=vec.dtype) have any effect on vec at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I forgot about this - I will remove it on the next commit.

if norm == 'l1':
veclen = np.sum(np.abs(vec))
if norm == 'l2':
veclen = blas_nrm2(vec)
if veclen > 0.0:
return blas_scal(1.0 / veclen, vec)
if np.issubdtype(vec.dtype, np.int) == True:
vec = vec.astype(np.float)
return blas_scal(1.0 / veclen, vec).astype(vec.dtype)
else:
return blas_scal(1.0 / veclen, vec).astype(vec.dtype)
else:
return vec

Expand Down
27 changes: 27 additions & 0 deletions gensim/test_unitvec.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
import numpy as np
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created pull request there

from scipy import sparse
import unittest
import matutils

class UnitvecTestCase(unittest.TestCase):

def manual_unitvec(self, vec):
self.vec = vec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use 4 spaces for indentation

if sparse.issparse(self.vec):
vec_sum_of_squares = self.vec.multiply(self.vec)
unit = 1. / np.sqrt(vec_sum_of_squares.sum())
return self.vec.multiply(unit)
elif not sparse.issparse(self.vec):
sum_vec_squared = np.sum(self.vec ** 2)
self.vec /= np.sqrt(sum_vec_squared)
return self.vec

def test_unitvec(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's about different vectors (sparse) + different types (more floats + int too)?

input_vector = np.random.uniform(size=(5,)).astype(np.float32)
unit_vector = matutils.unitvec(input_vector)
self.assertEqual(input_vector.dtype, unit_vector.dtype)
self.assertTrue(np.allclose(unit_vector, self.manual_unitvec(input_vector)))

if __name__ == '__main__':

unittest.main()