Support sparse input in interfaces_getitem when num_best is not None #1294

Closed
tmylk opened this issue Apr 27, 2017 · 3 comments
Labels
difficulty easy Easy issue: required small fix

Comments

@tmylk
Contributor

tmylk commented Apr 27, 2017

Sparse input is only supported when num_best=None; otherwise it raises an error.
Raised on the mailing list https://groups.google.com/d/msg/gensim/-PhUHorj9-E/N6apxcqnHQAJ

from scipy.sparse import random
from gensim.similarities import SparseMatrixSimilarity
from gensim.matutils import Sparse2Corpus

X = random(200e3, 15e3, density=2e-4, format="csc")
index = SparseMatrixSimilarity(Sparse2Corpus(X, documents_columns=False), num_features=X.shape[1], num_terms=X.shape[0], maintain_sparsity=True, num_best=100)

for s in index:
    print s

raises

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-04d8dd3277db> in <module>()
----> 1 for s in index:
      2         print s
      3 

/home/lev/miniconda2/lib/python2.7/site-packages/gensim/interfaces.pyc in __iter__(self)
    264                 chunk_end = min(self.index.shape[0], chunk_start + self.chunksize)
    265                 chunk = self.index[chunk_start : chunk_end]
--> 266                 for sim in self[chunk]:
    267                     yield sim
    268         else:

/home/lev/miniconda2/lib/python2.7/site-packages/gensim/interfaces.pyc in __getitem__(self, query)
    226         # most similar for each document in turn
    227         if matutils.ismatrix(result):
--> 228             return [matutils.full2sparse_clipped(v, self.num_best) for v in result]
    229         else:
    230             # otherwise, return top-n of the single input document

/home/lev/miniconda2/lib/python2.7/site-packages/gensim/matutils.pyc in full2sparse_clipped(vec, topn, eps)
    241     if topn <= 0:
    242         return []
--> 243     vec = np.asarray(vec, dtype=float)
    244     nnz = np.nonzero(abs(vec) > eps)[0]
    245     biggest = nnz.take(argsort(abs(vec).take(nnz), topn, reverse=True))

/home/lev/miniconda2/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    529 
    530     """
--> 531     return array(a, dtype, copy=False, order=order)
    532 
    533 

ValueError: setting an array element with a sequence.
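
The traceback points at the likely cause: with maintain_sparsity=True the similarity result stays a scipy sparse matrix, so iterating over it yields 1 x N sparse rows, and the dense-only clipping step fails when it calls np.asarray(vec, dtype=float) on one of them. The sketch below illustrates this on a tiny hand-made matrix. `dense_top_n` is a hypothetical, simplified stand-in for a dense clipping step and is not gensim's code.

```python
import numpy as np
from scipy import sparse

def dense_top_n(vec, topn):
    """Hypothetical stand-in for a dense-only clipping step:
    return up to `topn` (index, value) pairs with the largest |value|."""
    vec = np.asarray(vec, dtype=float)
    nonzero = np.flatnonzero(vec)
    order = np.argsort(-np.abs(vec[nonzero]), kind="stable")[:topn]
    return [(int(i), float(vec[i])) for i in nonzero[order]]

# A small similarity-like result kept in sparse form.
result = sparse.csr_matrix(np.array([[0.0, 0.9, 0.1],
                                     [0.5, 0.0, 0.0]]))

# Each item is a 1 x 3 sparse matrix, not a 1-D array.
rows = [row for row in result]
first_row_is_sparse = sparse.issparse(rows[0])

# Densifying a row first makes the dense helper usable again,
# at the cost of giving up sparsity for that row.
clipped = dense_top_n(rows[0].toarray().ravel(), topn=1)
print(first_row_is_sparse, rows[0].shape, clipped)
```

Densifying every row defeats the purpose of maintain_sparsity on a large index, which is why the discussion below looks for a clipping path that works on sparse data directly.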
tmylk added the "difficulty easy" (Easy issue: required small fix) label on Apr 27, 2017

@souravsingh
Contributor

@tmylk I am interested in working on the issue. We basically have to add a check for num_best, correct?

@tmylk
Contributor Author

tmylk commented May 2, 2017

@souravsingh We need to add a new code path for the case where num_best is not None and the input is sparse.

@manneshiva
Contributor

@tmylk I have fixed this issue and am submitting a PR for the same. Instead of having a separate code path for sparse input and num_best not None, I have added another function, any2sparse_clipped() in matutils and replaced full2sparse_clipped with this function in interfaces.
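
A helper that accepts either kind of input could look roughly like the sketch below. `top_n_any` is a hypothetical name and this is not the code from the PR; it assumes that sparse input arrives as a single 1 x N scipy row and that dense input is a 1-D vector.

```python
import numpy as np
from scipy import sparse

def top_n_any(vec, topn):
    """Return up to `topn` (index, value) pairs with the largest |value|.

    Accepts a dense 1-D vector or a 1 x N scipy sparse row (assumption)."""
    if topn <= 0:
        return []
    if sparse.issparse(vec):
        # Work on the stored entries only; the row is never densified.
        coo = vec.tocoo()
        indices, values = coo.col, coo.data.astype(float)
    else:
        dense = np.asarray(vec, dtype=float).ravel()
        indices = np.flatnonzero(dense)
        values = dense[indices]
    keep = values != 0  # drop explicitly stored zeros
    indices, values = indices[keep], values[keep]
    order = np.argsort(-np.abs(values), kind="stable")[:topn]
    return [(int(indices[i]), float(values[i])) for i in order]

dense_vec = [0.2, 0.0, -0.7, 0.4]
sparse_row = sparse.csr_matrix(np.array([dense_vec]))
from_dense = top_n_any(dense_vec, 2)
from_sparse = top_n_any(sparse_row, 2)
print(from_dense, from_sparse)
```

Both calls return the same pairs, so a caller would not need a separate branch per input type. The trade-off is that the helper is still called once per row.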

menshikh-iv pushed a commit that referenced this issue Jun 22, 2017
…one. Fix #1294 (#1321)

* added any2sparse_clipped() function

* changed full2sparse_clipped to any2sparse_clipped in __getitem__

* added missing whitespace

* return topn from any2sparse_clipped()

* efficient any2sparse_clipped implementation

* added unit test for any2sparse_clipped

* function call corrected

* removed any2sparse_clipped and added scipy2scipy_clipped

* added new code path for maintain_sparsity

* added unit tests for new function and issue

* fixed flake8 errors

* fixed matrix_indptr

* added requested changes

* replaced hasattr with getattr

* call abs() once for entire matrix in scipy2scipy_clipped

* removed matrix.sort_indices and removed indptr while calling argsort
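
The later commits in this list drop the per-vector helper in favour of a matrix-level scipy2scipy_clipped and mention taking abs() once for the entire matrix. A minimal sketch of that idea follows, assuming CSR input. `clip_rows_top_n` is a hypothetical name, and the code illustrates row-wise top-n clipping that keeps the result sparse; it is not the gensim implementation.

```python
import numpy as np
from scipy import sparse

def clip_rows_top_n(matrix, topn):
    """Keep only the `topn` largest-|value| entries in each row.

    Hypothetical sketch; returns a new CSR matrix with the same shape."""
    matrix = sparse.csr_matrix(matrix)
    topn = max(topn, 0)
    abs_data = np.abs(matrix.data)  # absolute values computed once for all rows
    data, indices, indptr = [], [], [0]
    for start, end in zip(matrix.indptr[:-1], matrix.indptr[1:]):
        # Positions inside this row's slice of the topn largest magnitudes.
        order = np.argsort(-abs_data[start:end], kind="stable")[:topn]
        keep = np.sort(order) + start  # restore the row's original entry order
        data.extend(matrix.data[keep])
        indices.extend(matrix.indices[keep])
        indptr.append(len(data))
    return sparse.csr_matrix(
        (np.array(data, dtype=matrix.dtype),
         np.array(indices, dtype=matrix.indices.dtype),
         np.array(indptr, dtype=matrix.indptr.dtype)),
        shape=matrix.shape,
    )

sims = sparse.csr_matrix(np.array([[0.1, 0.0, 0.9, 0.5],
                                   [0.0, 0.3, 0.0, 0.0],
                                   [0.0, 0.0, 0.0, 0.0]]))
clipped_matrix = clip_rows_top_n(sims, topn=2)
print(clipped_matrix.toarray())
```

The Python loop over rows keeps the sketch readable; an implementation tuned for a 200,000-row index would likely want to reduce that per-row overhead.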
saparina pushed a commit to saparina/gensim that referenced this issue Jul 9, 2017
…one. Fix piskvorky#1294 (piskvorky#1321)
