Fix format & links for gensim.similarities.docsim #2030

Merged
merged 2 commits into from
Apr 13, 2018
10 changes: 5 additions & 5 deletions gensim/similarities/docsim.py
@@ -6,7 +6,7 @@

"""Computing similarities across a collection of documents in the Vector Space Model.

-The main class is :class:`~gensim.similarity.docsim.Similarity`, which builds an index for a given set of documents.
+The main class is :class:`~gensim.similarities.docsim.Similarity`, which builds an index for a given set of documents.
Once the index is built, you can perform efficient queries like "Tell me how similar is this query document to each
document in the index?". The result is a vector of numbers as large as the size of the initial set of documents,
that is, one float for each index document. Alternatively, you can also request only the top-N most
@@ -15,13 +15,14 @@

How It Works
------------
-The :class:`~gensim.similarity.docsim.Similarity` class splits the index into several smaller sub-indexes ("shards"),
+The :class:`~gensim.similarities.docsim.Similarity` class splits the index into several smaller sub-indexes ("shards"),
which are disk-based. If your entire index fits in memory (~hundreds of thousands documents for 1GB of RAM),
-you can also use the :class:`~gensim.similarity.docsim.MatrixSimilarity`
-or :class:`~gensim.similarity.docsim.SparseMatrixSimilarity` classes directly.
+you can also use the :class:`~gensim.similarities.docsim.MatrixSimilarity`
+or :class:`~gensim.similarities.docsim.SparseMatrixSimilarity` classes directly.
These are more simple but do not scale as well (they keep the entire index in RAM, no sharding).

Once the index has been initialized, you can query for document similarity simply by:

>>> from gensim.test.utils import common_corpus, common_dictionary, get_tmpfile
>>>
>>> index_tmpfile = get_tmpfile("index")
@@ -171,7 +172,6 @@ def get_document_id(self, pos):
The vector is of the same type as the underlying index (ie., dense for
:class:`~gensim.similarities.docsim.MatrixSimilarity`
and scipy.sparse for :class:`~gensim.similarities.docsim.SparseMatrixSimilarity`.
-TODO: Can dense be scipy.sparse?

"""
assert 0 <= pos < len(self), "requested position out of range"
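
For readers skimming this diff, the idea described in the "How It Works" docstring (an index split into shards, a query that returns one float per indexed document, and an optional top-N mode) can be sketched in plain Python. This is a toy illustration under stated assumptions, not gensim's implementation: `ToyShardedIndex`, `cosine`, and `shard_size` are hypothetical names, documents are `{term_id: weight}` dicts rather than gensim's bag-of-words lists, and the shards are kept in memory instead of on disk.

```python
import math
from heapq import nlargest


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term_id: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class ToyShardedIndex:
    """Hypothetical illustration only; NOT gensim's Similarity class."""

    def __init__(self, docs, shard_size=2):
        # Split the documents into fixed-size shards. The real class is
        # described as disk-based; these shards are plain in-memory lists.
        self.shards = [docs[i:i + shard_size] for i in range(0, len(docs), shard_size)]

    def __len__(self):
        return sum(len(shard) for shard in self.shards)

    def query(self, vec, top_n=None):
        # Visit every shard in order and collect one similarity per document.
        sims = [cosine(vec, doc) for shard in self.shards for doc in shard]
        if top_n is None:
            return sims
        # Top-N mode: (document position, similarity) pairs, best first.
        return nlargest(top_n, enumerate(sims), key=lambda pair: pair[1])


docs = [{0: 1.0, 1: 1.0}, {1: 1.0}, {2: 1.0}, {0: 1.0}, {0: 2.0, 1: 2.0}]
index = ToyShardedIndex(docs, shard_size=2)
sims = index.query({0: 1.0, 1: 1.0})           # one float per indexed document
top = index.query({0: 1.0, 1: 1.0}, top_n=2)   # only the two best matches
```

The full query returns a list as long as the index, which matches the docstring's "one float for each index document"; the top-N call trades that full vector for a short ranked list.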