Fix duplication and wrong markup in docs (#1633)
* Fixed build of docs:

- duplication of the citations from word2vec and doc2vec,
- wrong markup of lists in the scripts,
- some typos.

* Add missing 'tensor' word
horpto authored and menshikh-iv committed Oct 18, 2017
1 parent 2690289 commit 1a1fc44
Showing 5 changed files with 20 additions and 16 deletions.
6 changes: 3 additions & 3 deletions docs/src/scripts/word2vec2tensor.rst
@@ -1,8 +1,8 @@
-:mod:`scripts.word2vec2tensor` --
-==================================
+:mod:`scripts.word2vec2tensor` -- Convert the word2vec format to Tensorflow 2D tensor
+=====================================================================================

 .. automodule:: gensim.scripts.word2vec2tensor
-:synopsis:
+:synopsis: Convert the word2vec format to Tensorflow 2D tensor
 :members:
 :inherited-members:
 :undoc-members:
4 changes: 2 additions & 2 deletions gensim/models/doc2vec.py
@@ -7,7 +7,7 @@

 """
 Deep learning via the distributed memory and distributed bag of words models from
-[1]_, using either hierarchical softmax or negative sampling [2]_ [3]_. See [tutorial]_
+[1]_, using either hierarchical softmax or negative sampling [2]_ [3]_. See [#tutorial]_
 **Make sure you have a C compiler before installing gensim, to use optimized (compiled)
 doc2vec training** (70x speedup [blog]_).
@@ -35,7 +35,7 @@
 In Proceedings of NIPS, 2013.
 .. [blog] Optimizing word2vec in gensim, http://radimrehurek.com/2013/09/word2vec-in-python-part-two-optimizing/
-.. [tutorial] Doc2vec in gensim tutorial, https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb
+.. [#tutorial] Doc2vec in gensim tutorial, https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb
6 changes: 3 additions & 3 deletions gensim/models/word2vec.py
@@ -1075,10 +1075,10 @@ def score(self, sentences, total_sentences=int(1e6), chunksize=100, queue_factor
 Note that you should specify total_sentences; we'll run into problems if you ask to
 score more than this number of sentences but it is inefficient to set the value too high.
-See the article by [taddy]_ and the gensim demo at [deepir]_ for examples of how to use such scores in document classification.
+See the article by [#taddy]_ and the gensim demo at [#deepir]_ for examples of how to use such scores in document classification.
-.. [taddy] Taddy, Matt. Document Classification by Inversion of Distributed Language Representations, in Proceedings of the 2015 Conference of the Association of Computational Linguistics.
-.. [deepir] https://github.com/piskvorky/gensim/blob/develop/docs/notebooks/deepir.ipynb
+.. [#taddy] Taddy, Matt. Document Classification by Inversion of Distributed Language Representations, in Proceedings of the 2015 Conference of the Association of Computational Linguistics.
+.. [#deepir] https://github.com/piskvorky/gensim/blob/develop/docs/notebooks/deepir.ipynb
 """
 if FAST_VERSION < 0:
6 changes: 4 additions & 2 deletions gensim/scripts/glove2word2vec.py
@@ -8,9 +8,11 @@
 """
 USAGE:
 $ python -m gensim.scripts.glove2word2vec --input <GloVe vector file> --output <Word2vec vector file>
 Where:
-<GloVe vector file>: Input GloVe .txt file
-<Word2vec vector file>: Desired name of output Word2vec .txt file
+* <GloVe vector file>: Input GloVe .txt file.
+* <Word2vec vector file>: Desired name of output Word2vec .txt file.
 This script is used to convert GloVe vectors in text format into the word2vec text format.
 The only difference between the two formats is an extra header line in word2vec,
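
The docstring in the hunk above says the two text formats differ only by an extra header line in word2vec. As a rough illustration only (this is not gensim's implementation), the sketch below assumes that header is the vector count and the dimensionality separated by a space. The helper name `glove_text_to_word2vec_text` is hypothetical and works on in-memory strings rather than files.

```python
def glove_text_to_word2vec_text(glove_text):
    """Prepend a '<count> <dims>' header line to GloVe-style text.

    Hypothetical helper for illustration; assumes every non-empty line
    has the form '<word> <v1> <v2> ... <vN>'.
    """
    lines = [line for line in glove_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty GloVe input")
    num_vectors = len(lines)
    # The first token is the word; the remaining tokens are vector components.
    dims = len(lines[0].split()) - 1
    header = "{} {}".format(num_vectors, dims)
    return "\n".join([header] + lines) + "\n"


converted = glove_text_to_word2vec_text("cat 0.1 0.2\ndog 0.3 0.4\n")
print(converted)
```

For real files, the `gensim.scripts.glove2word2vec` command shown in the USAGE line is the intended route.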
14 changes: 8 additions & 6 deletions gensim/scripts/word2vec2tensor.py
@@ -9,20 +9,22 @@
 USAGE: $ python -m gensim.scripts.word2vec2tensor --input <Word2Vec model file> --output <TSV tensor filename prefix> [--binary] <Word2Vec binary flag>
 Where:
-<Word2Vec model file>: Input Word2Vec model.
-<TSV tensor filename prefix>: 2D tensor TSV output file name prefix.
-<Word2Vec binary flag>: Set True if Word2Vec model is binary. Defaults to False.
+* <Word2Vec model file>: Input Word2Vec model.
+* <TSV tensor filename prefix>: 2D tensor TSV output file name prefix.
+* <Word2Vec binary flag>: Set True if Word2Vec model is binary. Defaults to False.
 Output:
 The script will create two TSV files. A 2d tensor format file, and a Word Embedding metadata file. Both files will
-us the --output file name as prefix
+use the --output file name as prefix.
 This script is used to convert the word2vec format to Tensorflow 2D tensor and metadata formats for Embedding Visualization
 To use the generated TSV 2D tensor and metadata file in the Projector Visualizer, please
 1) Open http://projector.tensorflow.org/.
 2) Choose "Load Data" from the left menu.
-3) Select "Choose file" in "Load a TSV file of vectors." and choose you local "_tensor.tsv" file
-4) Select "Choose file" in "Load a TSV file of metadata." and choose you local "_metadata.tsv" file
+3) Select "Choose file" in "Load a TSV file of vectors." and choose you local "_tensor.tsv" file.
+4) Select "Choose file" in "Load a TSV file of metadata." and choose you local "_metadata.tsv" file.
 For more information about TensorBoard TSV format please visit:
 https://www.tensorflow.org/versions/master/how_tos/embedding_viz/
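
The docstring above describes two output files sharing the `--output` prefix: a 2D tensor TSV and a metadata TSV. A minimal sketch of that layout follows, assuming one tab-separated vector per line in `<prefix>_tensor.tsv` and the matching word on the same line number in `<prefix>_metadata.tsv`. The helper `write_projector_files` and the toy vectors are hypothetical; the script's real code may differ.

```python
import os
import tempfile


def write_projector_files(vectors, prefix):
    """Write '<prefix>_tensor.tsv' and '<prefix>_metadata.tsv'.

    `vectors` maps each word to a sequence of floats (an assumed
    in-memory stand-in for a loaded Word2Vec model).
    """
    tensor_path = prefix + "_tensor.tsv"
    metadata_path = prefix + "_metadata.tsv"
    with open(tensor_path, "w", encoding="utf-8") as tensor_file, \
            open(metadata_path, "w", encoding="utf-8") as metadata_file:
        for word, vector in vectors.items():
            # Line i of the metadata file labels line i of the tensor file.
            metadata_file.write(word + "\n")
            tensor_file.write("\t".join(str(x) for x in vector) + "\n")
    return tensor_path, metadata_path


# Toy data written to a temporary directory.
demo_dir = tempfile.mkdtemp()
tensor_path, metadata_path = write_projector_files(
    {"cat": [0.1, 0.2], "dog": [0.3, 0.4]},
    os.path.join(demo_dir, "demo"),
)
print(tensor_path, metadata_path)
```

Files in this shape are what steps 3 and 4 of the Projector instructions expect to be uploaded.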