[MRG] Migrate tutorials & how-tos to 4.0.0 #2968

Merged
merged 6 commits into from
Oct 1, 2020
Changes from 3 commits
2 changes: 1 addition & 1 deletion docs/src/auto_examples/core/run_core_concepts.ipynb
@@ -249,7 +249,7 @@
},
"outputs": [],
"source": [
-"import matplotlib.pyplot as plt\nimport matplotlib.image as mpimg\nimg = mpimg.imread('run_core_concepts.png')\nimgplot = plt.imshow(img)\nplt.axis('off')\nplt.show()"
+"import matplotlib.pyplot as plt\nimport matplotlib.image as mpimg\nimg = mpimg.imread('run_core_concepts.png')\nimgplot = plt.imshow(img)\n_ = plt.axis('off')"
]
}
],
3 changes: 1 addition & 2 deletions docs/src/auto_examples/core/run_core_concepts.py
@@ -327,5 +327,4 @@
import matplotlib.image as mpimg
img = mpimg.imread('run_core_concepts.png')
imgplot = plt.imshow(img)
-plt.axis('off')
-plt.show()
+_ = plt.axis('off')
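
The diff does not state why `plt.show()` was dropped, but the output removed from `run_core_concepts.rst` shows that call raising a `UserWarning` under the non-GUI agg backend. The switch to `_ = plt.axis('off')` may also be there so the block ends in an assignment rather than a bare expression, assuming the gallery builder reports the value of a trailing expression (an assumption, not something this PR confirms). A minimal sketch of that distinction using only the standard-library `ast` module; `ends_with_bare_expression` is a hypothetical helper, not part of sphinx-gallery or gensim:

```python
import ast


def ends_with_bare_expression(source: str) -> bool:
    """Return True if the last top-level statement is a bare expression.

    Hypothetical helper for illustration only. The source is parsed,
    never executed, so names like ``plt`` need not be defined.
    """
    body = ast.parse(source).body
    return bool(body) and isinstance(body[-1], ast.Expr)


# Tail of the example script before and after this change.
old_tail = "plt.axis('off')\nplt.show()"
new_tail = "_ = plt.axis('off')"

print(ends_with_bare_expression(old_tail))  # True: ends in a call expression
print(ends_with_bare_expression(new_tail))  # False: ends in an assignment
```

Binding the result to `_` keeps the call's side effect (hiding the axes) while leaving no trailing value for a tool to echo.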
2 changes: 1 addition & 1 deletion docs/src/auto_examples/core/run_core_concepts.py.md5
@@ -1 +1 @@
-d835332ead51b01a01d6b2483f8bf180
+e562837df1242b45d0ab623f5a5254f0
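
The `.py.md5` file appears to hold an MD5 digest of the example script, presumably so the gallery build can detect that the source changed and needs re-running (this purpose is an assumption; the PR does not say so). A hedged sketch of how such a digest could be computed with the standard library; `md5_hex` is a hypothetical helper, and the byte strings are stand-ins rather than the real file contents, so their digests will not match the values in the diff:

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the 32-character hexadecimal MD5 digest of ``data``."""
    return hashlib.md5(data).hexdigest()


# Stand-in snippets, NOT the actual contents of run_core_concepts.py.
old_source = b"plt.axis('off')\nplt.show()\n"
new_source = b"_ = plt.axis('off')\n"

# Any change to the bytes yields a different digest, which is why the
# one-line .md5 file changes whenever the script is edited.
print(md5_hex(old_source) != md5_hex(new_source))  # True
```

To hash a real script, read it in binary mode and pass the bytes to `md5_hex`.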
44 changes: 25 additions & 19 deletions docs/src/auto_examples/core/run_core_concepts.rst
@@ -1,10 +1,12 @@
-.. note::
-    :class: sphx-glr-download-link-note
+.. only:: html
+
+    .. note::
+        :class: sphx-glr-download-link-note

-    Click :ref:`here <sphx_glr_download_auto_examples_core_run_core_concepts.py>` to download the full example code
-.. rst-class:: sphx-glr-example-title
+        Click :ref:`here <sphx_glr_download_auto_examples_core_run_core_concepts.py>` to download the full example code
+    .. rst-class:: sphx-glr-example-title

-.. _sphx_glr_auto_examples_core_run_core_concepts.py:
+    .. _sphx_glr_auto_examples_core_run_core_concepts.py:


Core Concepts
@@ -24,6 +26,7 @@ This tutorial introduces Documents, Corpora, Vectors and Models: the basic conce




The core concepts of ``gensim`` are:

1. :ref:`core_concepts_document`: some text.
@@ -54,6 +57,7 @@ paragraph (i.e., journal article abstract), a news article, or a book.




.. _core_concepts_corpus:

Corpus
@@ -102,6 +106,7 @@ It consists of 9 documents, where each document is a string consisting of a sing




.. Important::
The above example loads the entire corpus into memory.
In practice, corpora may be very large, so loading them into memory may be impossible.
@@ -169,6 +174,7 @@ a delimiter).




Before proceeding, we want to associate each word in the corpus with a unique
integer ID. We can do this using the :py:class:`gensim.corpora.Dictionary`
class. This dictionary defines the vocabulary of all words that our
@@ -197,6 +203,7 @@ processing knows about.




Because our corpus is small, there are only 12 different tokens in this
:py:class:`gensim.corpora.Dictionary`. For larger corpora, dictionaries that
contain hundreds of thousands of tokens are quite common.
@@ -287,6 +294,7 @@ into these 12-dimensional vectors. We can see what these IDs correspond to:




For example, suppose we wanted to vectorize the phrase "Human computer
interaction" (note that this phrase was not in our original corpus). We can
create the bag-of-word representation for a document using the ``doc2bow``
@@ -316,6 +324,7 @@ counts:




The first entry in each tuple corresponds to the ID of the token in the
dictionary, the second corresponds to the count of this token.
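
The (token ID, count) pairs described here can be sketched in plain Python. This is an illustration of the bag-of-words idea only, not gensim's implementation: `build_token2id` and `to_bow` are hypothetical helpers, the two-document corpus is made up, and the IDs assigned here will generally differ from those produced by `gensim.corpora.Dictionary`.

```python
from collections import Counter


def build_token2id(tokenized_docs):
    """Assign each distinct token an integer ID in order of first appearance."""
    token2id = {}
    for doc in tokenized_docs:
        for token in doc:
            token2id.setdefault(token, len(token2id))
    return token2id


def to_bow(tokens, token2id):
    """Return sorted (token_id, count) pairs; unknown tokens are skipped."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Made-up toy corpus, already lowercased and tokenized.
docs = [["human", "computer", "interface"], ["computer", "system", "system"]]
token2id = build_token2id(docs)

# "interaction" is not in the vocabulary, so it is left out of the vector.
print(to_bow("human computer interaction".split(), token2id))  # [(0, 1), (1, 1)]
```

Tokens with a zero count are simply absent from the output, which keeps the representation sparse.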

@@ -357,6 +366,7 @@ We can convert our entire original corpus to a list of vectors:




Note that while this list lives entirely in memory, in most applications you
will want a more scalable solution. Luckily, ``gensim`` allows you to use any
iterator that returns a single document vector at a time. See the
@@ -427,6 +437,7 @@ our corpus and transforming the string "system minors":




The ``tfidf`` model again returns a list of tuples, where the first entry is
the token ID and the second entry is the tf-idf weighting. Note that the ID
corresponding to "system" (which occurred 4 times in the original corpus) has
@@ -457,6 +468,7 @@ preparation for similarity queries:




and to query the similarity of our query document ``query_document`` against every document in the corpus:


@@ -481,6 +493,7 @@ and to query the similarity of our query document ``query_document`` against eve




How to read this output?
Document 3 has a similarity score of 0.718 (72%), document 2 has a similarity score of 42%, and so on.
We can make this slightly more readable by sorting:
@@ -514,6 +527,7 @@ We can make this slightly more readable by sorting:




Summary
-------

@@ -543,32 +557,24 @@ There's still much more to learn about :ref:`sphx_glr_auto_examples_core_run_cor
import matplotlib.image as mpimg
img = mpimg.imread('run_core_concepts.png')
imgplot = plt.imshow(img)
-plt.axis('off')
-plt.show()
+_ = plt.axis('off')



.. image:: /auto_examples/core/images/sphx_glr_run_core_concepts_001.png
:alt: run core concepts
:class: sphx-glr-single-img


-.. rst-class:: sphx-glr-script-out
-
-Out:
-
-.. code-block:: none
-
-/Volumes/work/workspace/gensim_misha/docs/src/gallery/core/run_core_concepts.py:331: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
-plt.show()




.. rst-class:: sphx-glr-timing

-**Total running time of the script:** ( 0 minutes 1.265 seconds)
+**Total running time of the script:** ( 0 minutes 1.675 seconds)

-**Estimated memory usage:** 36 MB
+**Estimated memory usage:** 37 MB


.. _sphx_glr_download_auto_examples_core_run_core_concepts.py:
@@ -581,13 +587,13 @@ There's still much more to learn about :ref:`sphx_glr_auto_examples_core_run_cor



-.. container:: sphx-glr-download
+.. container:: sphx-glr-download sphx-glr-download-python

:download:`Download Python source code: run_core_concepts.py <run_core_concepts.py>`



-.. container:: sphx-glr-download
+.. container:: sphx-glr-download sphx-glr-download-jupyter

:download:`Download Jupyter notebook: run_core_concepts.ipynb <run_core_concepts.ipynb>`

13 changes: 3 additions & 10 deletions docs/src/auto_examples/core/run_corpora_and_vector_spaces.ipynb
@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"\nCorpora and Vector Spaces\n=========================\n\nDemonstrates transforming text into a vector space representation.\n\nAlso introduces corpus streaming and persistence to disk in various formats.\n"
+"\nCorpora and Vector Spaces\n=========================\n\nDemonstrates transforming text into a vector space representation.\n\nAlso introduces corpus streaming and persistence to disk in various formats.\n\n"
]
},
{
@@ -396,13 +396,6 @@
"What Next\n---------\n\nRead about `sphx_glr_auto_examples_core_run_topics_and_transformations.py`.\n\nReferences\n----------\n\nFor a complete reference (Want to prune the dictionary to a smaller size?\nOptimize converting between corpora and NumPy/SciPy arrays?), see the `apiref`.\n\n.. [1] This is the same corpus as used in\n `Deerwester et al. (1990): Indexing by Latent Semantic Analysis <http://www.cs.bham.ac.uk/~pxt/IDA/lsa_ind.pdf>`_, Table 2.\n\n"
]
},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"Here we show a pretty fastText logo so that our gallery picks it up as a thumbnail.\n\n\n"
-]
-},
{
"cell_type": "code",
"execution_count": null,
@@ -411,7 +404,7 @@
},
"outputs": [],
"source": [
-"import matplotlib.pyplot as plt\nimport matplotlib.image as mpimg\nimg = mpimg.imread('run_corpora_and_vector_spaces.png')\nimgplot = plt.imshow(img)\nplt.axis('off')\nplt.show()"
+"import matplotlib.pyplot as plt\nimport matplotlib.image as mpimg\nimg = mpimg.imread('run_corpora_and_vector_spaces.png')\nimgplot = plt.imshow(img)\n_ = plt.axis('off')"
]
}
],
@@ -431,7 +424,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.7.1"
+"version": "3.6.5"
}
},
"nbformat": 4,
6 changes: 1 addition & 5 deletions docs/src/auto_examples/core/run_corpora_and_vector_spaces.py
@@ -303,12 +303,8 @@ def __iter__(self):
# .. [1] This is the same corpus as used in
# `Deerwester et al. (1990): Indexing by Latent Semantic Analysis <http://www.cs.bham.ac.uk/~pxt/IDA/lsa_ind.pdf>`_, Table 2.

-###############################################################################
-# Here we show a pretty fastText logo so that our gallery picks it up as a thumbnail.
-#
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
img = mpimg.imread('run_corpora_and_vector_spaces.png')
imgplot = plt.imshow(img)
-plt.axis('off')
-plt.show()
+_ = plt.axis('off')
@@ -1 +1 @@
-62ddd3c9d328a2b81ecdbb4f0fb203b2
+e017de81683bfd2f6005a3186bfc1eb3