Merge branch 'release-1.0.0'

piskvorky · Feb 24, 2017 · 78da89a · 78da89a
2 parents df13670 + b1d38cf
commit 78da89a
Show file tree

Hide file tree

Showing 29 changed files with 2,221 additions and 7,789 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,55 @@ Changes
 
 Unreleased:
 
+
+========
+
+1.0.0, 2017-02-24
+
+New features:
+* Add Author-topic modeling (@olavurmortensen,[#893](https://github.com/RaRe-Technologies/gensim/pull/893))
+* Add FastText word embedding wrapper (@Jayantj,[#847](https://github.com/RaRe-Technologies/gensim/pull/847))
+* Add WordRank word embedding  wrapper (@parulsethi,[#1066](https://github.com/RaRe-Technologies/gensim/pull/1066), [#1125](https://github.com/RaRe-Technologies/gensim/pull/1125))
+* Add sklearn wrapper for LDAModel (@AadityaJ,[#932](https://github.com/RaRe-Technologies/gensim/pull/932))
+
+Deprecated features:
+
+* Move `load_word2vec_format` and `save_word2vec_format` out of Word2Vec class to KeyedVectors  (@tmylk,[#1107](https://github.com/RaRe-Technologies/gensim/pull/1107))
+* Move properties `syn0norm`, `syn0`, `vocab`, `index2word` from Word2Vec class to KeyedVectors (@tmylk,[#1147](https://github.com/RaRe-Technologies/gensim/pull/1147))
+* Remove support for Python 2.6, 3.3 and 3.4 (@tmylk,[#1145](https://github.com/RaRe-Technologies/gensim/pull/1145))
+
+
+Improvements:
+
+* Python 3.6 support (@tmylk [#1077](https://github.com/RaRe-Technologies/gensim/pull/1077))  
+* Phrases and Phraser allow a generator corpus (ELind77 [#1099](https://github.com/RaRe-Technologies/gensim/pull/1099))
+* Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze,[#1053](https://github.com/RaRe-Technologies/gensim/pull/1053))
+* Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel,[#1103](https://github.com/RaRe-Technologies/gensim/pull/1103)
+* Fix broken link to paper in readme (@bhargavvader,[#1101](https://github.com/RaRe-Technologies/gensim/pull/1101)) 
+* Lazy formatting in evaluate_word_pairs (@akutuzov,[#1084](https://github.com/RaRe-Technologies/gensim/pull/1084)) 
+* Deacc option to keywords pre-processing (@bhargavvader,[#1076](https://github.com/RaRe-Technologies/gensim/pull/1076))
+* Generate Deprecated exception when using Word2Vec.load_word2vec_format (@tmylk, [#1165](https://github.com/RaRe-Technologies/gensim/pull/1165)) 
+* Fix hdpmodel constructor docstring for print_topics (#1152) (@toliwa, [#1152](https://github.com/RaRe-Technologies/gensim/pull/1152)) 
+* Default to per_word_topics=False in LDA get_item for performance (@menshikh-iv,   [#1154](https://github.com/RaRe-Technologies/gensim/pull/1154)) 
+* Fix bound computation in Author Topic models. (@olavurmortensen,   [#1156](https://github.com/RaRe-Technologies/gensim/pull/1156))
+* Write UTF-8 byte strings in tensorboard conversion (@tmylk,[#1144](https://github.com/RaRe-Technologies/gensim/pull/1144))
+* Make top_topics and sparse2full compatible with numpy 1.12 strictly int idexing (@tmylk,[#1146](https://github.com/RaRe-Technologies/gensim/pull/1146))
+
+Tutorial and doc improvements:
+
+* Clarifying comment in is_corpus func in utils.py (@greninja,[#1109](https://github.com/RaRe-Technologies/gensim/pull/1109)) 
+* Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda,[#1120](https://github.com/RaRe-Technologies/gensim/pull/1120))
+* Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc,[#1119](https://github.com/RaRe-Technologies/gensim/pull/1119)) 
+* Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti,[#1118](https://github.com/RaRe-Technologies/gensim/pull/1118))
+* Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda,[#1116](https://github.com/RaRe-Technologies/gensim/pull/1116))
+* Update Transformation and Topics link from quick start notebook (@mariana393,[#1115](https://github.com/RaRe-Technologies/gensim/pull/1115))
+* Quick Start Text clarification and typo correction (@luizcavalcanti,[#1114](https://github.com/RaRe-Technologies/gensim/pull/1114))
+* Fix typos in Author-topic tutorial (@Fil,[#1102](https://github.com/RaRe-Technologies/gensim/pull/1102))
+* Address benchmark inconsistencies in Annoy tutorial (@droudy,[#1113](https://github.com/RaRe-Technologies/gensim/pull/1113))
+* Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137)) 
+* Add documentation for WikiCorpus metadata. (@kirit93, [#1163](https://github.com/RaRe-Technologies/gensim/pull/1163)) 
+
+
 1.0.0RC2, 2017-02-16
 
 * Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137)) 

diff --git a/README.md b/README.md
@@ -78,11 +78,9 @@ For alternative modes of installation (without root privileges,
 development installation, optional install features), see the
 [documentation].
 
-This version has been tested under Python 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6
-(support for Python 2.5 was dropped in gensim 0.10.0; install gensim
-0.9.1 if you *must* use Python 2.5). Gensim’s github repo is hooked
+This version has been tested under Python 2.7, 3.5 and 3.6. Gensim’s github repo is hooked
 against [Travis CI for automated testing] on every commit push and pull
-request.
+request. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0. Install gensim 0.13.4 if you *must* use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you *must* use Python 2.5). 
 
 How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?
 --------------------------------------------------------------------------------------------------------
@@ -111,6 +109,7 @@ Documentation
   [Tutorials]: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#tutorials
   [Tutorial Videos]: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#videos
   [Official Documentation and Walkthrough]: http://radimrehurek.com/gensim/
+  [Official API Documentation]: http://radimrehurek.com/gensim/apiref.html
 
 ---------
 

diff --git a/docs/notebooks/WMD_tutorial.ipynb b/docs/notebooks/WMD_tutorial.ipynb
diff --git a/docs/notebooks/Word2Vec_FastText_Comparison.ipynb b/docs/notebooks/Word2Vec_FastText_Comparison.ipynb
diff --git a/docs/notebooks/Wordrank_comparisons.ipynb b/docs/notebooks/Wordrank_comparisons.ipynb
diff --git a/docs/notebooks/deepir.ipynb b/docs/notebooks/deepir.ipynb
diff --git a/docs/notebooks/doc2vec-IMDB.ipynb b/docs/notebooks/doc2vec-IMDB.ipynb
@@ -39,9 +39,7 @@
   {
    "cell_type": "code",
    "execution_count": 2,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "import locale\n",
@@ -120,9 +118,7 @@
   {
    "cell_type": "code",
    "execution_count": 3,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "import os.path\n",
@@ -139,9 +135,7 @@
   {
    "cell_type": "code",
    "execution_count": 1,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -202,9 +196,7 @@
   {
    "cell_type": "code",
    "execution_count": 2,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -254,9 +246,7 @@
   {
    "cell_type": "code",
    "execution_count": 5,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "from gensim.test.test_doc2vec import ConcatenatedDoc2Vec\n",
@@ -281,9 +271,7 @@
   {
    "cell_type": "code",
    "execution_count": 8,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "import numpy as np\n",
@@ -330,7 +318,7 @@
     "    corrects = sum(np.rint(test_predictions) == [doc.sentiment for doc in test_data])\n",
     "    errors = len(test_predictions) - corrects\n",
     "    error_rate = float(errors) / len(test_predictions)\n",
-    "    return (error_rate, errors, len(test_predictions), predictor)\n"
+    "    return (error_rate, errors, len(test_predictions), predictor)"
    ]
   },
   {
@@ -356,9 +344,7 @@
   {
    "cell_type": "code",
    "execution_count": 9,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "from collections import defaultdict\n",
@@ -368,9 +354,7 @@
   {
    "cell_type": "code",
    "execution_count": 10,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -579,9 +563,7 @@
   {
    "cell_type": "code",
    "execution_count": 12,
-   "metadata": {
-    "collapsed": true
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -630,9 +612,7 @@
   {
    "cell_type": "code",
    "execution_count": 13,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -673,9 +653,7 @@
   {
    "cell_type": "code",
    "execution_count": 14,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -703,7 +681,7 @@
     "print(u'TARGET (%d): «%s»\\n' % (doc_id, ' '.join(alldocs[doc_id].words)))\n",
     "print(u'SIMILAR/DISSIMILAR DOCS PER MODEL %s:\\n' % model)\n",
     "for label, index in [('MOST', 0), ('MEDIAN', len(sims)//2), ('LEAST', len(sims) - 1)]:\n",
-    "    print(u'%s %s: «%s»\\n' % (label, sims[index], ' '.join(alldocs[sims[index][0]].words)))\n"
+    "    print(u'%s %s: «%s»\\n' % (label, sims[index], ' '.join(alldocs[sims[index][0]].words)))"
    ]
   },
   {
@@ -723,9 +701,7 @@
   {
    "cell_type": "code",
    "execution_count": 15,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "word_models = simple_models[:]"
@@ -734,9 +710,7 @@
   {
    "cell_type": "code",
    "execution_count": 17,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -806,14 +780,10 @@
        "('mockumentary', 0.5149033069610596),<br>\n",
        "('camp-fest', 0.5122634768486023),<br>\n",
        "('mystery/comedy', 0.5020694732666016)]</td></tr></table>"
-      ],
-      "text/plain": [
-       "<IPython.core.display.HTML at 0x1535b84d0>"
       ]
      },
-     "execution_count": 17,
-     "metadata": {},
-     "output_type": "execute_result"
+     "output_type": "execute_result",
+     "metadata": {}
     }
    ],
    "source": [
@@ -855,9 +825,7 @@
   {
    "cell_type": "code",
    "execution_count": 26,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -897,12 +865,10 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
-    "This cell left intentionally erroneous. "
+    "This cell left intentionally erroneous."
    ]
   },
   {
@@ -915,13 +881,11 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
-    "from gensim.models import Word2Vec\n",
-    "w2v_g100b = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)\n",
+    "from gensim.models import KeyedVectors\n",
+    "w2v_g100b = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)\n",
     "w2v_g100b.compact_name = 'w2v_g100b'\n",
     "word_models.append(w2v_g100b)"
    ]
@@ -936,9 +900,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "import logging\n",
@@ -957,9 +919,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "%load_ext autoreload\n",
@@ -976,7 +936,7 @@
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
-    "version": 3
+    "version": 3.0
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
@@ -988,4 +948,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}