Skip to content

Commit

Permalink
Merge branch 'release-1.0.0'
Browse files Browse the repository at this point in the history
  • Loading branch information
tmylk committed Feb 24, 2017
2 parents df13670 + b1d38cf commit 78da89a
Show file tree
Hide file tree
Showing 29 changed files with 2,221 additions and 7,789 deletions.
49 changes: 49 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,55 @@ Changes

Unreleased:


========

1.0.0, 2017-02-24

New features:
* Add Author-topic modeling (@olavurmortensen,[#893](https://github.com/RaRe-Technologies/gensim/pull/893))
* Add FastText word embedding wrapper (@Jayantj,[#847](https://github.com/RaRe-Technologies/gensim/pull/847))
* Add WordRank word embedding wrapper (@parulsethi,[#1066](https://github.com/RaRe-Technologies/gensim/pull/1066), [#1125](https://github.com/RaRe-Technologies/gensim/pull/1125))
* Add sklearn wrapper for LDAModel (@AadityaJ,[#932](https://github.com/RaRe-Technologies/gensim/pull/932))

Deprecated features:

* Move `load_word2vec_format` and `save_word2vec_format` out of Word2Vec class to KeyedVectors (@tmylk,[#1107](https://github.com/RaRe-Technologies/gensim/pull/1107))
* Move properties `syn0norm`, `syn0`, `vocab`, `index2word` from Word2Vec class to KeyedVectors (@tmylk,[#1147](https://github.com/RaRe-Technologies/gensim/pull/1147))
* Remove support for Python 2.6, 3.3 and 3.4 (@tmylk,[#1145](https://github.com/RaRe-Technologies/gensim/pull/1145))


Improvements:

* Python 3.6 support (@tmylk [#1077](https://github.com/RaRe-Technologies/gensim/pull/1077))
* Phrases and Phraser allow a generator corpus (ELind77 [#1099](https://github.com/RaRe-Technologies/gensim/pull/1099))
* Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze,[#1053](https://github.com/RaRe-Technologies/gensim/pull/1053))
* Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel,[#1103](https://github.com/RaRe-Technologies/gensim/pull/1103)
* Fix broken link to paper in readme (@bhargavvader,[#1101](https://github.com/RaRe-Technologies/gensim/pull/1101))
* Lazy formatting in evaluate_word_pairs (@akutuzov,[#1084](https://github.com/RaRe-Technologies/gensim/pull/1084))
* Deacc option to keywords pre-processing (@bhargavvader,[#1076](https://github.com/RaRe-Technologies/gensim/pull/1076))
* Generate Deprecated exception when using Word2Vec.load_word2vec_format (@tmylk, [#1165](https://github.com/RaRe-Technologies/gensim/pull/1165))
* Fix hdpmodel constructor docstring for print_topics (#1152) (@toliwa, [#1152](https://github.com/RaRe-Technologies/gensim/pull/1152))
* Default to per_word_topics=False in LDA get_item for performance (@menshikh-iv, [#1154](https://github.com/RaRe-Technologies/gensim/pull/1154))
* Fix bound computation in Author Topic models. (@olavurmortensen, [#1156](https://github.com/RaRe-Technologies/gensim/pull/1156))
* Write UTF-8 byte strings in tensorboard conversion (@tmylk,[#1144](https://github.com/RaRe-Technologies/gensim/pull/1144))
* Make top_topics and sparse2full compatible with numpy 1.12 strictly int idexing (@tmylk,[#1146](https://github.com/RaRe-Technologies/gensim/pull/1146))

Tutorial and doc improvements:

* Clarifying comment in is_corpus func in utils.py (@greninja,[#1109](https://github.com/RaRe-Technologies/gensim/pull/1109))
* Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda,[#1120](https://github.com/RaRe-Technologies/gensim/pull/1120))
* Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc,[#1119](https://github.com/RaRe-Technologies/gensim/pull/1119))
* Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti,[#1118](https://github.com/RaRe-Technologies/gensim/pull/1118))
* Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda,[#1116](https://github.com/RaRe-Technologies/gensim/pull/1116))
* Update Transformation and Topics link from quick start notebook (@mariana393,[#1115](https://github.com/RaRe-Technologies/gensim/pull/1115))
* Quick Start Text clarification and typo correction (@luizcavalcanti,[#1114](https://github.com/RaRe-Technologies/gensim/pull/1114))
* Fix typos in Author-topic tutorial (@Fil,[#1102](https://github.com/RaRe-Technologies/gensim/pull/1102))
* Address benchmark inconsistencies in Annoy tutorial (@droudy,[#1113](https://github.com/RaRe-Technologies/gensim/pull/1113))
* Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137))
* Add documentation for WikiCorpus metadata. (@kirit93, [#1163](https://github.com/RaRe-Technologies/gensim/pull/1163))


1.0.0RC2, 2017-02-16

* Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137))
Expand Down
7 changes: 3 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,11 +78,9 @@ For alternative modes of installation (without root privileges,
development installation, optional install features), see the
[documentation].

This version has been tested under Python 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6
(support for Python 2.5 was dropped in gensim 0.10.0; install gensim
0.9.1 if you *must* use Python 2.5). Gensim’s github repo is hooked
This version has been tested under Python 2.7, 3.5 and 3.6. Gensim’s github repo is hooked
against [Travis CI for automated testing] on every commit push and pull
request.
request. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0. Install gensim 0.13.4 if you *must* use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you *must* use Python 2.5).

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?
--------------------------------------------------------------------------------------------------------
Expand Down Expand Up @@ -111,6 +109,7 @@ Documentation
[Tutorials]: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#tutorials
[Tutorial Videos]: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#videos
[Official Documentation and Walkthrough]: http://radimrehurek.com/gensim/
[Official API Documentation]: http://radimrehurek.com/gensim/apiref.html

---------

Expand Down
77 changes: 22 additions & 55 deletions docs/notebooks/WMD_tutorial.ipynb

Large diffs are not rendered by default.

78 changes: 27 additions & 51 deletions docs/notebooks/Word2Vec_FastText_Comparison.ipynb

Large diffs are not rendered by default.

92 changes: 31 additions & 61 deletions docs/notebooks/Wordrank_comparisons.ipynb

Large diffs are not rendered by default.

90 changes: 28 additions & 62 deletions docs/notebooks/deepir.ipynb

Large diffs are not rendered by default.

94 changes: 27 additions & 67 deletions docs/notebooks/doc2vec-IMDB.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -39,9 +39,7 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"import locale\n",
Expand Down Expand Up @@ -120,9 +118,7 @@
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"import os.path\n",
Expand All @@ -139,9 +135,7 @@
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
Expand Down Expand Up @@ -202,9 +196,7 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
Expand Down Expand Up @@ -254,9 +246,7 @@
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"from gensim.test.test_doc2vec import ConcatenatedDoc2Vec\n",
Expand All @@ -281,9 +271,7 @@
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
Expand Down Expand Up @@ -330,7 +318,7 @@
" corrects = sum(np.rint(test_predictions) == [doc.sentiment for doc in test_data])\n",
" errors = len(test_predictions) - corrects\n",
" error_rate = float(errors) / len(test_predictions)\n",
" return (error_rate, errors, len(test_predictions), predictor)\n"
" return (error_rate, errors, len(test_predictions), predictor)"
]
},
{
Expand All @@ -356,9 +344,7 @@
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"from collections import defaultdict\n",
Expand All @@ -368,9 +354,7 @@
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
Expand Down Expand Up @@ -579,9 +563,7 @@
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [
{
"name": "stdout",
Expand Down Expand Up @@ -630,9 +612,7 @@
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
Expand Down Expand Up @@ -673,9 +653,7 @@
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
Expand Down Expand Up @@ -703,7 +681,7 @@
"print(u'TARGET (%d): «%s»\\n' % (doc_id, ' '.join(alldocs[doc_id].words)))\n",
"print(u'SIMILAR/DISSIMILAR DOCS PER MODEL %s:\\n' % model)\n",
"for label, index in [('MOST', 0), ('MEDIAN', len(sims)//2), ('LEAST', len(sims) - 1)]:\n",
" print(u'%s %s: «%s»\\n' % (label, sims[index], ' '.join(alldocs[sims[index][0]].words)))\n"
" print(u'%s %s: «%s»\\n' % (label, sims[index], ' '.join(alldocs[sims[index][0]].words)))"
]
},
{
Expand All @@ -723,9 +701,7 @@
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"word_models = simple_models[:]"
Expand All @@ -734,9 +710,7 @@
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
Expand Down Expand Up @@ -806,14 +780,10 @@
"('mockumentary', 0.5149033069610596),<br>\n",
"('camp-fest', 0.5122634768486023),<br>\n",
"('mystery/comedy', 0.5020694732666016)]</td></tr></table>"
],
"text/plain": [
"<IPython.core.display.HTML at 0x1535b84d0>"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
"output_type": "execute_result",
"metadata": {}
}
],
"source": [
Expand Down Expand Up @@ -855,9 +825,7 @@
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
Expand Down Expand Up @@ -897,12 +865,10 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"This cell left intentionally erroneous. "
"This cell left intentionally erroneous."
]
},
{
Expand All @@ -915,13 +881,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"from gensim.models import Word2Vec\n",
"w2v_g100b = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)\n",
"from gensim.models import KeyedVectors\n",
"w2v_g100b = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)\n",
"w2v_g100b.compact_name = 'w2v_g100b'\n",
"word_models.append(w2v_g100b)"
]
Expand All @@ -936,9 +900,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
Expand All @@ -957,9 +919,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
Expand All @@ -976,7 +936,7 @@
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
"version": 3.0
},
"file_extension": ".py",
"mimetype": "text/x-python",
Expand All @@ -988,4 +948,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
Loading

0 comments on commit 78da89a

Please sign in to comment.