2.0.0, 2017-04-10
Breaking changes:
Any direct call to the train() method of Word2Vec/Doc2Vec now requires an explicit epochs parameter and an explicit estimate of the corpus size. The most common way to call train
is vec_model.train(sentences, total_examples=vec_model.corpus_count, epochs=vec_model.iter)
See the method documentation for more information.
- Explicit epochs and corpus size in word2vec train(). (@gojomo, @robotcator, #1139, #1237)
New features:
- Add output word prediction to word2vec (negative sampling scheme only). See ipynb (@chinmayapancholi13,#1209)
- scikit_learn wrapper for LSI Model in Gensim (@chinmayapancholi13,#1244)
- Add the 'keep_tokens' parameter to 'filter_extremes'. (@toliwa,#1210)
- Load FastText models with specified encoding (@jayantj,#1210)
Improvements:
- Fix loading large FastText models on Mac. (@jaksmid,#1196)
- Sklearn LDA wrapper now works in sklearn pipeline (@kris-singh,#1213)
- glove2word2vec conversion script refactoring (@parulsethi,#1247)
- Word2vec error message when update is called before train. Fix #1162 (@hemavakade,#1205)
- Allow training if model is not modified by "_minimize_model". Add deprecation warning. (@chinmayapancholi13,#1207)
- Update the warning text when building vocab on a trained w2v model (@prakhar2b,#1190)
Bug fixes:
- Fix word2vec reset_from bug in v1.0.1. Fix #1230. (@Kreiswolke,#1234)
- Distributed LDA: checking the length of docs instead of the boolean value, plus int index conversion (@saparina,#1191)
- syn0_lockf initialised with zero in intersect_word2vec_format() (@KiddoZhu,#1267)
- Fix wordrank max_iter_dump calculation. Fix #1216 (@ajkl,#1217)
- Make SgNegative test use sg (@shubhvachher,#1252)
- pep8/pycodestyle fixes for hanging indents in Summarization module (@SamriddhiJain,#1202)
- WordRank and Mallet wrappers single vs double quote issue in Windows. (@prakhar2b,#1208)
- Fix #824: no corpus in init, but trim_rule in init (@prakhar2b,#1186)
Tutorial and doc improvements:
- Color dictionary according to topic notebook update (@bhargavvader, #1164)
- Fix hdp show_topic/s docstring (@parulsethi, #1264)
- Add docstrings for word2vec.py forwarding functions (@shubhvachher, #1251)
- Update description for worker_loop function used in score function (@chinmayapancholi13, #1206)