1.0.0 Author-Topic modelling

@tmylk tmylk released this 24 Feb 22:50
· 1691 commits to develop since this release

1.0.0, 2017-02-24

Deprecated methods:

To share word vector querying code between different training algorithms (Word2Vec, FastText, WordRank, VarEmbed), we have separated the storage and querying of word vectors into a separate class, KeyedVectors.

Two methods and several attributes of the Word2Vec class have been deprecated. The methods are load_word2vec_format and save_word2vec_format. The attributes are syn0norm, syn0, vocab and index2word. They have been moved to the KeyedVectors class.

After upgrading to this release you might get exceptions about deprecated methods or missing attributes.

DeprecationWarning: Deprecated. Use model.wv.save_word2vec_format instead.
AttributeError: 'Word2Vec' object has no attribute 'vocab'

To remove the exceptions, use:

  • KeyedVectors.load_word2vec_format instead of Word2Vec.load_word2vec_format
  • word2vec_model.wv.save_word2vec_format instead of word2vec_model.save_word2vec_format
  • model.wv.syn0norm instead of model.syn0norm
  • model.wv.syn0 instead of model.syn0
  • model.wv.vocab instead of model.vocab
  • model.wv.index2word instead of model.index2word
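
For code that has to run both before and after the upgrade, the attribute renames above could be wrapped in a small accessor. The sketch below is only an illustration: get_vector_attr and MOVED_ATTRIBUTES are hypothetical names, not part of gensim, and the objects used in the demo are plain stand-ins that mimic the two attribute layouts rather than real Word2Vec models.

```python
from types import SimpleNamespace

# Attributes that this release moved from Word2Vec to KeyedVectors (model.wv).
MOVED_ATTRIBUTES = ("syn0norm", "syn0", "vocab", "index2word")


def get_vector_attr(model, name):
    """Hypothetical helper: read a moved attribute from model.wv when it
    exists, otherwise fall back to the model object itself."""
    if name not in MOVED_ATTRIBUTES:
        raise ValueError("not a moved attribute: %r" % name)
    # Objects with the new layout expose `wv`; older ones do not.
    holder = getattr(model, "wv", model)
    return getattr(holder, name)


# Stand-in objects (assumption: NOT real gensim models) for the two layouts.
old_layout = SimpleNamespace(vocab={"cat": 0}, index2word=["cat"])
new_layout = SimpleNamespace(
    wv=SimpleNamespace(vocab={"cat": 0}, index2word=["cat"])
)

print(get_vector_attr(old_layout, "index2word"))
print(get_vector_attr(new_layout, "index2word"))
```

If only this release and later need to be supported, accessing model.wv directly as listed above is simpler and the helper is unnecessary.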

Changelog of this release:

New features:

Deprecated features:

  • Move load_word2vec_format and save_word2vec_format out of Word2Vec class to KeyedVectors (@tmylk,#1107)
  • Move properties syn0norm, syn0, vocab, index2word from Word2Vec class to KeyedVectors (@tmylk,#1147)
  • Remove support for Python 2.6, 3.3 and 3.4 (@tmylk,#1145)

Improvements:

  • Python 3.6 support (@tmylk, #1077)
  • Phrases and Phraser allow a generator corpus (@ELind77, #1099)
  • Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze,#1053)
  • Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel,#1103)
  • Fix broken link to paper in readme (@bhargavvader,#1101)
  • Lazy formatting in evaluate_word_pairs (@akutuzov,#1084)
  • Deacc option to keywords pre-processing (@bhargavvader,#1076)
  • Generate Deprecated exception when using Word2Vec.load_word2vec_format (@tmylk, #1165)
  • Fix hdpmodel constructor docstring for print_topics (@toliwa, #1152)
  • Default to per_word_topics=False in LDA get_item for performance (@menshikh-iv, #1154)
  • Fix bound computation in Author Topic models. (@olavurmortensen, #1156)
  • Write UTF-8 byte strings in tensorboard conversion (@tmylk,#1144)
  • Make top_topics and sparse2full compatible with numpy 1.12 strict integer indexing (@tmylk,#1146)

Tutorial and doc improvements:

  • Clarifying comment in is_corpus func in utils.py (@greninja,#1109)
  • Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda,#1120)
  • Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc,#1119)
  • Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti,#1118)
  • Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda,#1116)
  • Update Transformation and Topics link from quick start notebook (@mariana393,#1115)
  • Quick Start Text clarification and typo correction (@luizcavalcanti,#1114)
  • Fix typos in Author-topic tutorial (@Fil,#1102)
  • Address benchmark inconsistencies in Annoy tutorial (@droudy,#1113)
  • Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,#1137)
  • Add documentation for WikiCorpus metadata. (@kirit93, #1163)