Re-design "*2vec" implementation (piskvorky#1777)
* first design draft
* adds public interfaces
* adds VocabItem and cleans BaseKeyedVectors
* adds explicit parameters
* implements `train` and adds `Callback` functionality
* refactors `train`, adds classes for vocabulary building and trainable weights
* changes function parameters
* fixes minor errors
* starts refactoring `Word2Vec` based on new design
* removes `build_vocab_from_freq`, corrects `reset_from`
* changes attribute names
* adds saving/loading from word2vec format
* refactors/renames variables based on new design
* fixes **not** storing normalized vectors and recalculable tables
* replaces `syn0` with `vectors`, adds `estimate_memory`
* fixes indents
* starts `FastText` refactoring based on new design
* refactors to call common methods from `word2vec_utils`, removes deprecated methods
* refactors `FastText`
* adds common methods in `word2vec_utils`
* refactors keyedvectors for FT & W2V by creating a common base class
* creates a common base class for Word2Vec and FastText
* deletes word2vec_utils.py
* extracts logging to separate methods
* corrects alpha decay, modifies `_get_thread_working_mem` to support doc2vec
* refactors doc2vec initialization and training
* minor fixes to support doc2vec
* corrects parameter setting while calling `train`
* deletes `callbacks`, fixes alpha setting and degradation from `train`
* adds post training methods and keyedvectors for docvecs
* extracts common methods as functions, discards unnecessary function call
* shifts adding null word from trainables to vocab class
* unifies variable naming
* moves corpus_count from vocabulary to model attribute
* refactors test cases and corrects failing cases
* removes old import
* fixes errors
* creates separate class for callbacks, adds saving and loss capturing callbacks
* refactors poincare keyedvectors base and related changes
* extracts save/load_word2vec_format as functions to avoid code repetition for word2vec and poincare
* removes model initialization to None
* shifts cum_tables, make_cum_table & create_binary_tree from trainables to vocabulary
* adds fasttext test cases
* adds docstrings for public APIs for D2V, W2V & FT
* adds docstrings for keyedvectors
* resolves failing test cases
* updates cython generated .c files
* corrects error statement when failing to import FAST VERSION
* improves logging
* deletes fasttext wrapper
* fixes PEP8 long lines error
* fixes non-any2vec failing test cases
* deletes testing pure python any2vec implementations from tox
* fixes test_similarities failing test cases
* fixes PEP8 errors
* fixes python3 failing test cases
* renames syn0 to vectors in keras integration test
* fixes annoy notebook failure
* adds property aliases for backward compatibility
* adds properties and methods for backward compatibility
* removes trainables save
* minor changes to test cases
* shifts epoch saver callback to an example in docstring
* adds deleters for syn1 & syn1neg
* deprecates old KeyedVectors in favour of Word2VecKeyedVectors
* reverts word2vec_pre_kv_py2 saved models to original
* adds deprecated models and dependent python files
* adds unit tests for loading old models
* imports deprecated in model.__init__
* removes .wv.most_similar calls
* adds code to support loading old models
* adds cython auto generated .c files
* fixes PEP8 failures & fetching attributes from pre_kv word2vec models
* fixes num_ngram_vectors
* fixes estimate_memory, shifts BaseKeyedVectors to keyedvectors.py
* fixes review comments -- typos, indents, adding deprecated. No design change.
* fixes PEP8
* shifts *KeyedVectors to keyedvectors.py
* de-duplicates data between keyedvectors, vocabulary, trainables and removes data copying
* fixes failing cases
* removes unused vocabulary parameter from methods
* removes base classes for vocabulary & trainables, cleans code
* removes build_vocab from BaseAny2VecModel
* fixes vector size for doc2vec
* Fix typo in classname
* remove docs for fasttext wrapper
* update docstrings for callback
* Fix documentation build
* light cleanup for docstrings
* renames private util_any2vec functions
* adds deprecated warning for attributes
* adds deprecated warnings.warn for old doc2vec parameters
* shifts any2vec callback under gensim/models
* adds pure python implementations
* fixes PEP8 errors
* changes build_vocab method signature
* fixes vocabulary trimming error
* fixes long line
* removes deprecated/utils
* adds old_saveload to deprecated
* removes unused import
* returns fasttext wrapper
* adds alias iter setter
* fixes fasttext load error
* ignores PEP8 unused import
* Return fasttext wrapper rst
* Add rst for deprecated stuff
* Add all needed deprecations, update *.rst.
* add description for deprecated package
* add missing import + return env var to tox config
* drop useless import
* adds num_ngrams_vectors property
* reverts to calling old attributes in all tests
* fixes PEP8
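
Several items above (replacing `syn0` with `vectors`, adding property aliases for backward compatibility, adding deprecation warnings for attributes, adding deleters) describe a common renaming pattern. The following is only a minimal sketch of that pattern, not the code from this commit: the class name `KeyedVectorsSketch`, its constructor arguments, and the warning text are hypothetical, and plain lists stand in for the real weight arrays.

```python
import warnings


class KeyedVectorsSketch:
    """Hypothetical stand-in for a keyed-vectors class (not gensim's real class)."""

    def __init__(self, vocab_size, vector_size):
        # New canonical attribute name; nested lists stand in for a weight matrix.
        self.vectors = [[0.0] * vector_size for _ in range(vocab_size)]

    @property
    def syn0(self):
        # Old name kept as a read alias so existing callers keep working.
        warnings.warn("syn0 is deprecated, use vectors instead",
                      DeprecationWarning, stacklevel=2)
        return self.vectors

    @syn0.setter
    def syn0(self, value):
        # Writes through the old name land on the new attribute.
        warnings.warn("syn0 is deprecated, use vectors instead",
                      DeprecationWarning, stacklevel=2)
        self.vectors = value

    @syn0.deleter
    def syn0(self):
        # Deleting the alias deletes the underlying attribute.
        del self.vectors
```

Because the alias returns the same object as `vectors`, no data is copied; old code reading `model.syn0` sees the new storage and gets a `DeprecationWarning` pointing at its own call site.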