Re-design "*2vec" implementation (piskvorky#1777)
* first design draft

* adds public interfaces

* adds VocabItem and cleans BaseKeyedVectors

* adds explicit parameters

* implements `train` and adds `Callback` functionality

* refactors `train`, adds classes for vocabulary building and trainable weights.

* changes function parameters

* fixes minor errors

* starts refactoring `Word2Vec` based on new design

* removes `build_vocab_from_freq`, corrects `reset_from`

* changes attribute names

* adds saving/loading from word2vec format

* refactors/renames variables based on new design

* fixes **not** storing normalized vectors and recalculable tables

* replaces `syn0` with `vectors`, adds `estimate_memory`

* fixes indents

* starts `FastText` refactoring based on new design

* refactors to call common methods from `word2vec_utils`, removes deprecated methods

* refactors `FastText`

* adds common methods in `word2vec_utils`

* refactors keyedvectors for FT & W2V by creating a common base class

* creates a common base class for Word2Vec and FastText

* deletes word2vec_utils.py

* extracts logging to separate methods

* corrects alpha decay, modifies `_get_thread_working_mem` to support doc2vec

* refactors doc2vec initialization and training

* minor fixes to support doc2vec

* corrects parameter setting while calling `train`

* deletes `callbacks`, fixes alpha setting and degradation from `train`

* adds post training methods and keyedvectors for docvecs

* extracts common methods as functions, discards unnecessary function call

* shifts adding null word from trainables to vocab class

* unifies variable naming

* moves corpus_count from vocabulary to model attribute

* refactors test cases and corrects failing cases

* removes old import

* fixes errors

* creates separate class for callbacks, adds saving and loss capturing callbacks

* refactors poincare keyedvectors base and related changes

* extracts save/load_word2vec_format as functions to avoid code repetition for word2vec and poincare

* removes model initialization to None

* shifts cum_tables, make_cum_table & create_binary_tree from trainables to vocabulary

* adds fasttext test cases

* adds doc strings for public APIs for D2V, W2V & FT

* adds docstrings for keyedvectors

* resolves failing test cases

* updates cython generated .c files

* corrects error statement when failing to import FAST VERSION

* improves logging

* deletes fasttext wrapper

* fixes PEP8 long lines error

* fixes non-any2vec failing test cases

* deletes testing pure python any2vec implementations from tox

* fixes test_similarities failing test cases

* fixes PEP8 errors

* fixes python3 failing test cases

* renames syn0 to vectors in keras integration test

* fixes annoy notebook failure

* adds property aliases for backward compatibility

* adds properties and methods for backward compatibility

* removes trainables save

* minor changes to test cases

* shifts epoch saver callback to an example in docstring

* adds deleters for syn1 & syn1neg

* deprecates old KeyedVectors in favour of Word2VecKeyedVectors

* reverts word2vec_pre_kv_py2 saved models to original

* adds deprecated models and dependent python files

* adds unit tests for loading old models

* imports deprecated in model.__init__

* removes .wv.most_similar calls

* adds code to support loading old models

* adds cython auto generated .c files

* fixes PEP8 failures & fetching attributes from pre_kv word2vec models

* fixes num_ngram_vectors

* fixes estimate_memory, shifts BaseKeyedVectors to keyedvectors.py

* fixes review comments -- typos, indents, adding deprecated. No design change.

* fixes PEP8

* shifts *KeyedVectors to keyedvectors.py

* de-duplicates data between keyedvectors, vocabulary, trainables and removes data copying

* fixes failing cases

* removes unused vocabulary parameter from methods

* removes base classes for vocabulary & trainables, cleans code

* removes build_vocab from BaseAny2VecModel

* fixes vector size for doc2vec

* Fix typo in classname

* remove docs for fasttext wrapper

* update docstrings for callback

* Fix documentation build

* light cleanup for docstrings

* renames private util_any2vec functions

* adds deprecated warning for attributes

* adds deprecated warnings.warn for old doc2vec parameters

* shifts any2vec callback under gensim/models

* adds pure python implementations

* fixes PEP8 errors

* changes build_vocab method signature

* fixes vocabulary trimming error

* fixes long line

* removes deprecated/utils

* adds old_saveload to deprecated

* removes unused import

* returns fasttext wrapper

* adds alias iter setter

* fixes fasttext load error

* ignores PEP8 unused import

* Return fasttext wrapper rst

* Add rst for deprecated stuff

* Add all needed deprecations, upd *.rst.

* add description for deprecated package

* add missing import + return env var to tox config

* drop useless import

* adds num_ngrams_vectors property

* reverts to calling old attributes in all tests

* fixes PEP8
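
Several entries above rename attributes (`syn0` becomes `vectors`) while keeping the old names available through property aliases, deleters and deprecation warnings. The following is a minimal sketch of that general pattern only. The class `KeyedVectorsSketch` and the warning text are hypothetical and are not taken from the gensim code in this commit.

```python
import warnings


class KeyedVectorsSketch(object):
    """Hypothetical stand-in for a vector container; not gensim's actual class."""

    def __init__(self, vectors):
        # New canonical attribute name.
        self.vectors = vectors

    @property
    def syn0(self):
        # Old name kept as a read alias; warn so callers can migrate.
        warnings.warn("syn0 is deprecated, use vectors instead",
                      DeprecationWarning, stacklevel=2)
        return self.vectors

    @syn0.setter
    def syn0(self, value):
        # Writes through the old name land on the new attribute.
        warnings.warn("syn0 is deprecated, use vectors instead",
                      DeprecationWarning, stacklevel=2)
        self.vectors = value

    @syn0.deleter
    def syn0(self):
        # Deleting the alias deletes the underlying data.
        del self.vectors


kv = KeyedVectorsSketch([[0.1, 0.2], [0.3, 0.4]])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_view = kv.syn0  # old code path still works, but emits a warning
```

Because the alias returns the same object rather than a copy, old and new code see the same data, which fits the de-duplication goal listed in the commit message.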
manneshiva authored and sj29-innovate committed Feb 21, 2018
1 parent cfdafa7 commit 1c8a22e
Showing 47 changed files with 13,630 additions and 5,325 deletions.
5 changes: 5 additions & 0 deletions docs/src/apiref.rst
@@ -55,6 +55,11 @@ Modules:
     models/wrappers/wordrank
     models/wrappers/varembed
     models/wrappers/fasttext
+    models/deprecated/doc2vec
+    models/deprecated/fasttext
+    models/deprecated/word2vec
+    models/deprecated/keyedvectors
+    models/deprecated/fasttext_wrapper
     similarities/docsim
     similarities/index
     sklearn_api/atmodel
9 changes: 9 additions & 0 deletions docs/src/models/deprecated/doc2vec.rst
@@ -0,0 +1,9 @@
+:mod:`models.deprecated.doc2vec` -- Deep learning with paragraph2vec
+====================================================================
+
+.. automodule:: gensim.models.deprecated.doc2vec
+    :synopsis: Deep learning with doc2vec
+    :members:
+    :inherited-members:
+    :undoc-members:
+    :show-inheritance:
10 changes: 10 additions & 0 deletions docs/src/models/deprecated/fasttext.rst
@@ -0,0 +1,10 @@
+:mod:`models.deprecated.fasttext` -- FastText model
+===================================================
+
+.. automodule:: gensim.models.deprecated.fasttext
+    :synopsis: FastText model
+    :members:
+    :inherited-members:
+    :special-members: __getitem__
+    :undoc-members:
+    :show-inheritance:
10 changes: 10 additions & 0 deletions docs/src/models/deprecated/fasttext_wrapper.rst
@@ -0,0 +1,10 @@
+:mod:`models.deprecated.fasttext_wrapper` -- Wrapper for Facebook implementation of FastText model
+==================================================================================================
+
+.. automodule:: gensim.models.deprecated.fasttext_wrapper
+    :synopsis: FastText model
+    :members:
+    :inherited-members:
+    :special-members: __getitem__
+    :undoc-members:
+    :show-inheritance:
9 changes: 9 additions & 0 deletions docs/src/models/deprecated/keyedvectors.rst
@@ -0,0 +1,9 @@
+:mod:`models.deprecated.keyedvectors` -- Store and query word vectors
+=====================================================================
+
+.. automodule:: gensim.models.deprecated.keyedvectors
+    :synopsis: Store and query word vectors
+    :members:
+    :inherited-members:
+    :undoc-members:
+    :show-inheritance:
9 changes: 9 additions & 0 deletions docs/src/models/deprecated/word2vec.rst
@@ -0,0 +1,9 @@
+:mod:`models.deprecated.word2vec` -- Deep learning with word2vec
+================================================================
+
+.. automodule:: gensim.models.deprecated.word2vec
+    :synopsis: Deep learning with word2vec
+    :members:
+    :inherited-members:
+    :undoc-members:
+    :show-inheritance:
6 changes: 3 additions & 3 deletions docs/src/models/wrappers/fasttext.rst
@@ -1,8 +1,8 @@
-:mod:`models.wrappers.fasttext` -- FastText Word Embeddings
-===========================================================
+:mod:`models.wrappers.fasttext` -- Wrapper for FastText implementation from Facebook
+====================================================================================
 
 .. automodule:: gensim.models.wrappers.fasttext
-    :synopsis: FastText Embeddings
+    :synopsis: FastText
     :members:
     :inherited-members:
     :undoc-members:
1 change: 1 addition & 0 deletions gensim/models/__init__.py
@@ -23,6 +23,7 @@
 from .translation_matrix import TranslationMatrix, BackMappingTranslationMatrix  # noqa:F401
 
 from . import wrappers  # noqa:F401
+from . import deprecated  # noqa:F401
 
 from gensim import interfaces, utils
 