
Word2Vec/Doc2Vec offer model-minimization method #446

Closed
gojomo opened this issue Sep 7, 2015 · 2 comments
Labels
difficulty easy · feature

Comments

@gojomo
Collaborator

gojomo commented Sep 7, 2015

If you're sure you're done training a model, several of its most memory-hungry parts can be discarded:

  • syn0 (the non-normalized vectors; be sure to save the normalized versions aside first)
  • syn1, syn1neg, syn0_lockf
  • doctag_syn0, doctag_syn0_lockf (in Doc2Vec)

There should be a documented method (such as finished_training) to discard these, plus tests ensuring there are no lingering, unintended dependencies on the discarded attributes.

Semantics: As a tradeoff, finished_training discards as many model attributes as possible while still being able to answer infer_vector and __getitem__ queries on the resulting trimmed model. No further training of that word2vec/doc2vec model is possible, and any attempt to do so raises a clear, understandable exception.

(Note though that a Doc2Vec model used for future infer_vector() ops needs to keep the syn0 & syn1* values.)
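A minimal sketch of what such a method could look like, against the attribute names above. `finished_training` and its `keep_inference` flag are hypothetical here (this is the proposal, not existing gensim API), as is the `training_finished` marker:

```python
import numpy as np

def finished_training(model, keep_inference=True):
    """Hypothetical sketch: discard training-only state from a trained
    Word2Vec/Doc2Vec model. Attribute names follow gensim's internals
    of this era; the method itself does not exist in the library."""
    # Save aside the unit-normalized vectors, which lookups/similarity
    # queries want, before anything else is discarded.
    model.syn0norm = (model.syn0 /
                      np.sqrt((model.syn0 ** 2).sum(axis=1))[:, np.newaxis])

    # The lock factors are consulted only by train(); always safe to drop.
    discard = ['syn0_lockf', 'doctag_syn0_lockf']
    if not keep_inference:
        # Per the caveat above: a Doc2Vec model that must still answer
        # infer_vector() has to keep syn0 and the syn1* layers.
        discard += ['syn0', 'syn1', 'syn1neg']
    for attr in discard:
        if hasattr(model, attr):
            delattr(model, attr)

    # Any later attempt to train should fail with a clear exception.
    model.training_finished = True  # hypothetical flag checked by train()
```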

@piskvorky added the "feature" and "difficulty easy" labels Sep 11, 2015
pum-purum-pum-pum added a commit to pum-purum-pum-pum/gensim that referenced this issue Oct 31, 2016
add finished_training method
@gojomo
Collaborator Author

gojomo commented Nov 2, 2016

Since I initially wrote this, I've seen cases where the non-unit-normalized syn0 is preferable to the unit-normed version. (Sometimes the magnitude of a vector is itself relevant, serving in some sense as an indicator of strong/unambiguous meaning.) Also, some Doc2Vec users only want the model for inference, but others would consider the doctag_syn0 to be what they want to keep around for lookups/similarity-rankings.

So utility functions for this model-slimming need to be very carefully named and documented to set expectations properly - and perhaps factored into separate operations, rather than one big finished_training().
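For example, the slimming could be factored into separate, explicitly named steps, each documented with exactly which capability it gives up. All names below are illustrative, not gensim API:

```python
# Illustrative factoring only; none of these functions exist in gensim.

def discard_lock_factors(model):
    """Drop the *_lockf arrays: ends further training, keeps everything else."""
    for attr in ('syn0_lockf', 'doctag_syn0_lockf'):
        if hasattr(model, attr):
            delattr(model, attr)

def discard_inference_weights(model):
    """Drop syn1/syn1neg: on a Doc2Vec model this also ends infer_vector()."""
    for attr in ('syn1', 'syn1neg'):
        if hasattr(model, attr):
            delattr(model, attr)

def discard_doctag_vectors(model):
    """Drop stored doc-vectors: keeps inference, ends doctag lookups and
    similarity rankings over the training documents."""
    if hasattr(model, 'docvecs'):
        del model.docvecs
```

Factored this way, an inference-only user would call discard_lock_factors() and discard_doctag_vectors(), while a lookup-only user would call discard_lock_factors() and discard_inference_weights() instead.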

tmylk pushed a commit that referenced this issue Nov 13, 2016
* issue #446

add finished_training method

* private _minimize_model, tests

We can't just call the word2vec superclass method explicitly without
adding a flag to save syn0_lockf, which is necessary to keep in
d2v.

* fix_print

* flag finished_training fix

* fix_bug with docvecs, controllability

* rename flag, flag move, init_sims

* renaming the RuntimeError message

* fix, add more tests

* fix, i == j

* fix

* tests_fix

* delete useless code

* numpy fix

* hs, neg in tests; assert parameter existence

* changelog update

* rename replace, description fix
@tmylk
Contributor

tmylk commented Feb 8, 2017

Fixed in #987
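(For later readers: as far as I can reconstruct, the method merged in #987 ended up named delete_temporary_training_data. The sketch below shows its use with the Doc2Vec parameter names of that era; treat the exact signature as approximate, since this API was later reworked and eventually removed.)

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=['human', 'machine', 'interface'], tags=['doc0']),
        TaggedDocument(words=['graph', 'of', 'trees'], tags=['doc1'])]
model = Doc2Vec(docs, size=20, min_count=1, iter=5)  # pre-4.0 parameter names

# Free the training-only arrays; keep doctag vectors for lookups and the
# word/hidden weights that infer_vector() needs.
model.delete_temporary_training_data(keep_doctags_vectors=True,
                                     keep_inference=True)

vec = model.infer_vector(['human', 'trees'])  # still works after trimming
# model.train(docs)  # would now raise RuntimeError: the model was trimmed
```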

@tmylk closed this as completed Feb 8, 2017