[WIP] Add ability to use Tensorflow to train a word2vec model #809
Conversation
Ping @gojomo
What is the status on tensorflow in gensim -- is there a notebook explaining the motivation, comparing the performance / pros / cons? Also, what is @gojomo's role here?
```python
from gensim import utils
from six import string_types
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
```
This doesn't belong in module scope -- libraries do not set up logging. That's up to applications that use them.
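As a sketch of the convention the reviewer is pointing at: a library module should only obtain a logger, while any `basicConfig()` call belongs in the application's entry point. Something like:

```python
import logging

# Library module: create a logger, but do not configure handlers or levels.
# The NullHandler silences "no handler found" warnings if the application
# never configures logging at all.
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())


def train_step():
    # Library code logs through its own logger...
    logger.info("training step complete")


if __name__ == "__main__":
    # ...and only the application decides how log records are displayed.
    logging.basicConfig(
        format="%(asctime)s : %(levelname)s : %(message)s",
        level=logging.INFO,
    )
    train_step()
```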
I think the idea of being able to use TF training or import vectors from a TF session is good, but this structuring seems fragile/confusing - especially mixed-overriding & renaming of parameters. I suspect the better approach would involve some combination of: (1) a new common superclass for what is shared in implementation or interface, and having the TF implementation being a sibling class, rather than patchwork-subclass, of the traditional implementations; (2) moving the vectors-and-vocab entity out of the algorithmic entity, as proposed in #549. Of course such refactoring is a kind of big and disruptive project...
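To make suggestion (1) concrete, a hypothetical sketch (class names are illustrative, not from the PR) of a shared superclass with the traditional and TF implementations as siblings rather than one subclassing the other:

```python
class BaseWord2Vec:
    """Shared interface: vocabulary state and vector queries live here."""

    def __init__(self):
        self.vocab = {}
        self.vectors = None

    def train(self, sentences):
        # Each sibling supplies its own training backend.
        raise NotImplementedError

    def most_similar(self, word, topn=10):
        # Query methods are shared, so both backends get them for free.
        raise NotImplementedError


class Word2Vec(BaseWord2Vec):
    """Traditional numpy/C training backend."""

    def train(self, sentences):
        ...  # existing gensim training loop


class TfWord2Vec(BaseWord2Vec):
    """Sibling class delegating training to a TensorFlow session,
    instead of a patchwork subclass of Word2Vec."""

    def train(self, sentences):
        ...  # run the TF graph, then copy embeddings into self.vectors
```

With this layout, parameter renaming/overriding between the two backends disappears: each sibling owns its own training parameters, and only the shared vectors-and-vocab state is inherited.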
Current status: Blocked by #549
@anmol01gulati This PR can now be updated to use KeyedVecs from #980
Extending this PR here #1033
The flexible architecture of TensorFlow allows you to deploy computation to one or more CPUs or GPUs with a single API. The benefit of using TensorFlow to train word2vec is that it can distribute the computation across GPUs. This PR adds the ability for a user to easily create a word2vec model that is trained with TensorFlow while still allowing gensim word2vec methods such as `most_similar()` and `doesnt_match()` to be called on the resulting model. It builds around an existing TensorFlow module, `tensorflow.models.embedding.word2vec_optimized`.
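To illustrate what `most_similar()` computes once the TF-trained embeddings are available, here is a minimal numpy-only sketch (not gensim's implementation) assuming row `i` of the embedding matrix is the vector for word `i`:

```python
import numpy as np

def most_similar(embeddings, vocab, word, topn=3):
    """Cosine-similarity lookup, mimicking gensim's most_similar()."""
    idx = vocab.index(word)
    # L2-normalise all vectors so dot products equal cosine similarities.
    norms = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = norms @ norms[idx]
    sims[idx] = -np.inf  # exclude the query word itself
    best = np.argsort(sims)[::-1][:topn]
    return [(vocab[i], float(sims[i])) for i in best]

# Toy 2-d embeddings standing in for a TF-trained matrix.
vocab = ["king", "queen", "apple", "banana"]
emb = np.array([[1.0, 0.1],
                [0.9, 0.2],
                [0.1, 1.0],
                [0.2, 0.9]])
print(most_similar(emb, vocab, "king", topn=2))  # "queen" ranks first
```

In the PR, the same query runs against the vectors copied out of the TensorFlow session into the gensim model, so the rest of the gensim API works unchanged.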