
[WIP] Computes training loss for Word2Vec model (fixes issue #999) #1201

Merged

Conversation

chinmayapancholi13
Contributor

This PR computes the training loss for each pair of words trained under the skip-gram model. The loss is computed in the function train_sg_pair, and the value is displayed after every print_freq training pairs (print_freq is passed as a parameter to train_batch_sg). To suppress the loss output entirely, set print_freq to 0 (the default).
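The idea can be sketched in pure Python (hypothetical names; the actual PR modifies gensim's `train_sg_pair`/`train_batch_sg`): compute the negative-sampling loss per (center, context) pair, accumulate it, and report it every `print_freq` pairs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sg_pair_loss(pos_score, neg_scores):
    # Negative-sampling loss for one (center, context) pair:
    # -log(sigmoid(positive score)) - sum(log(sigmoid(-negative score)))
    loss = -math.log(sigmoid(pos_score))
    loss -= sum(math.log(sigmoid(-s)) for s in neg_scores)
    return loss

def train_batch_sg_sketch(pair_scores, print_freq=0):
    # Accumulate loss over pairs; report every print_freq pairs (0 = never),
    # mirroring the print_freq parameter described above.
    total = 0.0
    for i, (pos, negs) in enumerate(pair_scores, start=1):
        total += sg_pair_loss(pos, negs)
        if print_freq and i % print_freq == 0:
            print("pairs=%d cumulative loss=%.4f" % (i, total))
    return total
```

This is only an illustration of the bookkeeping, not the PR's actual code; the real implementation operates on the model's weight vectors rather than precomputed scores.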

@chinmayapancholi13 chinmayapancholi13 changed the title computes training loss for skip gram Computes training loss for skip gram model. Fixes issue #999. Mar 9, 2017
@piskvorky
Owner

I can't see how a pure-Python implementation would be useful... I assume people who need to see progress need to see it on large data. And pure-Python on large data would be too slow to start with.

@chinmayapancholi13
Contributor Author

@piskvorky Yes, I agree completely. However, the manner in which we want to compute and use the loss value is itself not yet clear (ongoing discussion at #999). That is why I submitted a PR that first changes only the Python path for the skip-gram model; I plan to make the corresponding changes for the other paths, for both skip-gram and CBOW, in Word2Vec (and also in Doc2Vec).

@chinmayapancholi13 chinmayapancholi13 changed the title Computes training loss for skip gram model. Fixes issue #999. [WIP] Computes training loss for skip gram model. Fixes issue #999. Mar 16, 2017
@@ -140,7 +140,7 @@
FAST_VERSION = -1
MAX_WORDS_IN_BATCH = 10000

def train_batch_sg(model, sentences, alpha, work=None):
def train_batch_sg(model, sentences, alpha, work=None, print_freq=0):
Contributor

Maybe rename print_freq to eval_every, as in LdaModel.

Contributor Author

Sure, Ivan. I'll update the parameter name.

@menshikh-iv
Contributor

A good start, I think. The pure-Python implementation can be slow (we need a benchmark to check this), but if the model loss is logged only rarely, it should not slow the learning process significantly (though the Cython version is the best choice).

@gojomo
Collaborator

gojomo commented May 23, 2017

I personally don't see this current implementation as useful enough to justify the code-complication. The people who requested it likely don't use the pure-python path – it's an advanced need, and advanced users are dependent on the optimized code.

The discussion on #999 seemed to conclude that accumulating the error in the model was a superior approach – but this only logs the value of train_error_value, the error of a single (!) skip-pair training example, occasionally. (I think that number will jump all over, based on which 2 words happen to have been in the example when logging is chosen. I find it hard to imagine any use for such logged values.)

I believe this feature will require the active participation of someone who actively needs the feature – either to provide a good spec & review, or to implement – to become a worthwhile addition.

@chinmayapancholi13
Contributor Author

@gojomo I am indeed planning to implement this for both the Python path and the optimized Cython path.
Also, as per the latest discussion on #999, it was agreed that storing the sum of the training losses over all word-pairs would be a good start for addressing the issue. As mentioned in the comment by @dietmar here, this would let a user get the cumulative training loss for each call to the function train().

So overall this should in fact help users get an idea of how the training of their model is progressing. That said, I agree this may be just one of many use-cases for this functionality, so more user input before going forward with the implementation wouldn't hurt.

cc : @tmylk @menshikh-iv

@gojomo
Collaborator

gojomo commented May 24, 2017

At the time of my comment, before commit 8949749, there wasn't yet any accumulation – only logging of occasional single-pair errors. Yes, a value for a whole epoch, that is thus comparable against another epoch, would be useful. (A running average of 'recent' error might also be useful.)
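The "running average of recent error" mentioned above could be as simple as an exponential moving average over the per-example loss stream (a sketch with a hypothetical smoothing factor, not anything in this PR):

```python
def running_average(losses, alpha=0.1):
    # Exponential moving average of a noisy per-example loss stream;
    # alpha is a hypothetical smoothing factor (larger = more reactive).
    avg, out = None, []
    for loss in losses:
        avg = loss if avg is None else (1.0 - alpha) * avg + alpha * loss
        out.append(avg)
    return out
```

Unlike a per-epoch total, such a smoothed value is comparable at any point during training, which is closer to what tools that print a live loss do.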

@piskvorky
Owner

piskvorky commented Jun 13, 2017

How does this extra branching / extra code in the critical path affect performance?

@chinmayapancholi13 Can you post some benchmarks, before + after?

@chinmayapancholi13
Contributor Author

@piskvorky Sure. I plan to post the before vs. after benchmarks as soon as I finish the coding.

@@ -698,6 +698,13 @@ def test_reset_from(self):
model.reset_from(other_model)
self.assertEqual(model.wv.vocab, other_vocab)

def test_compute_training_loss(self):
model = word2vec.Word2Vec(min_count=1, sg=1, negative=5, hs=1)
Contributor

Please add tests for more training modes.

@chinmayapancholi13
Contributor Author

chinmayapancholi13 commented Jun 29, 2017

Benchmarks showing the effect of the loss-computation code on training time
text8_50000000 : 50 MB size
lee_background.cor : 25 KB size

| train_data | compute_loss | hs | sg | mean | std |
|---|---|---|---|---|---|
| text8_50000000 | True | 0 | 1 | 125.242767 | 0.442522 |
| text8_50000000 | False | 0 | 1 | 124.090732 | 1 |
| text8_50000000 | True | 1 | 1 | 252.800164 | 1.140344 |
| text8_50000000 | False | 1 | 1 | 245.065643 | 2.657392 |
| text8_50000000 | True | 0 | 0 | 43.812430 | 1.216697 |
| text8_50000000 | False | 0 | 0 | 42.815214 | 0.142814 |
| text8_50000000 | True | 1 | 0 | 74.801153 | 0.300728 |
| text8_50000000 | False | 1 | 0 | 74.236441 | 0.126426 |
| lee_background.cor | True | 0 | 1 | 0.560387 | 0.005805 |
| lee_background.cor | False | 0 | 1 | 0.687179 | 0.143629 |
| lee_background.cor | True | 1 | 1 | 1.126855 | 0.004407 |
| lee_background.cor | False | 1 | 1 | 1.135358 | 0.059161 |
| lee_background.cor | True | 0 | 0 | 0.316948 | 0.005148 |
| lee_background.cor | False | 0 | 0 | 0.319236 | 0.005792 |
| lee_background.cor | True | 1 | 0 | 0.429879 | 0.005373 |
| lee_background.cor | False | 1 | 0 | 0.429489 | 0.000756 |

Edit: The benchmarks posted here were run only for test data sizes of 25 KB and 50 MB. The associated notebook (https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/word2vec.ipynb) has a section for reproducing the benchmark values for test data of sizes 25 KB, 1 MB, 10 MB, 50 MB and 100 MB.
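The mean/std columns above can be produced with a simple timing harness like the following (a generic sketch, not the notebook's actual code; `run_once` stands in for one full training run):

```python
import statistics
import time

def time_training(run_once, repeats=3):
    # Run a training callable several times and report the mean and sample
    # standard deviation of the wall-clock seconds per run.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)
```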

@menshikh-iv menshikh-iv merged commit cdc5944 into piskvorky:develop Jun 29, 2017
@gojomo
Collaborator

gojomo commented Jun 29, 2017

The lee corpus is too small/quick to give any feedback. From the text8 results, it looks like 'true' may cause a 0.5% to 3% slowdown. Running text8 with more iterations might increase the reliability of the result. A comparison against the code without even the switch-to-turn-it-on (i.e. without this feature at all) might also be relevant.

@@ -133,6 +133,6 @@ check_files() {
if [[ "$MODIFIED_FILES" == "no_match" ]]; then
echo "No file has been modified"
else
check_files "$(echo "$MODIFIED_FILES" )" "--ignore=E501,E731,E12,W503 --exclude=*.sh,*.md,*.yml,*.rst,*.ipynb,*.txt,*.csv,*.vec,Dockerfile*"
check_files "$(echo "$MODIFIED_FILES" )" "--ignore=E501,E731,E12,W503 --exclude=*.sh,*.md,*.yml,*.rst,*.ipynb,*.txt,*.csv,*.vec,Dockerfile*,*.c,*.pyx"
Collaborator

Are we sure .pyx should be here? I didn't see what kind of warnings flake8 was generating, but since Cython syntax is mostly Python, and most of our enforceable conventions should still be in effect, we may want some style enforcement there.

Contributor

@gojomo flake8 can't correctly check .pyx files.

Contributor Author

@gojomo We were getting flake8 errors like these:

[screenshots of the flake8 errors omitted]

So although I agree that there is some style-checking we might want to do in .pyx files (in the Python-like code), to avoid errors in cases like the above I thought it would be better to exclude .pyx files from the flake8 checks.

Collaborator

Ah, I see. There's an SO answer that implies it may be possible to turn off just certain warnings for .pyx files – https://stackoverflow.com/questions/31269527/running-pep8-or-pylint-on-cython-code – though the full example file is a broken link.

Contributor Author

Thanks for sharing this link. :) I can try the config specified in that answer to check whether it turns off all the undesired warnings/errors.

@gojomo
Collaborator

gojomo commented Jun 29, 2017

While summing all the errors for a single call to train() is somewhat more useful, to a knowledgeable user, than the prior implementation, it's still not quite what users may find interpretable. (At least not compared to the running-average error printed during training by other tools like fasttext.) I also think the value might need to be passed through abs() somewhere, lest errors in opposite directions offset each other and make the total misleading. So I don't really see this as ready for merging, again preferring that someone who actively needs the feature review it.
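The cancellation concern raised above is easy to demonstrate (a toy illustration, not code from this PR): signed per-pair errors of opposite sign sum to zero, while absolute errors do not.

```python
def signed_and_absolute_totals(per_pair_errors):
    # Signed errors of opposite sign cancel when summed, which can make a
    # badly-fitting model look converged; absolute errors cannot cancel.
    return (sum(per_pair_errors),
            sum(abs(e) for e in per_pair_errors))
```

For example, errors of +0.5 and -0.5 give a signed total of 0.0 but an absolute total of 1.0.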

@piskvorky
Owner

piskvorky commented Jun 30, 2017

@chinmayapancholi13 good progress!

What do mean and std in the table refer to? Mean and std of what?

I'm most interested in seeing a performance comparison against the original. Comparison of the new switch on/off is interesting too, but we have to evaluate what we're losing by adding an extra switch to the critical path.

And like @gojomo says, it's best to run the benchmark against a larger corpus and multiple times, for robustness.

@chinmayapancholi13
Contributor Author

@piskvorky mean and std refer to the average and standard deviation of the time taken to train the word2vec model.

When we say a "larger corpus", could you please give an approximate size? As I mentioned in my comment here, I ran this comparison between the old and new training times on the 100 MB text8 corpus as well (though not for all 8 possible combinations of compute_loss, sg and hs values) and got similar results (i.e. the difference in training times was about the same percentage as in the results posted for the 50 MB test data).

@gojomo
Collaborator

gojomo commented Jun 30, 2017

I'd find a couple of tests that run for tens of minutes better than many tests that run for a couple of minutes (or just tens of seconds). Increasing iter to 20 or 40 might achieve that with text8. Also, setting hs=1 should also set negative=0 (to match the usual/recommended user choice of just one mode or the other, even though it is possible to run both at once and thus benchmark the combination).

@chinmayapancholi13
Contributor Author

Got it, @gojomo. So to conclude: we should use the text8 corpus with iter set to around 20-40 and compare the time taken by the new code with compute_loss=True against the original code (without any switch). I'll update the benchmarks as soon as I have these values. :)
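The benchmark grid being discussed can be enumerated mechanically (a sketch with hypothetical names; only the parameter names compute_loss, hs, sg, negative and iter come from the thread):

```python
from itertools import product

def benchmark_grid(iters=20):
    # Enumerate every combination of compute_loss, hs and sg; per the
    # review suggestion above, hs=1 implies negative=0 (one mode at a time).
    configs = []
    for compute_loss, hs, sg in product([True, False], [0, 1], [0, 1]):
        configs.append({"compute_loss": compute_loss, "hs": hs, "sg": sg,
                        "negative": 0 if hs else 5, "iter": iters})
    return configs
```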

@chinmayapancholi13
Contributor Author

chinmayapancholi13 commented Jul 1, 2017

@gojomo @piskvorky

Analysis of the extra time taken for training due to loss computation code

  • These values are for the full text8 corpus (100 MB) with iter=6.
  • Machine specs: 64 GB RAM, i7 processor, 3.38 GHz (up to 3.6 GHz).

| compute_loss | hs | sg | (mean, std) with switch | (mean, std) without switch |
|---|---|---|---|---|
| True | 1 | 1 | (341.065673, 0.722949) | - |
| False | 1 | 1 | (337.980214, 0.356567) | (338.080630, 0.541960) |
| True | 0 | 1 | (172.808942, 2.547450) | - |
| False | 0 | 1 | (171.416717, 1.786813) | (169.936202, 0.422062) |
| True | 1 | 0 | (103.503094, 0.930082) | - |
| False | 1 | 0 | (104.200277, 1.064672) | (102.959268, 1.517684) |
| True | 0 | 0 | (58.828162, 0.410273) | - |
| False | 0 | 0 | (58.280944, 0.300317) | (59.028026, 0.161999) |

@gojomo
Collaborator

gojomo commented Jul 1, 2017

The most interesting comparison, to determine the overhead for people who won't use the feature, would be between the old code (no changes present) and the new code with compute_loss=False.

@chinmayapancholi13
Contributor Author

@gojomo I agree. I have updated the table to include these values too.

saparina pushed a commit to saparina/gensim that referenced this pull request Jul 9, 2017

* computes training loss for skip gram

* synced word2vec.py with gensim_main

* removed unnecessary keep_bocab_item import

* synced word2vec.py with gensim_main

* PEP8 changes

* added Python-only implementation for skip-gram model

* updated param name to 'compute_loss'

* removed 'raise ImportError' statement from prev commit

* [WIP] partial changes for loss computation for skipgram case

* [WIP] updated cython code

* added unit test for training loss computation

* added loss computation for neg sampling

* removed unnecessary 'raise ImportError' stmt

* added .c and .pyx to flake8 ignore list

* added loss computation for CBOW model in Python path

* added loss computation for CBOW model in Cython path

* PEP8 (F811) fix due to var 'prod'

* updated w2v ipynb for training loss computation and benchmarking

* updated .c files

* added benchmark results
@gojomo
Collaborator

gojomo commented Jul 13, 2017

AFAICT, whatever overhead there is from the switch, or from enabling it, disappears into the variance (in these short tests), so there's no evidence of a performance-related reason to hold back.

But for the reasons in my comment #1201 (comment) I still think this was merged (and #999 closed) prematurely, because there's no confirmation of usefulness from the people who have actually requested this.

Also, while the titles still mention "skip-gram", the benchmarking included non-skip-gram-modes... so does that mean what's been done works for CBOW, too?

@chinmayapancholi13 chinmayapancholi13 changed the title [WIP] Computes training loss for skip gram model. Fixes issue #999. [WIP] Computes training loss for Word2Vec model (fixes issue #999) Jul 14, 2017
@chinmayapancholi13
Contributor Author

@gojomo Since the title of #999 mentions "skip-gram", this PR was originally only for the skip-gram mode, but later I decided to implement both modes in the same PR. Thanks for pointing this out. :) I have updated the title of the PR now.
