[SPARK-12685] [MLlib] word2vec trainWordsCount gets overflow #10627

hhbyyh · 2016-01-07T04:35:18Z

jira: https://issues.apache.org/jira/browse/SPARK-12685
The word2vec log reports
trainWordsCount = -785727483
during training on a large dataset.
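A negative count like this is what 32-bit Int overflow looks like. Once a running total passes Int.MaxValue (2,147,483,647), Int addition silently wraps around into negative numbers. The Scala sketch below illustrates this, assuming the total is accumulated in an Int. The object name and per-word counts are hypothetical. 3,509,239,813 is only one total that narrows to exactly the logged value; the real dataset size cannot be recovered from the log.

```scala
object OverflowSketch {
  // Sum hypothetical per-word counts with Int arithmetic: wraps silently past Int.MaxValue.
  def totalAsInt(counts: Seq[Int]): Int = counts.foldLeft(0)(_ + _)

  // The same sum with a Long accumulator: exact up to Long.MaxValue.
  def totalAsLong(counts: Seq[Int]): Long = counts.foldLeft(0L)(_ + _)

  // Narrowing a Long to Int keeps only the low 32 bits.
  // 3509239813 is an illustrative total, not the real one.
  val wrappedExample: Int = 3509239813L.toInt // -785727483
}
```

For example, `OverflowSketch.totalAsInt(Seq.fill(4)(900000000))` wraps to -694967296, while `totalAsLong` on the same input returns 3600000000.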

Raising the priority, since the overflow affects the training computation itself, not just the log: trainWordsCount is used to decay the learning rate:
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))
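The overflow matters because it flips the sign of the ratio in this formula. The sketch below evaluates the quoted formula as written. The object name, the learning rate of 0.025, the word counts, and the "healthy" total of 3,509,239,813 are illustrative assumptions, not values from Spark's code. With a positive total, alpha falls slightly below learningRate as wordCount grows. With the negative logged total, the ratio becomes negative and alpha rises above learningRate.

```scala
object AlphaSketch {
  // Learning-rate update exactly as quoted in the PR description.
  def alpha(learningRate: Double, numPartitions: Int,
            wordCount: Long, trainWordsCount: Long): Double =
    learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

  val learningRate = 0.025      // example value, assumed for illustration
  val loggedCount = -785727483L // the overflowed value from the log

  // Hypothetical positive total: alpha decays slightly below learningRate.
  val healthy = alpha(learningRate, numPartitions = 1, wordCount = 1000000L, trainWordsCount = 3509239813L)

  // Negative (overflowed) total: the ratio is negative, so alpha exceeds learningRate.
  val broken = alpha(learningRate, numPartitions = 1, wordCount = 1000000L, trainWordsCount = loggedCount)
}
```

Because the wrapped total is negative, a larger wordCount yields a larger alpha, so the schedule grows during training instead of decaying.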

SparkQA · 2016-01-07T05:29:00Z

Test build #48896 has finished for PR 10627 at commit e2d8387.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-08T19:43:28Z

This change looks good. Should we also fix the initialization around line 335: val model = iter.foldLeft((bcSyn0Global.value, bcSyn1Global.value, 0, 0))?
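The concern comes from Scala's type inference. The foldLeft accumulator type is taken from the seed, so literal 0 seeds make those counters Int however large the totals grow, while 0L seeds make them Long. Below is a simplified sketch of that effect. It replaces the four-element tuple quoted above with a hypothetical two-counter tuple (chunks seen, words seen) and uses made-up chunk sizes.

```scala
object FoldSeedSketch {
  // Seed (0, 0): the accumulator is inferred as (Int, Int), so the word total wraps past Int.MaxValue.
  def countWithIntSeed(chunkSizes: Seq[Int]): (Int, Int) =
    chunkSizes.foldLeft((0, 0)) { case ((chunks, words), n) => (chunks + 1, words + n) }

  // Seed (0L, 0L): the accumulator is inferred as (Long, Long), so the total stays exact.
  def countWithLongSeed(chunkSizes: Seq[Int]): (Long, Long) =
    chunkSizes.foldLeft((0L, 0L)) { case ((chunks, words), n) => (chunks + 1, words + n) }
}
```

With three chunks of 1,000,000,000 words, the Int-seeded fold reports -1294967296 words, while the Long-seeded fold reports 3000000000.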

SparkQA · 2016-01-11T10:35:49Z

Test build #49130 has finished for PR 10627 at commit f8901b6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-11T22:47:46Z

LGTM
Merging with master
Thanks!

Will backport in other PRs


…verflow

jira: https://issues.apache.org/jira/browse/SPARK-12685
master PR: #10627

the log of word2vec reports
trainWordsCount = -785727483
during computation over a large dataset.

Update the priority as it will affect the computation process.
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #10721 from hhbyyh/branch-1.4.

(cherry picked from commit 7bd2564)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>

fix overflow in word2vec · e2d8387

hhbyyh added 3 commits January 11, 2016 16:46

Merge remote-tracking branch 'upstream/master' into w2voverflow · 67561e2
Merge remote-tracking branch 'upstream/master' into w2voverflow · 8172a80
change wordCount to Long · f8901b6

asfgit closed this in 4f8eefa Jan 11, 2016

hhbyyh mentioned this pull request Jan 12, 2016

[SPARK-12685] [MLlib] [Backport to 1.4]word2vec trainWordsCount gets overflow #10721

Closed
