[SPARK-12685] [MLlib] word2vec trainWordsCount gets overflow #10627

Closed
wants to merge 4 commits into from


hhbyyh
Contributor

@hhbyyh hhbyyh commented Jan 7, 2016

jira: https://issues.apache.org/jira/browse/SPARK-12685
The word2vec log reports
trainWordsCount = -785727483
during computation over a large dataset, i.e. the count has overflowed.

I raised the priority because the overflowed value feeds into the learning-rate computation:
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))
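
To make the failure mode concrete, here is a minimal, self-contained sketch (not the MLlib code itself). The counts, learning rate, partition count, and the object and method names are all hypothetical; only the shape of the alpha formula is taken from the description above. It assumes the counter was being accumulated as an Int, which is what a negative total suggests.

```scala
object TrainWordsCountOverflow {
  // Hypothetical vocabulary counts whose true total (3,000,000,000)
  // exceeds Int.MaxValue (2,147,483,647).
  val counts: Seq[Int] = Seq(1500000000, 1500000000)

  // Summing as Int wraps around silently on the JVM.
  val asInt: Int = counts.sum

  // Widening each count to Long before summing keeps the true total.
  val asLong: Long = counts.map(_.toLong).sum

  // Same shape as the alpha formula quoted in the PR description.
  def alpha(learningRate: Double, numPartitions: Int,
            wordCount: Long, trainWordsCount: Long): Double =
    learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

  def main(args: Array[String]): Unit = {
    println(s"Int total:  $asInt")   // negative after wrap-around
    println(s"Long total: $asLong")
    // With a negative total the ratio changes sign, so alpha grows
    // above learningRate instead of decaying below it.
    println(alpha(0.025, 4, 1000000L, asInt))
    println(alpha(0.025, 4, 1000000L, asLong))
  }
}
```

With a negative trainWordsCount the subtracted term becomes an added one, so in this sketch the learning rate increases as more words are processed rather than decaying.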

@SparkQA

SparkQA commented Jan 7, 2016

Test build #48896 has finished for PR 10627 at commit e2d8387.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

This change looks good. Should we also fix the initialization around line 335: val model = iter.foldLeft((bcSyn0Global.value, bcSyn1Global.value, 0, 0))?
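
A small sketch of why the seed of a foldLeft matters here, under the assumption that the concern is type inference: an accumulator seeded with the literal 0 is inferred as Int, while 0L makes it a Long. The data and names below are hypothetical and simplified to a single counter rather than the four-element tuple in the quoted line.

```scala
object FoldLeftAccumulatorType {
  // Hypothetical per-sentence word counts; the true total is 4,000,000,000.
  val sentenceSizes: Seq[Int] = Seq(2000000000, 2000000000)

  // Seeding with the Int literal 0 makes the accumulator an Int, so it wraps.
  val intSeeded: Int = sentenceSizes.foldLeft(0)((acc, n) => acc + n)

  // Seeding with 0L makes the accumulator a Long; each Int is widened on addition.
  val longSeeded: Long = sentenceSizes.foldLeft(0L)((acc, n) => acc + n)

  def main(args: Array[String]): Unit = {
    println(s"seeded with 0:  $intSeeded")
    println(s"seeded with 0L: $longSeeded")
  }
}
```

The same reasoning would apply to each numeric slot of a tuple seed, since every element's type is inferred from its own initial literal.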

@SparkQA

SparkQA commented Jan 11, 2016

Test build #49130 has finished for PR 10627 at commit f8901b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

LGTM
Merging with master
Thanks!

Will backport in other PRs

@asfgit asfgit closed this in 4f8eefa Jan 11, 2016
asfgit pushed a commit that referenced this pull request Jan 13, 2016
…verflow

jira: https://issues.apache.org/jira/browse/SPARK-12685

master PR: #10627

the log of word2vec reports
trainWordsCount = -785727483
during computation over a large dataset.

Update the priority as it will affect the computation process.
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #10721 from hhbyyh/branch-1.4.
asfgit pushed a commit that referenced this pull request Jan 13, 2016
…verflow

jira: https://issues.apache.org/jira/browse/SPARK-12685

master PR: #10627

the log of word2vec reports
trainWordsCount = -785727483
during computation over a large dataset.

Update the priority as it will affect the computation process.
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #10721 from hhbyyh/branch-1.4.

(cherry picked from commit 7bd2564)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>