Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-14298] [ML] [MLlib] LDA should support disable checkpoint #12089

Closed
wants to merge 2 commits into from

Conversation

yanboliang
Copy link
Contributor

What changes were proposed in this pull request?

In the doc of checkpointInterval, we told users that they can disable checkpoint by setting checkpointInterval = -1. But we did not handle this situation for LDA actually, we should fix this bug.

How was this patch tested?

Existing tests.

cc @jkbradley

@SparkQA
Copy link

SparkQA commented Mar 31, 2016

Test build #54636 has finished for PR 12089 at commit a5c0343.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

Thanks for catching this. I'd prefer we make the change in PeriodicCheckpointer. That way, LDA with EM will still call PeriodicCheckpointer, which can handle persistence even if checkpointing is turned off.

@yanboliang
Copy link
Contributor Author

@jkbradley Agree, I will update the PR soon.

@SparkQA
Copy link

SparkQA commented Apr 7, 2016

Test build #55213 has finished for PR 12089 at commit 4af96d8.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 7, 2016

Test build #55214 has finished for PR 12089 at commit 4af96d8.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Apr 8, 2016

Test build #55310 has finished for PR 12089 at commit 4af96d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

@jkbradley It's ready for review now. Thanks!

@jkbradley
Copy link
Member

Thanks for updating it!
LGTM
Merging with master

Since this was a bug from before, let's add a unit test. It should be easy to write a unit test now that we keep the last checkpoint by default. However, that test will be hard to backport to earlier Spark versions. Let's do this:

  • I'll merge this now with master.
  • Can you please send a follow-up PR with a unit test?
  • After merging the unit test, I'll backport this PR to previous Spark versions.

@yanboliang
Copy link
Contributor Author

@jkbradley Sent #12286 for unit test, please help to review. Thanks!

asfgit pushed a commit that referenced this pull request Apr 11, 2016
## What changes were proposed in this pull request?
This is follow up for #12089, add unit test for EM LDA which test disable checkpointing when set ```checkpointInterval = -1```.
## How was this patch tested?
unit test.

cc jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12286 from yanboliang/spark-14298-followup.
@jkbradley
Copy link
Member

Backported to 1.6 and 1.5 as well now

asfgit pushed a commit that referenced this pull request Apr 11, 2016
## What changes were proposed in this pull request?
In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala#L241), we told users that they can disable checkpoint by setting ```checkpointInterval = -1```. But we did not handle this situation for LDA actually, we should fix this bug.
## How was this patch tested?
Existing tests.

cc jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12089 from yanboliang/spark-14298.

(cherry picked from commit 56af8e8)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
asfgit pushed a commit that referenced this pull request Apr 11, 2016
## What changes were proposed in this pull request?
In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala#L241), we told users that they can disable checkpoint by setting ```checkpointInterval = -1```. But we did not handle this situation for LDA actually, we should fix this bug.
## How was this patch tested?
Existing tests.

cc jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12089 from yanboliang/spark-14298.

(cherry picked from commit 56af8e8)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 12, 2016
## What changes were proposed in this pull request?
In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala#L241), we told users that they can disable checkpoint by setting ```checkpointInterval = -1```. But we did not handle this situation for LDA actually, we should fix this bug.
## How was this patch tested?
Existing tests.

cc jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#12089 from yanboliang/spark-14298.

(cherry picked from commit 56af8e8)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
(cherry picked from commit 05dbc28)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants