-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-14298] [ML] [MLlib] LDA should support disable checkpoint #12089
Conversation
Test build #54636 has finished for PR 12089 at commit
|
Thanks for catching this. I'd prefer we make the change in PeriodicCheckpointer. That way, LDA with EM will still call PeriodicCheckpointer, which can handle persistence even if checkpointing is turned off. |
@jkbradley Agree, I will update the PR soon. |
Test build #55213 has finished for PR 12089 at commit
|
Test build #55214 has finished for PR 12089 at commit
|
Jenkins, test this please. |
Test build #55310 has finished for PR 12089 at commit
|
@jkbradley It's ready for review now. Thanks! |
Thanks for updating it! Since this was a bug from before, let's add a unit test. It should be easy to write a unit test now that we keep the last checkpoint by default. However, that test will be hard to backport to earlier Spark versions. Let's do this:
|
@jkbradley Sent #12286 for unit test, please help to review. Thanks! |
## What changes were proposed in this pull request? This is follow up for #12089, add unit test for EM LDA which test disable checkpointing when set ```checkpointInterval = -1```. ## How was this patch tested? unit test. cc jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #12286 from yanboliang/spark-14298-followup.
Backported to 1.6 and 1.5 as well now |
## What changes were proposed in this pull request? In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala#L241), we told users that they can disable checkpoint by setting ```checkpointInterval = -1```. But we did not handle this situation for LDA actually, we should fix this bug. ## How was this patch tested? Existing tests. cc jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #12089 from yanboliang/spark-14298. (cherry picked from commit 56af8e8) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
## What changes were proposed in this pull request? In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala#L241), we told users that they can disable checkpoint by setting ```checkpointInterval = -1```. But we did not handle this situation for LDA actually, we should fix this bug. ## How was this patch tested? Existing tests. cc jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #12089 from yanboliang/spark-14298. (cherry picked from commit 56af8e8) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
## What changes were proposed in this pull request? In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala#L241), we told users that they can disable checkpoint by setting ```checkpointInterval = -1```. But we did not handle this situation for LDA actually, we should fix this bug. ## How was this patch tested? Existing tests. cc jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes apache#12089 from yanboliang/spark-14298. (cherry picked from commit 56af8e8) Signed-off-by: Joseph K. Bradley <joseph@databricks.com> (cherry picked from commit 05dbc28)
What changes were proposed in this pull request?
In the doc of
checkpointInterval
, we told users that they can disable checkpoint by settingcheckpointInterval = -1
. But we did not handle this situation for LDA actually, we should fix this bug.How was this patch tested?
Existing tests.
cc @jkbradley