-
Notifications
You must be signed in to change notification settings - Fork 841
Fixes #2509 by listening for connection drops and abdicating directly #2712
Conversation
1af1487
to
a688f8a
Compare
|
||
// The current candidate implementation does not handle ZK connection loss | ||
// We register a listener on the candidate zk connection and handle disconnetions explicitly | ||
zk.get().register(new Watcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to move this out of the Guice logic.
It also looks strange to me that you couple this with creating the MarathonSchedulerService.
Couldn't that simply be inside a WhenLeaderActor
that has the ZooKeeperClient
and the MarathonSchedulerService
as dependencies?
@aquamatthias Thanks for taking care of this! I don't like placing the logic in the guice module, though... |
a688f8a
to
faa5490
Compare
@kolloch I agree. I have created a separate actor that handles this. |
processes.get(name).map { process => | ||
Try(process.destroy()) | ||
processes -= name | ||
process.exitValue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this affected by the race conditions mentioned in L188?
It looks odd that stopAllProcesses actually doesn't call this method for each process.
LGTM, just a minor comment regarding one of the new tests. |
@gkleiman Addressed your comment. |
@kolloch Since Gaston is on vacation - can you review? |
} | ||
} | ||
|
||
class AbdicateOnConnectionLossActor(zk: ZooKeeperClient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: could be private[impl]
} | ||
|
||
When("Zookeeper starts again") | ||
ProcessKeeper.startZooKeeper(config.zkPort, "/tmp/foo/single", wipeWorkDir = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using a different working directory here? Please clarify in a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same path as in SingleMarathonIntegrationTest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move it to a constant in SingleMarathonIntegrationTest to make that clear?
Hey @aquamatthias, looks good. Mostly minor nitpicks and one race-condition. The only major thing is that I am not sure if the error detection code is exhaustive. |
Thanks for the changes. I would suggest moving that Zookeeper path to a constant and referencing it in the test. Otherwise I would keep being confused about the meaning of that path in the test. Otherwise looks good! |
090abf3
to
d7954c1
Compare
Extracted a def with parameter default values. |
|
Fixes #2509 by listening for connection drops and abdicating directly
No description provided.