Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Don't assume the zk client hasn't yet connected #15

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

jdanbrown
Copy link
Contributor

If the user supplies their own zk client to cluster.join (or cluster.connect), and their zk client has already established a session, then the cluster will register a watch for a SyncConnected event but never receive one, since it has already fired long ago, before the cluster started listening.

This bug was introduced in my earlier #13, which was a simple change that quietly violated the assumption that the cluster's zk client would trigger a SyncConnected after cluster.join.

This fix depends on (and includes) #14.

@test methods apparently need to be nested inside at least one inner
class or else they'll be ignored... *shrug*
If the user supplies their own zk client to cluster.join (or
cluster.connect), and their zk client has already established a session,
then the cluster will register a watch for a SyncConnected event but
never receive one, since it has already fired long ago, before the
cluster started listening.
@jdanbrown
Copy link
Contributor Author

Actually, this fix is unsafe because the access to onConnect isn't synchronized across the user thread and the zk event thread. I'll revisit this in the morning...

@eribeiro
Copy link
Contributor

Yeah ... right. ¯(°_o)/¯

@jdanbrown
Copy link
Contributor Author

Ok, I just went ahead and synchronized all of connectionWatcher.process so that the zk event thread and user thread won't collide. Let me know what you think—I still can't decide whether reusing connectionWatcher.process in the first place is a dirty hack or a simple and elegant solution...

@eribeiro
Copy link
Contributor

Yes, the use of connectionWatcher.process smells like a dirty hack. But if the simplest solution that works, that's fine.

  • Why don't you extract lines 120-129 to its own method and call it from the user and zk thread? Call it instead of simulating a event connection.
  • It seems that onConnect() is the method that requires synchronization now, and not whole connectionWatcher() so why not move the synchronization block there? Can you reduce the synchronization area without loosing the concurrency correctness.

Finally, you should step back to evaluate if the cost of synchronizing all access to connectionWatcher() is worth just to have the client issue its own zookeeper client. Sorry, I still don't see a good reason for this besides testing or saving a connection. The lock synchronization will reduce the throughput so you should have a very good reason to this. Maybe I am just plain dumb, but the only reason I came up with for this change was enable testing (passing a mock object) and re-use the same zookeeper connection to store/load ordasity data and other unrelated app data.

@jdanbrown
Copy link
Contributor Author

This fix is unsafe with com.twitter.common:zookeeper:0.1.1 and should only be used with an updated version of ZooKeeperClient. See comment #8 (comment) for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants