Fix: Don't assume the zk client hasn't yet connected #15

jdanbrown · 2012-08-16T23:52:28Z

If the user supplies their own zk client to cluster.join (or cluster.connect), and their zk client has already established a session, then the cluster will register a watch for a SyncConnected event but never receive one, since it has already fired long ago, before the cluster started listening.

This bug was introduced in my earlier #13, which was a simple change that quietly violated the assumption that the cluster's zk client would trigger a SyncConnected after cluster.join.

This fix depends on (and includes) #14.
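
The failure mode described above can be sketched without any ZooKeeper dependency. Everything in this block is hypothetical: `FakeZkClient`, `joinWaitingForEvent`, and `joinCheckingState` are illustrative stand-ins, not Ordasity or ZooKeeper APIs. The sketch only shows why a watcher registered after the session is established never fires, and how also checking the current state avoids the hang. It deliberately ignores thread safety.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a zk client: it announces "connected" once,
// and only to watchers that are registered at that moment.
class FakeZkClient {
    private final List<Runnable> watchers = new ArrayList<>();
    private boolean connected = false;

    synchronized void register(Runnable watcher) {
        watchers.add(watcher);
    }

    synchronized boolean isConnected() {
        return connected;
    }

    // Simulates session establishment; late watchers are never notified.
    void connect() {
        List<Runnable> toNotify;
        synchronized (this) {
            connected = true;
            toNotify = new ArrayList<>(watchers);
        }
        for (Runnable w : toNotify) {
            w.run();
        }
    }
}

class MissedEventDemo {
    // Buggy shape: rely solely on a future connection event.
    // Returns whether the (hypothetical) onConnect logic ran.
    static boolean joinWaitingForEvent(FakeZkClient zk) {
        AtomicBoolean onConnectRan = new AtomicBoolean(false);
        zk.register(() -> onConnectRan.set(true));
        return onConnectRan.get();
    }

    // Fixed shape: register the watcher, then also check the current state.
    static boolean joinCheckingState(FakeZkClient zk) {
        AtomicBoolean onConnectRan = new AtomicBoolean(false);
        zk.register(() -> onConnectRan.set(true));
        if (zk.isConnected()) {
            onConnectRan.set(true); // the event already fired; act on it now
        }
        return onConnectRan.get();
    }
}
```

With a client that connected before the join, `joinWaitingForEvent` reports that the connect logic never ran, while `joinCheckingState` reports that it did.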


jdanbrown · 2012-08-17T03:24:48Z

Actually, this fix is unsafe because the access to onConnect isn't synchronized across the user thread and the zk event thread. I'll revisit this in the morning...

eribeiro · 2012-08-17T04:03:12Z

Yeah ... right. ¯(°_o)/¯


jdanbrown · 2012-08-17T13:09:49Z

Ok, I just went ahead and synchronized all of connectionWatcher.process so that the zk event thread and user thread won't collide. Let me know what you think—I still can't decide whether reusing connectionWatcher.process in the first place is a dirty hack or a simple and elegant solution...
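
A minimal sketch of the approach described in this comment, assuming a watcher whose `process` method is called both by the zk event thread and, with a simulated event, by the user thread. The class name, the `State` enum, the `connected` guard flag, and the `raceOnce` helper are all hypothetical and are not taken from the actual diff. The point is only that marking the whole method `synchronized` makes the two callers take turns.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SynchronizedWatcherSketch {
    // Hypothetical event states, loosely modelled on connection-state events.
    enum State { SYNC_CONNECTED, DISCONNECTED }

    private boolean connected = false; // guarded by the intrinsic lock on this
    final AtomicInteger onConnectCalls = new AtomicInteger();

    // The whole method is synchronized, so the zk event thread and the
    // user thread can never run the connect logic at the same time.
    synchronized void process(State state) {
        if (state == State.SYNC_CONNECTED && !connected) {
            connected = true;
            onConnectCalls.incrementAndGet(); // stands in for onConnect()
        } else if (state == State.DISCONNECTED) {
            connected = false;
        }
    }

    // User thread: the supplied client is already connected, so replay
    // a connection event through the same code path.
    void joinWithConnectedClient() {
        process(State.SYNC_CONNECTED);
    }

    // Runs both callers concurrently and reports how often onConnect ran.
    static int raceOnce() {
        SynchronizedWatcherSketch watcher = new SynchronizedWatcherSketch();
        Thread zkEventThread = new Thread(() -> watcher.process(State.SYNC_CONNECTED));
        Thread userThread = new Thread(watcher::joinWithConnectedClient);
        zkEventThread.start();
        userThread.start();
        try {
            zkEventThread.join();
            userThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return watcher.onConnectCalls.get();
    }
}
```

Whichever thread wins the lock runs the connect logic; the other sees `connected == true` and does nothing.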

eribeiro · 2012-08-17T18:56:15Z

Yes, the use of connectionWatcher.process smells like a dirty hack. But if it's the simplest solution that works, that's fine.

Why don't you extract lines 120-129 into their own method and call that method from both the user thread and the zk thread, instead of simulating a connection event?
It seems that onConnect() is the method that requires synchronization now, and not the whole of connectionWatcher(), so why not move the synchronized block there? Can you reduce the synchronized region without losing concurrency correctness?
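
The suggestion above could look roughly like the following sketch. All names (`NarrowLockSketch`, `handleConnected`, `connectLock`) are hypothetical, and the body of the extracted method is a placeholder for whatever lines 120-129 actually do. Both callers invoke the extracted method directly, and only that method takes a lock.

```java
class NarrowLockSketch {
    private final Object connectLock = new Object();
    private boolean connected = false; // guarded by connectLock
    private int onConnectCalls = 0;    // guarded by connectLock

    // Called by the zk event thread. Only the connect path takes the lock;
    // other event states could be handled here without it.
    void process(String eventState) {
        if ("SyncConnected".equals(eventState)) {
            handleConnected();
        }
    }

    // Called by the user thread when the supplied client is already
    // connected; no simulated event is needed.
    void joinWithConnectedClient() {
        handleConnected();
    }

    // The extracted connect logic, with the lock scoped to just this method.
    private void handleConnected() {
        synchronized (connectLock) {
            if (connected) {
                return; // the other thread already ran the connect logic
            }
            connected = true;
            onConnectCalls++; // stands in for onConnect()
        }
    }

    int onConnectCalls() {
        synchronized (connectLock) {
            return onConnectCalls;
        }
    }
}
```

Compared with synchronizing the entire watcher, this keeps unrelated events off the lock, at the cost of having to be sure that nothing outside `handleConnected` touches the guarded state.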

Finally, you should step back and evaluate whether the cost of synchronizing all access to connectionWatcher() is worth it just to let the user supply their own zookeeper client. Sorry, I still don't see a good reason for this besides testing or saving a connection. The lock will reduce throughput, so you should have a very good reason to do this. Maybe I am just plain dumb, but the only reasons I came up with for this change were to enable testing (passing a mock object) and to reuse the same zookeeper connection to store/load ordasity data and other unrelated app data.

jdanbrown · 2012-09-18T22:52:50Z

This fix is unsafe with com.twitter.common:zookeeper:0.1.1 and should only be used with an updated version of ZooKeeperClient. See the comment on #8 for details.

jdanbrown added 10 commits August 16, 2012 18:24

ClusterSpec: Remove type args in dynamic cast that have no effect

6b75186

Was triggering a compiler warning

ClusterSpec: Re-scope a handful of (broken) tests that weren't being run

8a07de4

@Test methods apparently need to be nested inside at least one inner class or else they'll be ignored... *shrug*

ClusterSpec: Twiddle comment indentation

656b0fa

ClusterSpec: Uncomment and fix broken 'connect' test

cdf02be

ClusterSpec: Split 'join' -> 'join when draining/started/fresh'

b5c0475

ClusterSpec: Remove 'set' calls that are never observed

efcb9ff

ClusterSpec: Fix broken 'join' tests

2b03937

ClusterSpec: Reuse common setup among 'join' tests

05a8395

Cluster.connect: Clarify logging

3222c5a

Cluster: Synchronize connectionWatcher.process for zk event thread + user thread

629e87e

Cluster.connect: Ignore config.hosts on user-supplied zk

e33cc9d

jdanbrown mentioned this pull request Sep 18, 2012

What is com.twitter.commons:zookeeper:0.1.1? #8

Closed

Cluster.connect: More helpful logging

b48f4b4
