Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global EventNotWatching events for session lost #34

Merged
merged 1 commit into from
Nov 6, 2020

Conversation

dragonsinth
Copy link

All other events are published to the global event callback; but lost session events are not. These events are essential to correctly implement cross-cutting wrapper frameworks that do things like re-add watches when a new session is obtained.

@codecov
Copy link

codecov bot commented Jul 21, 2020

Codecov Report

Merging #34 into master will decrease coverage by 0.51%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #34      +/-   ##
==========================================
- Coverage   77.58%   77.06%   -0.52%     
==========================================
  Files           7        7              
  Lines        1316     1317       +1     
==========================================
- Hits         1021     1015       -6     
- Misses        205      211       +6     
- Partials       90       91       +1     
Impacted Files Coverage Δ
conn.go 73.66% <100.00%> (-0.96%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 50daf81...609e469. Read the comment docs.

@dragonsinth
Copy link
Author

@pmazzini hi! just wondered if you might have a chance to give us some feedback on this and #35

We recently migrated to your fork as it seems the most active. :).

Cheers!
Scott

@pmazzini
Copy link

Sorry for the late review, I will have a look at this one today.

#35 already has some feedback which I agree with.

There is no "one" owner of this fork.

@dragonsinth
Copy link
Author

Thank you!

@pmazzini
Copy link

All other events are published to the global event callback; but lost session events are not.

All session state changes are sent to the global even channel.

For example when a session expires: https://github.com/go-zookeeper/zk/blob/master/conn.go#L677

The EventNotWatching one is not a session state change.

Is it really needed to send it to the global event channel?

@dragonsinth
Copy link
Author

This one is for individual watches being lost. Our use case is a "tree watcher" style helper library -- (https://github.com/fullstorydev/gosolr/blob/master/solrmonitor/zkwatcherman.go) that tries to avoid duplicating state in memory that the zk client is already holding.

Instead of tracking a billion individual channels and using a massive (and unscalable) reflective select over them, we're using the global event channel so we can listen on a single channel and get all events. This works 99% perfectly-- all events sent to individual watch channels are also sent to the global channel. Except for this one case! For our use, it's not enough to get a single global event for session lost-- we need to know the paths of all the lost watches.

Without this event, we'd have to duplicate the map of watched nodes in our wrapper helper, which is a waste of memory.

@dragonsinth
Copy link
Author

Wanted to ping on this and see if we could drive this forward. This feature is vital to being able to implement a high-scale watcher that can reestablish watches after a ZK session lost.

@pmazzini
Copy link

pmazzini commented Nov 5, 2020

Apologies for the late review. The change lgtm, an additional review is welcomed though.

These events are essential to correctly implement cross-cutting wrapper frameworks that do things like re-add watches when a new session is obtained.

I wonder what the original client's behavior is. Shouldn't they be re added by the client library?

@dragonsinth
Copy link
Author

Apologies for the late review. The change lgtm, an additional review is welcomed though.

Awesome!

These events are essential to correctly implement cross-cutting wrapper frameworks that do things like re-add watches when a new session is obtained.

I wonder what the original client's behavior is. Shouldn't they be re added by the client library?

If you're talking about zk.Conn, its contract is essentially exactly-once watch event delivery. Since EventNotWatching is an event that occurs when the session is lost, zk.Conn considers the watch fires and does not try to re-establish the watch when the session is regained.

We wrote a little library that sits on top of zk.Conn and handles the task of watching certain ZK nodes, pushing updates out to the app, and re-establishing watches whenever an event fires. This library use to use the individual <-chan zk.Event channels that come back from e.g. GetW. But it turns out this doesn't scale-- you need either a parked goroutine per watch, or else a massive dynamic selective reflect, neither of which scales well past the 10k watch range. By contrast, dedicating a zk.Conn and watching the global event channel works very well! We were just missing this one event.

@pmazzini
Copy link

pmazzini commented Nov 5, 2020

If you're talking about zk.Conn

No, I was talking about the Java original client or how other clients deal with watchers on reconnect.

@dragonsinth
Copy link
Author

If you're talking about zk.Conn

No, I was talking about the Java original client or how other clients deal with watchers on reconnect.

Oh, good question. I haven't used the Java ZK client in a long time.

It looks like the Java client only has global watch event callbacks. Its API is not like Go, it doesn't offer per-method watch callbacks.

@dragonsinth
Copy link
Author

https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ZooKeeper.java#L130

When a client drops the current connection and re-connects to a server, all the existing watches are considered as being triggered but the undelivered events are lost. To emulate this, the client will generate a special event to tell the event handler a connection has been dropped. This special event has EventType None and KeeperState Disconnected.

So this PR should actually bring Go closer to Java in that regard!

@pmazzini pmazzini requested a review from yarikk November 6, 2020 09:11
Copy link

@yarikk yarikk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@yarikk yarikk merged commit 54f4812 into go-zookeeper:master Nov 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants