Add backoff to watcher reconnection #577
Comments
Pasting comments here for people off discord:
Do we think the solution here is to try to bake in an exponential backoff? (btw, one way I have gotten around this in the …
I think it would make sense to just have …
If that's the case, then I think we should consider creating builders for …
rewording a bit; the … I do still think there's room for a …
To be honest, I don't really see the problem with the import; Rust has been smart enough to do the right thing in all the cases I've tried.
Rustdoc needs explicit disambiguation when linking to either the module or the fn. Generally I just think we should organise it in the way that is less confusing:

```diff
-use runtime::{reflector, reflector::Store};
+use runtime::cache::{reflector, Store};
```

Relatively minor point, though.
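For illustration, the disambiguation rustdoc needs when a module and a function share the name `reflector` looks roughly like this (a sketch with illustrative paths, not the crate's real ones):

```rust
/// Cached state of watched objects.
///
/// When a module and a function share a name, intra-doc links need an explicit
/// disambiguator (illustrative paths only):
/// see [`reflector`](fn@crate::reflector) for the function and
/// [`reflector`](mod@crate::reflector) for the module.
pub struct Store;
```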
That makes sense to me.
Moves the imports of kube_runtime's reflector around a bit first, as per comments in #577, then implements a slightly more ergonomic builder struct `Cache<K>` that wraps `reflector`. This encapsulates the writer and result unpacking within the struct, and creates 3 ways to consume it:

- `Cache::run(self)` - if wanting to run forever
- `Cache::applies(self)` - also exposes a flattened applies stream
- `Cache::touches(self)` - also exposes a flattened touches stream

Signed-off-by: clux <sszynrae@gmail.com>
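For context, the manual plumbing that a `Cache<K>` builder would encapsulate looks roughly like this: pair a store `Writer` with a `watcher` stream via `reflector`, then drive the stream and keep only the reader handle. A minimal sketch; exact signatures vary by kube-runtime version, and `anyhow`/`tokio` are used only for brevity.

```rust
use futures::StreamExt;
use k8s_openapi::api::core::v1::Pod;
use kube::{Api, Client};
use kube_runtime::{reflector, reflector::store::Writer, watcher};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::default_namespaced(client);

    // The writer feeds the cache; the reader handle is what consumers keep around.
    let writer = Writer::<Pod>::default();
    let reader = writer.as_reader();

    // `reflector` applies every watch event to the store as it passes through.
    let stream = reflector(writer, watcher(pods, Default::default()));
    futures::pin_mut!(stream);

    // The "result unpacking" a builder would hide: drive the stream, ignore the
    // events themselves, and only keep the cache warm.
    while let Some(event) = stream.next().await {
        if let Err(err) = event {
            eprintln!("watch error: {err}");
        }
        println!("cached pods: {}", reader.state().len());
    }
    Ok(())
}
```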
Coming back to this after thinking more about it in #698.

If we put a … if RBAC is wrong, then no amount of retrying should occur - we need to bubble this up from the … I can't think of any other non-ephemeral issues that can cause a watcher retry loop to happen, but if there are, we should also catch them.

So, what we should probably try to do: …

The last point here … because most users think …
I'm not so sure that we can make a hard distinction here. For example: …

And so on, and so forth. Ultimately, we need to percolate the problem up to the administrator and let them handle it. And even if it is a human-caused misconfiguration error, it'd still be useful to keep retrying, since we probably won't get notified once the issue is resolved.
True. These configuration errors can happen, and it's nice if we can just keep retrying through them. But it's also important for us to provide that first signal that something has gone wrong upstream (like in the load balancer policy example - if we silently retry for that, then we'll never get correct percolation). I checked …

So these would be up to users to handle atm. Pre-… But this means the RBAC retry we get from … If we inject backoffs into …
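The policy being discussed here, sketched minimally: keep consuming a (backoff-wrapped) watch stream through transient errors, but bubble up anything a retry cannot fix, such as an RBAC denial. The `is_fatal` classifier below is hypothetical, supplied by the caller rather than by kube-runtime.

```rust
use futures::{Stream, StreamExt};

// Hedged sketch of the error policy discussed above. `is_fatal` is a
// hypothetical classifier, not a kube-runtime API.
async fn drain_watch<T, E>(
    stream: impl Stream<Item = Result<T, E>>,
    is_fatal: impl Fn(&E) -> bool,
) -> Result<(), E>
where
    E: std::fmt::Display,
{
    futures::pin_mut!(stream);
    while let Some(item) = stream.next().await {
        match item {
            // Hand the event to the reflector/controller machinery here.
            Ok(_event) => {}
            // Non-ephemeral problems (e.g. RBAC denials) are bubbled up so an
            // administrator actually sees them.
            Err(err) if is_fatal(&err) => return Err(err),
            // Ephemeral problems are logged and retried; the backoff-wrapped
            // watcher will reconnect on the next poll.
            Err(err) => eprintln!("transient watch error, retrying: {err}"),
        }
    }
    Ok(())
}
```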
* implement backoff for watcher - for #577
* move magic number into strategy
* expose backoff from watcher and semi-propagate into controller (awkward; will write a comment)
* potential abstraction
* another builder layer; allow eliding ListParams
* forgot to add file
* easy parts of code review
* rewrite as a helper (take N) - jesus this stuff is hard
* rename as suggested
* Reimplement watcher backoff as FSM (#720)
  * Fix clippy warnings
  * Reimplement watch backoff as FSM
  * Remove useless lifetime bounds
  * Silence clippy size warning
  * Silence clippy properly this time around
  * Split StreamBackoff into a separate utils module
  * Backoff tests
  * Add stream close test
* remove backoff pin, fix docs
* newline
* Add `Backoff` wrapper that implements client-go's reset timer behaviour (#729)
* use new reset backoff and replicate client-go reflector values
* fix node watcher example
* Use released `backoff`
* Factor out default `Backoff`
* Add note to `watcher` about backoff
* Added backoff to Controller
* Changelog
* Revert `Observer` for now
* The clippyman comes for us all, eventually - and we must all pay our due respects, or pay the price
* Fix build warnings
* remove backoff_watch
* doc tweaks
* sentence
* upgrading backoff is not actually breaking

Signed-off-by: clux <sszynrae@gmail.com>
Co-authored-by: Teo Klestrup Röijezon <teo.roijezon@stackable.de>
Co-authored-by: Teo Klestrup Röijezon <teo@nullable.se>
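The resulting usage pattern, roughly: wrap the `watcher` stream in the `StreamBackoff` helper mentioned above with an exponential policy from the `backoff` crate. A sketch only; module paths and the default policy are assumptions that vary by kube-runtime version.

```rust
use backoff::ExponentialBackoff;
use futures::TryStreamExt;
use k8s_openapi::api::core::v1::Pod;
use kube::{Api, Client};
use kube_runtime::{utils::StreamBackoff, watcher};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::default_namespaced(client);

    // Space reconnection attempts out exponentially instead of retrying
    // immediately every time the watch stream errors.
    let stream = StreamBackoff::new(
        watcher(pods, Default::default()),
        ExponentialBackoff::default(),
    );
    futures::pin_mut!(stream);

    while let Some(event) = stream.try_next().await? {
        println!("event: {:?}", event);
    }
    Ok(())
}
```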
Currently `watcher` will always keep a reconnection attempt in-flight when polled (and there is no active connection already). That's a bit excessive, and pretty unkind to the apiserver. This isn't too bad for `watcher` itself, since you can add a delay on the consumer side instead, but we should at least add something for `Controller`...
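The consumer-side workaround mentioned above, sketched with a crude fixed delay: by not polling the stream while sleeping, the next reconnection attempt is postponed. Types and setup are assumptions about a typical watcher stream, not the fix this issue proposes.

```rust
use std::time::Duration;

use futures::StreamExt;
use k8s_openapi::api::core::v1::Pod;
use kube::{Api, Client};
use kube_runtime::watcher;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::default_namespaced(client);

    let stream = watcher(pods, Default::default());
    futures::pin_mut!(stream);

    while let Some(item) = stream.next().await {
        match item {
            Ok(event) => println!("event: {:?}", event),
            Err(err) => {
                // Crude consumer-side delay: without it, the watcher starts
                // another reconnection attempt as soon as it is polled again.
                eprintln!("watch error: {err}, backing off");
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
        }
    }
    Ok(())
}
```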