Agent crashes when API server is not available #222
Comments
Ideally, the agent should stay alive for about 5 minutes before doing anything as extreme as crashing. For instance, 5 minutes is the standard eviction timeout for a node: https://kubernetes.io/docs/concepts/architecture/nodes/#condition Another idea is to retry a fixed number of times, perhaps with exponential backoff. Krustlet does something similar when it needs to register itself as a new node with the API server: https://github.com/deislabs/krustlet/blob/master/crates/kubelet/src/node/mod.rs#L80
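To make the retry idea concrete, here is a minimal sketch of a bounded retry with exponential backoff, using only the standard library. The names `backoff_delay` and `retry_with_backoff` and all parameters are hypothetical; this is not the agent's or Krustlet's actual code, and a real implementation would likely use the async runtime's sleep instead of blocking the thread.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
/// Hypothetical helper, not part of the agent.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Calls `op` until it succeeds or `max_attempts` calls have failed,
/// sleeping with exponential backoff between attempts.
/// `op` is always called at least once; the last error is returned on failure.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: give up and surface the error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            // Otherwise wait, then try again with a longer delay.
            Err(_) => {
                sleep(backoff_delay(base, max, attempt));
                attempt += 1;
            }
        }
    }
}
```

With a base of 1 second and a cap of 60 seconds, the delays would be 1, 2, 4, ... seconds up to the cap, so the number of attempts can be chosen to cover roughly the 5-minute window mentioned above.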
Issue has been automatically marked as stale due to inactivity for 45 days. Update the issue to remove label, otherwise it will be automatically closed.
@kate-goldenring did #374 resolve this issue? Should we close it?
@adithyaj I am not sure we ever tested whether it resolved this, but I think it is safe to close the issue and reopen it if someone else still experiences it.
This issue is still not resolved according to #557
From what I understand of the
So my guess is that this construct is at fault:
With this construct, any error exits the loop and probably causes a panic. We should check for errors inside the loop and handle them locally (maybe also with a backoff mechanism). I'll try to open a PR for this.
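As a rough illustration of handling the error inside the loop rather than letting it escape, consider the sketch below. `Step`, `run_resilient_loop`, and `poll` are hypothetical names standing in for the agent's real loop and its API server call; the actual code is async and will look different.

```rust
use std::time::Duration;

/// What the loop should do after one successful iteration (hypothetical type).
enum Step {
    Continue,
    Stop,
}

/// Runs `poll` repeatedly. An error is logged and followed by a pause,
/// but it never terminates the loop; only `Step::Stop` does.
/// Returns the number of failed iterations.
fn run_resilient_loop<E: std::fmt::Debug>(
    retry_delay: Duration,
    mut poll: impl FnMut() -> Result<Step, E>,
) -> u32 {
    let mut failures = 0;
    loop {
        match poll() {
            Ok(Step::Continue) => {}
            Ok(Step::Stop) => return failures,
            // Handle the failure here instead of propagating it with `?`,
            // which would end the loop and take the process down with it.
            Err(err) => {
                failures += 1;
                eprintln!("API server call failed ({:?}); retrying in {:?}", err, retry_delay);
                std::thread::sleep(retry_delay);
            }
        }
    }
}
```

A fixed delay keeps the sketch short; in practice the pause could grow with each consecutive failure.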
Agent crashes when API server is not available.
Here is an example:
Consider using the cached configuration instead of crashing.
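One possible shape for the cached-configuration fallback, as a sketch only: `ConfigCache` and `get_or_cached` are hypothetical names, and the way the agent actually stores and fetches configurations is not shown in this issue.

```rust
/// Keeps the last configuration that was fetched successfully (hypothetical type).
struct ConfigCache<C> {
    last_good: Option<C>,
}

impl<C: Clone> ConfigCache<C> {
    fn new() -> Self {
        Self { last_good: None }
    }

    /// Tries `fetch`. On success the cache is refreshed and the fresh value returned.
    /// On failure the cached copy is returned if there is one; the error is
    /// surfaced only when nothing has been cached yet.
    fn get_or_cached<E>(&mut self, fetch: impl FnOnce() -> Result<C, E>) -> Result<C, E> {
        match fetch() {
            Ok(fresh) => {
                self.last_good = Some(fresh.clone());
                Ok(fresh)
            }
            Err(err) => self.last_good.clone().ok_or(err),
        }
    }
}
```

The trade-off is that the agent may act on a stale configuration while the API server is unreachable, so a real implementation would probably want to log when the fallback is used.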