-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat sicedar: added readiness prober #1395
Conversation
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
e460685
to
3b2d411
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions (: Thanks overall!
cmd/thanos/sidecar.go
Outdated
func registerSidecar(m map[string]setupFunc, app *kingpin.Application, name string) { | ||
cmd := app.Command(name, "sidecar for Prometheus server") | ||
func registerSidecar(m map[string]setupFunc, app *kingpin.Application) { | ||
comp := component.Sidecar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe inline those, would be more verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I inlined it in the registerSidecar
function, not sure if you want to inline it everywhere even in the runSidecar
?
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
3eaf82a
to
cdc30f4
Compare
@povilasv thanks! there are some new changes just so you don't approve something you didn't see so maybe check that out :) @bwplotka Thanks for the suggestions! I tried some refactor of the sidecar liveness so PTAL. |
those failing test are weird, not sure if related to my changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it generally make sense.. I am bit worried that those values for sinceLastSuccessfulHeartbeatLimit
might be generally nice.. but might not match some unique use cases user may have. Also this means any flake on connection to Prometheus (single heartbeat failed) might mean unavailability. I am a bit skeptical, I would mark it as non ready after 3 heartbeats or something (?)
Let's chat offline @FUSAKLA but overall problem with those probes is that we need to be careful. @povilasv mentioned that all of this can be configured anyone on Kubernetes side, but that's not that simple. Problems:
|
Regarding healthyness: https://twitter.com/miekg/status/1152911058627153920 We are I think on the same page that we should do it ONLY set healthyness to NOT healthy if we are sure that restart will help. |
Yeah, so should we get back to the original state of just modifying the readiness based on the halthchecks? I wanted originally to avoid touching the liveness probe because it felt bit harsh but I understood that this was what you suggested. But I probably did not clearly understood what you meant, sorry for the confusement 😕 |
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
326450a
to
153cd3f
Compare
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
35633e7
to
d351d03
Compare
Along with the offline agreement, the readiness probe is only set after the initial Prometheus external labels fetch from the Prometheus. Once they are fetched it stays ready all the time. If we hit any case in the future when more complex readiness handling will be needed it can be added then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good 👍
One important thing before merge. We never set prober to be healthy. And in prober constructor we set healthyness to be initialError
. Maybe we should be healthy by default in NewProber
?
@bwplotka this is done in the Lines 362 to 369 in 0a30a2e
|
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
But I found one bug with opposite condition on error logging in the Prober, fixed in 60897a8 🙄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pkg/prober/prober.go
Outdated
@@ -37,6 +37,19 @@ type Prober struct { | |||
func NewProber(component component.Component, logger log.Logger, reg prometheus.Registerer) *Prober { | |||
initialErr := fmt.Errorf(initialErrorFmt, component) | |||
|
|||
// From Kubernetes documentation https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/ : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit but this should part of struct TBH (for GoDocs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, is it ok this way?
79d3507
to
480fff5
Compare
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
480fff5
to
210075a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks @FUSAKLA ! |
Great, thank you all for the comments and going through the discussions around the readiness! There is only few components left 😄 |
* feat sicedar: added readiness prober Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Martin Chodur m.chodur@seznam.cz
Changes
Next PR in the from the series of adding readiness and liveness probes.
Added readiness and liveness prober to the thanos sidecar.
It follows the logic of setting the
promUp
metric so when it successfully fetches config of the Prometheus it becomes ready.The question is should it take into account status of the GRPC interface also?
Verification