-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agent: return a non-zero exit code on error #9670
Conversation
* agent: use oklog's run.Group to schedule subsystem runners * agent: clean up unused DoneCh, clean up agent's main Run func * agent/template: use ts.stopped.CAS to atomically swap value * fix tests * fix tests * agent/template: add timeout on TestRunServer * agent: output error via logs and return a generic error on non-zero exit * fix TestAgent_ExitAfterAuth * agent/template: do not restart ct runner on new incoming token if exit_after_auth is set to true
# Conflicts: # command/agent/cert_with_name_end_to_end_test.go # command/agent/cert_with_no_name_end_to_end_test.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this isn't part of your PR, but I was looking to see where ctx
was set in the end to end tests and came across this:
ctx, cancelFunc := context.WithCancel(context.Background())
timer := time.AfterFunc(30*time.Second, func() {
cancelFunc()
})
defer timer.Stop()
Could you update this to:
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
(also in the other end-to-end tests, not just the one I linked above)
Thanks!
if ts.testingLimitRetry != 0 { | ||
ctv.Vault.Retry = &ctconfig.RetryConfig{Attempts: &ts.testingLimitRetry} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need the conditional:
https://github.com/hashicorp/consul-template/blob/7a4683109607e642f014f29898b1acc797efb890/config/retry.go#L46
https://github.com/hashicorp/consul-template/blob/master/config/vault.go#L196-L198
if ts.testingLimitRetry != 0 { | |
ctv.Vault.Retry = &ctconfig.RetryConfig{Attempts: &ts.testingLimitRetry} | |
} | |
ctv.Vault.Retry = &ctconfig.RetryConfig{Attempts: &ts.testingLimitRetry} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value prior to this change is based off DefaultRetryAttempts
, which is 12 retries and set by Finalize. We don't want to change this to be unlimited by default. Setting this to 0 re-introduces the behavior that we are trying to avoid since we want to give up and exit 1 after the default retry of 12 attempts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and works as described in K8s.
This PR introduces changes around agent to properly return a non-zero exit code on error. Particularly, we can now return a proper error if agent template fails to render any templates due to a non-existent secret path or a non-existent key within a secret path. Using
error_on_missing_key
(same as the config within consul-template) on any of the templates stanza will result in agent returning immediately if a template-related error is encountered.In order for agent to properly exit for cases such as an error due to a non-existent secret path, the current behavior of indefinite retry is changed to exit after it's max default retry of 12 attempts (with an exponential backoff of 250ms increase per try, up to 1m). This still results in a reasonable number of attempts before agent gives up and exits.
I also took the opportunity to eliminate possible panics by changing them to errors since we can now gracefully handle errors returned by the subsystems' Run function within the agent's main Run function.
Fixes #9306