Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agent: return a non-zero exit code on error #9670

Merged
merged 10 commits into from
Sep 30, 2020
Merged

agent: return a non-zero exit code on error #9670

merged 10 commits into from
Sep 30, 2020

Conversation

calvn
Copy link
Contributor

@calvn calvn commented Aug 5, 2020

This PR introduces changes around agent to properly return a non-zero exit code on error. Particularly, we can now return a proper error if agent template fails to render any templates due to a non-existent secret path or a non-existent key within a secret path. Using error_on_missing_key (same as the config within consul-template) on any of the templates stanza will result in agent returning immediately if a template-related error is encountered.

In order for agent to properly exit for cases such as an error due to a non-existent secret path, the current behavior of indefinite retry is changed to exit after it's max default retry of 12 attempts (with an exponential backoff of 250ms increase per try, up to 1m). This still results in a reasonable number of attempts before agent gives up and exits.

I also took the opportunity to eliminate possible panics by changing them to errors since we can now gracefully handle errors returned by the subsystems' Run function within the agent's main Run function.

Fixes #9306

@calvn calvn added this to the 1.5.1 milestone Aug 5, 2020
@calvn calvn requested a review from catsby August 5, 2020 21:44
@calvn calvn marked this pull request as ready for review August 6, 2020 19:12
@calvn calvn requested a review from a team August 7, 2020 22:06
command/agent.go Outdated Show resolved Hide resolved
command/agent/auth/auth.go Outdated Show resolved Hide resolved
command/agent/cache_end_to_end_test.go Show resolved Hide resolved
command/agent/template/template.go Show resolved Hide resolved
command/agent/template/template_test.go Outdated Show resolved Hide resolved
command/agent.go Outdated Show resolved Hide resolved
* agent: use oklog's run.Group to schedule subsystem runners

* agent: clean up unused DoneCh, clean up agent's main Run func

* agent/template: use ts.stopped.CAS to atomically swap value

* fix tests

* fix tests

* agent/template: add timeout on TestRunServer

* agent: output error via logs and return a generic error on non-zero exit

* fix TestAgent_ExitAfterAuth

* agent/template: do not restart ct runner on new incoming token if exit_after_auth is set to true
@calvn calvn modified the milestones: 1.5.1, 1.5.2, 1.5.4 Aug 19, 2020
# Conflicts:
#	command/agent/cert_with_name_end_to_end_test.go
#	command/agent/cert_with_no_name_end_to_end_test.go
@kalafut kalafut modified the milestones: 1.5.4, 1.6 Sep 15, 2020
Copy link
Contributor

@pcman312 pcman312 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this isn't part of your PR, but I was looking to see where ctx was set in the end to end tests and came across this:

ctx, cancelFunc := context.WithCancel(context.Background())
timer := time.AfterFunc(30*time.Second, func() {
	cancelFunc()
})
defer timer.Stop()

Could you update this to:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

(also in the other end-to-end tests, not just the one I linked above)

Thanks!

Comment on lines +169 to +171
if ts.testingLimitRetry != 0 {
ctv.Vault.Retry = &ctconfig.RetryConfig{Attempts: &ts.testingLimitRetry}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need the conditional:
https://github.com/hashicorp/consul-template/blob/7a4683109607e642f014f29898b1acc797efb890/config/retry.go#L46
https://github.com/hashicorp/consul-template/blob/master/config/vault.go#L196-L198

Suggested change
if ts.testingLimitRetry != 0 {
ctv.Vault.Retry = &ctconfig.RetryConfig{Attempts: &ts.testingLimitRetry}
}
ctv.Vault.Retry = &ctconfig.RetryConfig{Attempts: &ts.testingLimitRetry}

Copy link
Contributor Author

@calvn calvn Sep 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value prior to this change is based off DefaultRetryAttempts, which is 12 retries and set by Finalize. We don't want to change this to be unlimited by default. Setting this to 0 re-introduces the behavior that we are trying to avoid since we want to give up and exit 1 after the default retry of 12 attempts.

Copy link
Contributor

@jasonodonnell jasonodonnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and works as described in K8s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vault agent not setting exit code to 1 on templating error
6 participants