agent: return a non-zero exit code on error #9670

calvn · 2020-08-05T21:44:24Z

This PR changes the agent so that it properly returns a non-zero exit code on error. In particular, the agent can now return a proper error when its template server fails to render a template because of a non-existent secret path or a non-existent key within a secret path. Setting error_on_missing_key (the same option as in consul-template) on any template stanza causes the agent to return immediately when a template-related error is encountered.
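As a rough illustration of what a missing-key error looks like at the template level, here is a minimal sketch using only Go's standard text/template package. It assumes that error_on_missing_key behaves like text/template's missingkey=error option; the render helper and the sample data are hypothetical and are not taken from Vault or consul-template.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes tmplText against data. When errOnMissingKey is true, looking
// up a key that is absent from the map fails the render instead of silently
// producing "<no value>". This is a hypothetical helper for illustration only.
func render(tmplText string, data map[string]interface{}, errOnMissingKey bool) (string, error) {
	opt := "missingkey=default"
	if errOnMissingKey {
		opt = "missingkey=error"
	}
	t, err := template.New("example").Option(opt).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// The secret has a username but no password key.
	secret := map[string]interface{}{"username": "app"}
	const tmplText = "user={{ .username }} pass={{ .password }}"

	out, err := render(tmplText, secret, false)
	fmt.Printf("lenient: %q, err=%v\n", out, err)

	_, err = render(tmplText, secret, true)
	fmt.Printf("strict: err=%v\n", err)
}
```

In the strict case the caller gets an error it can propagate, which is the kind of signal the agent needs in order to exit with a non-zero code.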

For the agent to exit properly in cases such as a non-existent secret path, the current behavior of retrying indefinitely is changed: the agent now exits after its default maximum of 12 retry attempts (with an exponential backoff of 250ms per try, up to 1m). This still gives a reasonable number of attempts before the agent gives up and exits.

I also took the opportunity to eliminate possible panics by turning them into errors, since errors returned by each subsystem's Run function can now be handled gracefully in the agent's main Run function.

Fixes #9306

command/agent.go

command/agent/auth/auth.go

command/agent/cache_end_to_end_test.go

command/agent/template/template.go

command/agent/template/template_test.go

* agent: use oklog's run.Group to schedule subsystem runners
* agent: clean up unused DoneCh, clean up agent's main Run func
* agent/template: use ts.stopped.CAS to atomically swap value
* fix tests
* fix tests
* agent/template: add timeout on TestRunServer
* agent: output error via logs and return a generic error on non-zero exit
* fix TestAgent_ExitAfterAuth
* agent/template: do not restart ct runner on new incoming token if exit_after_auth is set to true

pcman312

I realize this isn't part of your PR, but I was looking to see where ctx was set in the end to end tests and came across this:

ctx, cancelFunc := context.WithCancel(context.Background())
timer := time.AfterFunc(30*time.Second, func() {
	cancelFunc()
})
defer timer.Stop()

Could you update this to:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

(also in the other end-to-end tests, not just the one I linked above)

Thanks!

pcman312 · 2020-09-28T21:20:44Z

command/agent/template/template.go

+				if ts.testingLimitRetry != 0 {
+					ctv.Vault.Retry = &ctconfig.RetryConfig{Attempts: &ts.testingLimitRetry}
+				}


I don't think you need the conditional:
https://github.com/hashicorp/consul-template/blob/7a4683109607e642f014f29898b1acc797efb890/config/retry.go#L46
https://github.com/hashicorp/consul-template/blob/master/config/vault.go#L196-L198

Suggested change

-				if ts.testingLimitRetry != 0 {
-					ctv.Vault.Retry = &ctconfig.RetryConfig{Attempts: &ts.testingLimitRetry}
-				}
+				ctv.Vault.Retry = &ctconfig.RetryConfig{Attempts: &ts.testingLimitRetry}

Before this change, the default value was based on DefaultRetryAttempts, which is 12 retries and is set by Finalize. We don't want to change this to be unlimited by default. Setting this to 0 reintroduces the behavior that we are trying to avoid, since we want the agent to give up and exit with code 1 after the default of 12 attempts.
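The reasoning can be illustrated with a simplified model of pointer-based defaulting. The retryConfig type, its finalize method, and buildRetry are hypothetical stand-ins, not consul-template's actual types; the sketch assumes that a nil Attempts is replaced by the default of 12 and that an explicit 0 is left alone and treated as unlimited.

```go
package main

import "fmt"

const defaultRetryAttempts = 12 // mirrors the default of 12 mentioned in the reply

// retryConfig is a simplified stand-in for a retry configuration.
type retryConfig struct {
	Attempts *int // nil means "not set"; 0 is assumed to mean unlimited
}

// finalize fills in the default only when Attempts was left unset.
func (c *retryConfig) finalize() {
	if c.Attempts == nil {
		n := defaultRetryAttempts
		c.Attempts = &n
	}
}

// buildRetry models the code under review. With conditional set, the
// test-only limit overrides the config only when it is non-zero; without it,
// the limit is always assigned, even when it is zero.
func buildRetry(testingLimitRetry int, conditional bool) retryConfig {
	var c retryConfig
	if !conditional || testingLimitRetry != 0 {
		c.Attempts = &testingLimitRetry
	}
	c.finalize()
	return c
}

func main() {
	with := buildRetry(0, true)
	without := buildRetry(0, false)
	fmt.Println("with conditional:   ", *with.Attempts)
	fmt.Println("without conditional:", *without.Attempts)
}
```

Under these assumptions, dropping the conditional hands finalize a non-nil pointer to 0, so the default of 12 is never applied.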

jasonodonnell

LGTM and works as described in K8s.

agent: return a non-zero exit code on error

e4d2730

calvn added ecosystem agent labels Aug 5, 2020

calvn added this to the 1.5.1 milestone Aug 5, 2020

calvn requested a review from catsby August 5, 2020 21:44

calvn added 2 commits August 5, 2020 17:36

agent/template: always return on template server error, add case for …

1f3dfbf

…error_on_missing_key

agent: fix tests by updating Run params to use an errCh

16193fa

calvn marked this pull request as ready for review August 6, 2020 19:12

calvn requested a review from a team August 7, 2020 22:06

agent/template: add permission denied test case, clean up test var

7be6960

pcman312 reviewed Aug 12, 2020

agent: use unbuffered errCh, emit fatal errors directly to the UI output

ef1b835

ncabatoff reviewed Aug 18, 2020

command/agent.go (outdated)

calvn mentioned this pull request Aug 18, 2020

agent: use oklog's run.Group to schedule subsystem runners #9761

Merged

calvn requested a review from briankassouf August 18, 2020 22:01

agent: drain ah.OutputCh after sink exits to avoid blocking on the ch…

1c70e8f

…annel

calvn modified the milestones: 1.5.1, 1.5.2, 1.5.4 Aug 19, 2020

Merge remote-tracking branch 'origin/master' into agent-err-exit-code

460bf36

# Conflicts:
#	command/agent/cert_with_name_end_to_end_test.go
#	command/agent/cert_with_no_name_end_to_end_test.go

kalafut modified the milestones: 1.5.4, 1.6 Sep 15, 2020

calvn requested review from pcman312 and jasonodonnell September 28, 2020 21:01

pcman312 reviewed Sep 28, 2020

jasonodonnell approved these changes Sep 29, 2020

calvn added 2 commits September 29, 2020 15:51

Merge remote-tracking branch 'origin/master' into agent-err-exit-code

c5b4ff8

use context.WithTimeout, expand comments around ordering of defer can…

fac5f23

…cel()

pcman312 approved these changes Sep 29, 2020

calvn merged commit d54164f into master Sep 30, 2020

calvn deleted the agent-err-exit-code branch September 30, 2020 01:03

stgrace mentioned this pull request Oct 12, 2020

vault agent error_on_missing_key not working #10130

Closed

ibspoof mentioned this pull request Jan 4, 2021

Agent: Allow setting vault retry params for rendering templates #10646

Closed

This was referenced May 27, 2021

agent: allow templating to indefinitely retry if explicitly set to zero #11711

Closed

error_on_missing_key no longer working in vault 1.7 #11349

Closed

calvn Sep 28, 2020