-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial Host Startup support for transient startup failures (#2497) #2606
Conversation
@@ -1795,9 +1804,6 @@ private void HandleHostError(Exception exception) | |||
// Also notify the invoker so the error can also be written to the function | |||
// log file | |||
NotifyInvoker(functionException.MethodName, functionException); | |||
|
|||
// Mark the error as handled so execution will continue with this function disabled | |||
functionException.Handled = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer need to set this - the host level knob replaces this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming we already had some tests validating this, and they just continue passing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. However I just added a bit to that test to verify the background retries.
This test was a ServiceBus listener test, and it turned out that the retries exposed a bug in ServiceBusListener state management. See supporting PR https://github.com/Azure/azure-webjobs-sdk/pull/1650/files
8013267
to
33125c9
Compare
// Default AllowHostPartialStartup to true, but allow it | ||
// to be overridden by config | ||
hostConfig.AllowPartialHostStartup = true; | ||
JToken allowPartialHostStartup = (JToken)config["allowPartialHostStartup"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can now turn this feature off completely if they wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a 'full' test where we turn this off with a 'bad' binding? What happens? Does the host throw an exception and the ScriptHostManager goes into it's backoff/retry loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this off, you'll get normal SDK behavior. The host will fail to start and will enter the ScriptHostManager host start loop, as it would for any other unhandled host level error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had one question about whether we need a new test. Otherwise, looks good.
@@ -1795,9 +1804,6 @@ private void HandleHostError(Exception exception) | |||
// Also notify the invoker so the error can also be written to the function | |||
// log file | |||
NotifyInvoker(functionException.MethodName, functionException); | |||
|
|||
// Mark the error as handled so execution will continue with this function disabled | |||
functionException.Handled = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming we already had some tests validating this, and they just continue passing?
// Default AllowHostPartialStartup to true, but allow it | ||
// to be overridden by config | ||
hostConfig.AllowPartialHostStartup = true; | ||
JToken allowPartialHostStartup = (JToken)config["allowPartialHostStartup"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a 'full' test where we turn this off with a 'bad' binding? What happens? Does the host throw an exception and the ScriptHostManager goes into it's backoff/retry loop?
33125c9
to
c7a5176
Compare
c7a5176
to
3d63fb3
Compare
Fix for #2497. See details in PR for SDK feature here: Azure/azure-webjobs-sdk#1647.
The main impact for Functions is that now listener level exceptions will be retried behind the scenes. E.g. you could see these changes in action if you modify the EventHubTrigger sample function by changing the 'path' property of the trigger binding to an invalid value. Previously we would log the error and start the host with that function disabled. Now, we'll still allow the host to start, but we'll also continue trying to start that function's listener in the background, exponentially backing off. While the host is up and retrying, if you went and added that path to your EventHub the listener would start and the function would be active.
As the issue indicates this is intended to address transient listener failures. However we will now be retrying all listener exceptions, meaning if a function is permanently broken and the failure is a listener level failure, we'll retry. Most broken functions will result in indexing failures not listener failures, so hopefully this won't generate too much background activity in general. People should be fixing (or disabling) their broken functions anyways.
Another change here is that the new host.json config setting
'allowPartialHostStartup'
can now be set by the user, so people can turn this behavior off if they wanted too.