Host Startup Partial Mode (#2497) #1647
Conversation
@@ -109,7 +112,13 @@ internal class FunctionIndexer
    }
    catch (FunctionIndexingException fex)
    {
        if (_allowPartialHostStartup)
        {
            fex.Handled = true;
When this mode is set, we'll now default these to handled. In Functions we currently set this manually (code here); we'll be able to remove that in favor of setting the new host config property.
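For context, the full catch-block behavior being described might look roughly like the sketch below. Everything beyond the lines shown in the diff (the rethrow and the console stand-in for the real trace/logger calls) is an assumption about the surrounding code, not the actual FunctionIndexer implementation.

```csharp
using System;
using Microsoft.Azure.WebJobs.Host.Indexers;

// Sketch only: approximates the indexer behavior discussed above.
internal class PartialStartupIndexingSketch
{
    private readonly bool _allowPartialHostStartup;

    public PartialStartupIndexingSketch(bool allowPartialHostStartup)
    {
        _allowPartialHostStartup = allowPartialHostStartup;
    }

    public void TryIndex(Action indexFunction)
    {
        try
        {
            indexFunction();
        }
        catch (FunctionIndexingException fex)
        {
            if (_allowPartialHostStartup)
            {
                // In partial startup mode the exception is marked handled by
                // default, so a single bad function no longer aborts startup.
                fex.Handled = true;
            }

            if (!fex.Handled)
            {
                throw; // existing behavior: unhandled indexing errors stop the host
            }

            Console.Error.WriteLine(fex.Message); // stand-in for the real trace/logger calls
        }
    }
}
```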
Force-pushed from 027bbe8 to 215dea4
using System.Threading;
using System.Threading.Tasks;

namespace Microsoft.Azure.WebJobs.Host
These utility methods and tests for exponential backoff are copied from an Azure Functions internal helper I also wrote.
var validators = new Action<string>[]
{
    p => Assert.Equal(p, "The listener for function 'testfunc' was unable to start."),
These are the logs users will see while the retries are happening in the background.
/// - Functions listener startup failures will be retried in the background
/// until they start.
/// </remarks>
public bool AllowPartialHostStartup { get; set; }
New public surface area. Still considering the right name for this.
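For reference, opting into the new mode from a WebJobs host would presumably look something like the minimal sketch below. Per the comment above, the property name may still change; everything else uses the existing JobHost/JobHostConfiguration API.

```csharp
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();

        // Opt in to partial startup: listener startup failures are logged and
        // retried in the background instead of preventing the host from starting.
        config.AllowPartialHostStartup = true;

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}
```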
// if to here, the listener exception was handled and
// we're in partial startup mode.
// we spin up a background task retrying to start the listener
Task taskIgnore = Task.Run(() => RetryStartWithBackoffAsync(cancellationToken), cancellationToken);
Just curious, what does the additional Task.Run(...) achieve here? How is it different from just `Task taskIgnore = RetryStartWithBackoffAsync(cancellationToken);`?
My understanding is that invoking directly (fire and forget) will capture the current synchronization context and resume on that context. Task.Run will just start and complete on a thread pool thread without capturing/restoring a context.
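A small illustration of the distinction being made here; the retry method body is a placeholder and only the Task.Run call mirrors the diff above.

```csharp
using System.Threading;
using System.Threading.Tasks;

class FireAndForgetSketch
{
    // Direct invocation: the async method runs synchronously up to its first
    // await on the calling thread, and (absent ConfigureAwait(false)) its
    // continuations are posted back to the captured SynchronizationContext.
    void StartDirect(CancellationToken cancellationToken)
    {
        Task ignore = RetryStartWithBackoffAsync(cancellationToken);
    }

    // Task.Run: the delegate is queued to the thread pool, so the method starts
    // and resumes on pool threads with no SynchronizationContext captured.
    void StartViaThreadPool(CancellationToken cancellationToken)
    {
        Task ignore = Task.Run(() => RetryStartWithBackoffAsync(cancellationToken), cancellationToken);
    }

    // Placeholder body standing in for the real retry loop.
    async Task RetryStartWithBackoffAsync(CancellationToken cancellationToken)
    {
        await Task.Delay(1000, cancellationToken);
    }
}
```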
string message = $"Retrying to start listener for function '{_descriptor.ShortName}' (Attempt {attempt})";
_trace.Info(message);
_logger?.LogInformation(message);
I wonder, is this going to be really noisy in the case of misconfigured functions where we never successfully start the listener? If so, can we use some sort of category here to make it easy to filter the noise out?
What do you mean by "misconfigured"? Many issues will be indexing-time failures (e.g. the app setting for Connection doesn't exist, and in many cases even an invalid connection string for bindings like queue that do up-front validation). If the function is truly broken, the backoff will max out at once every 2 minutes, which won't be very noisy. But yes, there will be an initial flurry of retries.
I'm open to putting them in a category.
I'm thinking we don't use a separate category for these for now and wait to see if it's actually a problem. It's always easy to add later. I started creating a new logger instance for these under a different category and it seemed unnecessarily artificial.
// swallow and log exceptions since we're in a retry loop
// at this point doing this in the background
_trace.Error(ex.Message, ex);
_logger?.LogError(0, ex, ex.Message);
Same as above: wondering whether this will generate lots of log noise.
Noise for us, you mean, or for customers? I don't think it is noise for customers; they should be seeing these.
It's a bit hard for me to see from the diff: is there any public surface area change here? It looks like the flag is on internal types, but maybe I'm misunderstanding.
The new public surface area is the new JobHostConfiguration property, which is public and is a new feature people can use.
Force-pushed from 215dea4 to 0656285
Ahh yes, that makes sense, thanks.
This was very timely: Azure/azure-webjobs-sdk-extensions#398
/// <param name="min">The minimum delay.</param> | ||
/// <param name="max">The maximum delay.</param> | ||
/// <returns>A <see cref="Task"/> representing the computed backoff interval.</returns> | ||
public static async Task DelayWithBackoffAsync(int exponent, CancellationToken cancellationToken, TimeSpan? unit = null, TimeSpan? min = null, TimeSpan? max = null) |
We already have a RandomizedExponentialBackoffStrategy; can you use that existing logic here?
Yes, I'll move to that and remove the new code I added for this. I had considered that previously but thought the core backoff stuff was tied too deeply with the TaskSeriesTimer stuff (I was wrong).
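For illustration, a helper with the signature shown above typically computes something like the following. This is a sketch only, not the PR's actual implementation, which per the thread above was replaced with the SDK's existing RandomizedExponentialBackoffStrategy; the default unit/min/max values and the jitter range are assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class BackoffSketch
{
    private static readonly Random s_random = new Random();

    // Exponential backoff delay with jitter and min/max clamping (sketch only).
    public static async Task DelayWithBackoffAsync(int exponent, CancellationToken cancellationToken,
        TimeSpan? unit = null, TimeSpan? min = null, TimeSpan? max = null)
    {
        TimeSpan unitValue = unit ?? TimeSpan.FromSeconds(1);
        TimeSpan minValue = min ?? TimeSpan.Zero;
        TimeSpan maxValue = max ?? TimeSpan.FromMinutes(2);

        // Exponential growth based on the attempt number: unit, 2*unit, 4*unit, ...
        double seconds = unitValue.TotalSeconds * Math.Pow(2, Math.Max(0, exponent - 1));

        // Randomize slightly so concurrent retries don't all fire at once.
        seconds *= 0.8 + (s_random.NextDouble() * 0.4);

        TimeSpan delay = TimeSpan.FromSeconds(seconds);
        if (delay < minValue) { delay = minValue; }
        if (delay > maxValue) { delay = maxValue; }

        await Task.Delay(delay, cancellationToken);
    }
}
```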
Force-pushed from 0656285 to e7f69bb
Force-pushed from e7f69bb to 957b974
Core changes required to address Functions issue Azure/azure-functions-host#2497. In Functions, we'll set this new mode to true. The default behavior for Functions will be the same as we have now, in that we allow the host to start partially; however, we'll no longer ignore listener startup failures. Listeners will now be retried in the background.
While this feature is being added primarily for Azure Functions, it can be generally useful. For example, in a continuous WebJob with many different functions, the back-end service for one listener might be down for a bit, preventing that listener from starting, while all the other functions can run fine. With this feature, rather than the host not being able to start at all until ALL listeners can start (i.e. the standard continuous WebJobs restart loop), most functions can start running immediately while we wait for the one listener to recover.
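Pulling the snippets above together, the background retry path roughly amounts to the following sketch. Field and helper names beyond those quoted in the diff are assumptions; console calls stand in for the real _trace/_logger calls, the 'MyFunction' name is a placeholder for the function's short name, and StartListenerAsync is a placeholder for the real listener's start call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ListenerRetrySketch
{
    // Background retry loop as described in the PR: keep trying to start a
    // failed listener, logging each attempt and backing off between attempts,
    // until it starts or the host shuts down.
    private async Task RetryStartWithBackoffAsync(CancellationToken cancellationToken)
    {
        int attempt = 0;

        while (!cancellationToken.IsCancellationRequested)
        {
            attempt++;
            Console.WriteLine($"Retrying to start listener for function 'MyFunction' (Attempt {attempt})");

            try
            {
                await StartListenerAsync(cancellationToken);
                return; // listener is up; stop retrying
            }
            catch (Exception ex)
            {
                // Swallow and log: we're retrying in the background, so a failure
                // here must not take down the rest of the host.
                Console.Error.WriteLine(ex.Message);
            }

            // Exponential backoff between attempts, capped (per the discussion
            // above) at roughly once every two minutes. Cancellation during the
            // delay simply cancels this background task.
            TimeSpan delay = TimeSpan.FromSeconds(Math.Min(Math.Pow(2, attempt), 120));
            await Task.Delay(delay, cancellationToken);
        }
    }

    private Task StartListenerAsync(CancellationToken cancellationToken)
    {
        // Placeholder for the real listener's StartAsync call.
        return Task.CompletedTask;
    }
}
```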