-
Notifications
You must be signed in to change notification settings - Fork 520
Conversation
@@ -219,210 +219,215 @@ private async Task StartContainerAsync(Application application, Service service, | |||
|
|||
async Task RunDockerContainer(IEnumerable<(int ExternalPort, int Port, int? ContainerPort, string? Protocol)> ports) | |||
{ | |||
var hasPorts = ports.Any(); | |||
while (!dockerInfo.StoppingTokenSource.IsCancellationRequested) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a loop here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I implement the monitor, it might decide to kill a container because it's unhealthy. In that case I'd want the DockerRunner
to recreate the replica.
Another way to do it, is to kill the container by calling docker restart
, and then there is no need for that loop. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument behind the former, is that it's consistent with how I'll do it in ProcessRunner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docker runner already restarts the replica albiet with the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidfowl I think that the problem with doing docker restart
is that the health monitor won't be able to tell that the container was restarted (because there won't be Stopped
and Started
events). Unless, I explicitly send ReplicaState.Stopped
and ReplicaState.Started
events whenever I restart the container, but I don't like that approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell with this loop, you would call docker rm
and other commands I don't think you want to run:
tye/src/Microsoft.Tye.Hosting/DockerRunner.cs
Line 429 in 65c7ae0
result = await ProcessUtil.RunAsync("docker", $"rm {containerId}", throwOnError: false, cancellationToken: timeoutCts.Token); |
The restart loop for rerunning docker is here:
tye/src/Microsoft.Tye.Hosting/DockerRunner.cs
Line 382 in 65c7ae0
var logsRes = await ProcessUtil.RunAsync("docker", $"logs -f {containerId}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotalik yeah, makes sense. I'll try doing it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotalik changed the DockerRunner
to do docker restart
when StoppingTokenSource
is canceled.
@davidfowl there are two major things left
Maybe it's best to start reviewing this PR and I can submit the rest later on this PR or another PR? |
# Conflicts: # src/Microsoft.Tye.Hosting/DockerRunner.cs
replicas: 3 | ||
bindings: | ||
- port: 8004 | ||
liveness: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 2 probes are bigger than the entire yaml, that makes me 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidfowl yeah... I thought it's best to go with something flexible like in k8s, but you can get a way with just providing a path
, the rest have default values
# Conflicts: # src/Microsoft.Tye.Core/ApplicationFactory.cs
I'll probably add the k8s manifest generation tomorrow. |
@areller this change is amazing. Got my coffee ☕ and will be reviewing now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start. I think there is some small cleanup tasks for code readability in the replica state management, but overall looks like the right idea.
|
||
if (builderHttp.Protocol != null) | ||
{ | ||
httpGet.Add("scheme", builderHttp.Protocol!.ToUpper()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you don't need !
here, other similar cases in this file as well.
httpGet.Add("scheme", builderHttp.Protocol!.ToUpper()); | |
httpGet.Add("scheme", builderHttp.Protocol.ToUpper()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. I removed it and it didn't throw any warnings. probably because it's inside a block that runs if builderHttp.Protocol != null
.
It's weird because at times the compiler will throw a warning when not using the !
operator, even though the expression is in a context where it's guaranteed to not be null...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found an example: if you go to ReplicaMonitor.cs
:113 and remove the !
after serviceDesc.Readiness
, the compiler will throw a warning, even though there is no way for that variable to be null at that point.
else | ||
{ | ||
// If port is not given, we pull default port | ||
var binding = service.Bindings.First(b => builderHttp.Protocol is null || b.Protocol == builderHttp.Protocol); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does feel a bit awkward to have health checks per service rather than per endpoint, but it seems to be what k8s does. Or maybe we require a port?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requiring a port will indeed make it more compatible with k8s, but it will make the tye.yaml
more verbose and less maintainable in my opinion. one thing that k8s does to help with maintainability is to allow to specify a variable as a port. e.g.
port: liveness-port
@@ -50,7 +50,7 @@ static int GetNextPort() | |||
binding.Port = GetNextPort(); | |||
} | |||
|
|||
if (service.Description.Replicas == 1) | |||
if ((service.Description.Readiness is null || service.Description.RunInfo is IngressRunInfo) && service.Description.Replicas == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add the check for service.Description.RunInfo is IngressRunInfo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because I forgot to remove when added service.Description.Readiness is null
:)
private void StartLivenessProbe(Probe probe, ReplicaState moveToOnSuccess = ReplicaState.Healthy) | ||
{ | ||
// currently only HTTP is available | ||
if (probe.Http is null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (probe.Http is null) | |
if (probe.Http == null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for the rest of the file
// currently only HTTP is available | ||
if (probe.Http is null) | ||
{ | ||
_logger.LogWarning("cannot start probing replica {name} because probe configuration is not set", _replica.Name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_logger.LogWarning("cannot start probing replica {name} because probe configuration is not set", _replica.Name); | |
_logger.LogWarning("Cannot start probing replica {name} because probe configuration is not set", _replica.Name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for the rest of the file.
|
||
private void MoveToHealthy(ReplicaState from) | ||
{ | ||
_logger.LogInformation("replica {name} is moving to an healthy state", _replica.Name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These logs should probably be at debug level.
|
||
public override void Start() | ||
{ | ||
Func<ReplicaBinding, bool> bindingClosure = (_httpProberSettings.Port.HasValue, _httpProberSettings.Protocol != null) switch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably deserves a comment as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this overall looks good. There will be some follow up I'd like for us to do for polish, but can be done in a separate PR, including:
- Sample
- Docs
- Recipe
@@ -187,6 +201,165 @@ private static void HandleServiceVolumes(YamlSequenceNode yamlSequenceNode, List | |||
} | |||
} | |||
|
|||
private static void HandleServiceProbe(YamlMappingNode yamlMappingNode, ConfigProbe probe) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: return ConfigProbe instead of passing it in s.t you can set service.Liveness/service.Readiness directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. That would probably make more sense but it will break compatibility with the rest of the methods. What do you think?
@@ -187,6 +201,165 @@ private static void HandleServiceVolumes(YamlSequenceNode yamlSequenceNode, List | |||
} | |||
} | |||
|
|||
private static void HandleServiceProbe(YamlMappingNode yamlMappingNode, ConfigProbe probe) | |||
{ | |||
foreach (var child in yamlMappingNode.Children) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some tests for these parsing wise? Much faster for checking these kinds of conditions like making sure initialDelay can't be negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to come up with something. You want it as part of this PR? or break it up in a different one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think following the structure here: https://github.com/dotnet/tye/blob/master/test/UnitTests/TyeDeserializationTests.cs#L25.
Mostly just parsing/ validation of tye.yaml to make sure we cover these edge cases correctly.
No preference whether we add them in this PR now or follow up, I'd just like them in general 😄
{ | ||
public class ReplicaMonitor : IApplicationProcessor | ||
{ | ||
private ILogger _logger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidfowl to review this on top of me.
Co-authored-by: Justin Kotalik <jukotali@microsoft.com>
Co-authored-by: Justin Kotalik <jukotali@microsoft.com>
I'm going to go ahead and merge this, let's do some follow up on top of it 😄 . Again, thanks @areller for all the work you have done here |
@jkotalik thank you! Will create a PR in the following days |
adding health checks according to this #19
Changes
liveness
,readiness
. each can hold these settingsHealthy
,Ready
(Ready
= passed bothliveness
andreadiness
probing.Healthy
= passedliveness
but notreadiness
).If neither
liveness
norreadiness
were provided, replica will be upgraded toReady
upon creation. If onlyliveness
if provided, replica will be upgraded toReady
upon passing theliveness
probe. if onlyreadiness
is provided, replica will be upgraded toHealthy
upon creation.ReplicaMonitor
that reads theliveness
andreadiness
probe configuration, and kills/changes the state of the replica according to probe resultsReady
replicasliveness
andreadiness
configuration.