
Add "Reconciled" condition for RouteGatewayStatus #642

Closed
robscott opened this issue Apr 30, 2021 · 22 comments
Labels
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@robscott
Member

What would you like to be added:
Currently, RouteGatewayStatus has only one condition defined in the API: "Admitted". It's not clear how that should be handled in the following use cases:

  • Route has been processed by Gateway controller for the specified Gateway but reconciliation has temporarily failed
  • Route is already being served by a Gateway but an incremental update failed

It seems wrong to set Admitted to false in these cases, but it also feels like we need a way to communicate this in status.

One potential option would be to add a Reconciled condition that could be used to indicate if the most recent reconciliation cycle for the Route was successful.
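
As a rough sketch of what that might look like with the generic condition helpers (the "Reconciled" type and the reasons here are hypothetical, not part of the API today):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	var conditions []metav1.Condition

	// The Route is valid and the controller accepts responsibility for it.
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type:               "Admitted",
		Status:             metav1.ConditionTrue,
		Reason:             "RouteAdmitted",
		ObservedGeneration: 3, // generation of the Route spec this reflects
	})

	// Hypothetical "Reconciled": the latest reconciliation cycle failed,
	// but Admitted stays true because the previously applied config may
	// still be serving traffic.
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type:               "Reconciled",
		Status:             metav1.ConditionFalse,
		Reason:             "ReconciliationFailed",
		Message:            "latest config not applied; previous config still in use",
		ObservedGeneration: 3,
	})

	for _, c := range conditions {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```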

Why is this needed:
To improve user experience.

@robscott robscott added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 30, 2021
@youngnick
Contributor

I agree that we need to do something with Route status, but I think that porting some of the Listener conditions to the Route would probably work better. (Here, the example I'm thinking of is DegradedRoutes on the listener, which seems like it should translate cleanly into a Degraded condition on the Route.)

I'd also argue that while Admitted is a good shortcut for "this object is valid", there's probably some space for a Ready-like rollup condition that would serve as the indicator both that the config is accepted and that the Route should be passing traffic. All other Conditions should be abnormal-true error conditions (like Degraded), which are only status: True when there is a problem.

@jpeach
Contributor

jpeach commented May 11, 2021

Here, the example I'm thinking of is DegradedRoutes on the listener, which seems like it should translate cleanly into a Degraded condition on the Route.

A route can be degraded in the context of one gateway and perfectly OK in the context of another (really, same for reconciled).

@youngnick
Contributor

youngnick commented May 11, 2021

The status on the routes is in the context of a Gateway already though. That is, each Condition set also includes a backlink to the relevant Gateway. So bringing the Degraded condition down to the Route just means one less lookup for the user, and one more status update for the controller.
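
For anyone following along, a simplified sketch of that shape (field names approximate, not copied verbatim from the v1alpha1 types):

```go
package sketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// GatewayReference is the backlink: which Gateway this status entry is about.
type GatewayReference struct {
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// RouteGatewayStatus scopes a set of Conditions to a single Gateway, so a
// Route can be, say, Degraded for one Gateway and healthy for another.
type RouteGatewayStatus struct {
	GatewayRef GatewayReference   `json:"gatewayRef"`
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```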

@jpeach
Contributor

jpeach commented May 11, 2021

The status on the routes is in the context of a Gateway already though.

Oh, I forgot 🤕

@robscott
Member Author

there's probably some space for a Ready-like rollup condition that would serve as the indicator both that the config is accepted and that the Route should be passing traffic

This may be unique to our implementation, but we have a slightly different distinction. We may still be serving traffic with the older config if the new config cannot be reconciled. So in our case, a distinction between "this config is valid" and "the latest config has actually been reconciled" is important. That is fairly similar in concept to "Ready", but slightly different. Maybe both are valuable?

@youngnick
Contributor

youngnick commented May 19, 2021

Yes, having two Conditions makes sense to me: "config is valid" and "config is being served". It seems to me like they should probably be Admitted and Ready respectively.

I note that Contour currently can't guarantee Ready, sadly, because there's no feedback loop from Envoy. We can guess with reasonable confidence, but can't guarantee. No reason not to go ahead with this distinction though.

@jpeach, thoughts?

@jpeach
Contributor

jpeach commented May 19, 2021

Ready means it is working at some version. Actually specifying what that version is, for an arbitrary tree of resources that can all change at any time, doesn't seem straightforward to me. It's hard to pin down what the semantics of "the valid config" would be.

Route has been processed by Gateway controller for the specified Gateway but reconciliation has temporarily failed

Depending on the failure mode, I'd suggest a Gateway condition here.

Route is already being served by a Gateway but an incremental update failed

Could be a ResolvedRefs/DegradedRoutes condition on the Gateway.

@youngnick
Contributor

Well, a Condition does have a version, the observedGeneration, which consumers should definitely be checking.

I must admit to being a bit confused about your reluctance here @jpeach. Is it that you think that the Route should have a minimal set of Conditions, and we should teach people to look at owning Gateways for the system's current state? Personally, I don't feel like that's a great user experience for Route owners.

@jpeach
Contributor

jpeach commented May 21, 2021

I must admit to being a bit confused about your reluctance here @jpeach. Is it that you think that the Route should have a minimal set of Conditions, and we should teach people to look at owning Gateways for the system's current state? Personally, I don't feel like that's a great user experience for Route owners.

So the route is part of an object graph. For the proxy configuration to be "current", all the configuration in all the resources in the graph needs to be current on the server. When you look at a "reconciled" condition on the route, is it really telling you that the config is up to date, or only that this specific route hasn't changed since you last looked at it?

For the conditions that Rob listed at the start of this issue, explicit error conditions seem preferable to me over a general "are we current?" one.

@youngnick
Contributor

Okay, so I'm going to try to summarise what I understand you're saying.

You're in support of having the already-existing Admitted condition, plus additional ones to cover specific errors, but not a Reconciled or Ready or similar condition.

In this case, Admitted means, essentially, "no syntactic or other simple errors found; this is okay to be reconciled, and that should happen at some point".

Finding out when it has happened requires looking at the Listener.

Specific errors on the Route could surface the same information as Listener conditions like DegradedRoutes, so a Degraded condition, for example.

Adding Reconciled doesn't make sense because the condition may change with no change to the underlying object?

Just want to make sure I've got what you're saying before I think further about where I'm at. :)

@jpeach
Contributor

jpeach commented Jul 1, 2021

In this case, Admitted means, essentially, "no syntactic or other simple errors found; this is okay to be reconciled, and that should happen at some point".

I understood it that way, in the sense of "there is an ingress controller that accepts responsibility for this object". Once that happens, then of course the controller will reconcile it.

@robscott
Member Author

robscott commented Jul 1, 2021

then of course the controller will reconcile it

Although this is likely true, in some cases it can take a matter of minutes to spin up the needed resources to reconcile. I realize that's not a distinction worth making for many implementations though, so maybe "Reconciled" is a concept that only makes sense for some.

Adding Reconciled doesn't make sense because the condition may change with no change to the underlying object?

Yeah, this is definitely a concern, but I think it could be used reasonably effectively when combined with the ObservedGeneration field:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1419
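
A minimal consumer-side sketch of that check (the "Reconciled" condition type is still hypothetical here):

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// latestSpecReconciled reports whether the most recent generation of the
// object's spec was successfully reconciled, not just some earlier one.
func latestSpecReconciled(obj metav1.Object, conditions []metav1.Condition) bool {
	c := meta.FindStatusCondition(conditions, "Reconciled") // hypothetical type
	if c == nil {
		return false
	}
	// A condition recorded against an older generation says nothing about
	// the current spec, so require the generations to match.
	return c.Status == metav1.ConditionTrue &&
		c.ObservedGeneration == obj.GetGeneration()
}
```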

@jpeach
Contributor

jpeach commented Jul 4, 2021

Although this is likely true, in some cases it can take a matter of minutes to spin up the needed resources to reconcile.

This is getting close to the ListenerReasonPending condition. As described here, the use of "Reconciled" sounds a lot like "Ready" to me.

@stevesloka
Contributor

I don't want to diverge the conversation, but I think it's relevant: how does the already-existing "Scheduled" condition fit into this? @youngnick & I had a conversation about this last week, and I don't fully understand what "Scheduled" means in the context of a Gateway.

A gateway starts out with Scheduled: false, Waiting for controller. To me, that's Admitted: false. Once a controller takes ownership of the Gateway, then it's Admitted: true, and then it waits for the Reconciled status. I don't see how Scheduled: false fits. It was explained to me as similar to how a pod gets scheduled to a node, but we don't schedule gateways to different controllers; there's one that should pick it up, and it's not managed by a "scheduler" per se.

If we step back, what is it that we want to convey to users, without trying to attach names just yet? From the conversation it sounds like folks want to convey:

  1. A controller has seen the Gateway, is responsible, and will eventually serve traffic for it, but it's doing work to get "ready"
  2. The gateway is ready to serve some sort of traffic

That leads me to Admitted & Reconciled as @robscott proposed, leaving the more specific errors to the Listeners, where they will come and go as they arise & are fixed.

@jpeach
Contributor

jpeach commented Jul 5, 2021

Gateways are logical configurations. There might not be enough underlying load balancing capacity to configure in the requested configuration. In that case "Admitted" is true (the controller admits that it ought to be reconciling the Gateway) but "Scheduled" is false (there's no underlying capacity it can use to stand up the Gateway).

It was explained to me as similar to how a pod gets scheduled to a node, but we don't schedule gateways to different controllers; there's one that should pick it up, and it's not managed by a "scheduler" per se.

Pods are scheduled to kubelets, and gateways are scheduled to whatever the underlying proxy infrastructure unit is. For example, a controller could spin up a VM instance for each gateway but find that it's out of VM quota. It would not be able to schedule new gateways, though it could admit them.

@stevesloka
Contributor

Your example @jpeach sounds like Provisioned fits better, since you're still not scheduling resources, you're requesting them. If you hit a VM quota, run out of resources for a cloud load balancer, hit a permissions error, etc., then Provisioned: false tells me that something tried to spin up but failed, since it's not provisioned.

In my head, the "Scheduled" condition doesn't fit my mental model, but curious what others think.

@jpeach
Contributor

jpeach commented Jul 5, 2021

Your example @jpeach sounds like Provisioned fits better, since you're still not scheduling resources, you're requesting them. If you hit a VM quota, run out of resources for a cloud load balancer, hit a permissions error, etc., then Provisioned: false tells me that something tried to spin up but failed, since it's not provisioned.

That's just an example. There could be any number of reasons and the spec doesn't tell you how to do it.

@youngnick
Contributor

I think the key part here in this specific example is that Scheduled and Provisioned are the same thing from different sides. Let me explain.

A Gateway is a thing that sits at a logical edge of the cluster in some way. What does that mean? The spec deliberately does not define exactly what the "edge" of the cluster is, so that it can be applicable to lots of different use cases.

In the fully general case, on the happy path, the Gateway, per the spec, passes through the following Conditions:

  • Admitted. The controller recognizes the object and accepts responsibility for reconciling it. This makes no statement about anything else.
  • Scheduled. The controller has looked at the object, determined it's valid, and checked that there are resources to provision it. It sends a request to provision it, and flips this condition to true.
  • Ready. The controller has determined that all the infrastructure is provisioned and working correctly, and the object is ready to forward traffic.

Right now, the Ready condition includes the Provisioned condition @stevesloka mentioned. Steve, it seems like you don't see the value in the Scheduled condition vs having a Provisioned condition before Ready?
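
To make the happy path above concrete, a sketch using the generic condition helpers (the reasons are illustrative, not taken from the spec):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	var conditions []metav1.Condition

	// 1. Admitted: the controller recognizes the Gateway and accepts
	//    responsibility for reconciling it.
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type: "Admitted", Status: metav1.ConditionTrue, Reason: "Admitted",
	})

	// 2. Scheduled: the config is valid, underlying capacity exists, and a
	//    provisioning request has been sent. In @jpeach's quota example
	//    above, this would instead be False with a reason like "NoResources".
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type: "Scheduled", Status: metav1.ConditionTrue, Reason: "ResourcesAvailable",
	})

	// 3. Ready: the infrastructure is provisioned and working, and the
	//    Gateway can forward traffic.
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type: "Ready", Status: metav1.ConditionTrue, Reason: "Provisioned",
	})

	for _, c := range conditions {
		fmt.Printf("%-10s %s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```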

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 3, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
