Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Broker availability based on the Endpoints not Deployments. #2766

Merged

Conversation

mattmoor
Copy link
Member

  • 🧽 Update or clean up current behavior

Today the fact that the Broker lifecycle stuff propagates Deployment status for filter/ingress leaks implementation details into the API types. An example of where this leakage is bas would be if I were to run a multi-tenant broker (#2760) as something other than a Deployment.

This change corrects one small part of this leaky interface by shifting from Deployment to Endpoints as the canonical resource for establishing readiness.

This is also slightly more correct as it assesses not just whether the Deployment is ready, but whether that information has propagated to the Service / Endpoints.

/hold
/assign @vaikas @lionelvillard @grantr

Today the fact that the Broker lifecycle stuff propagates *Deployment* status for filter/ingress leaks implementation details into the API types.  An example of where this leakage is bas would be if I were to run a multi-tenant broker (knative#2760) as something other than a Deployment.

This change corrects one small part of this leaky interface by shifting from Deployment to Endpoints as the canonical resource for establishing readiness.

This is also slightly more correct as it assesses not just whether the Deployment is ready, but whether that information has propagated to the Service / Endpoints.
@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2020
@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Mar 17, 2020
@knative-prow-robot knative-prow-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 17, 2020
@knative-metrics-robot
Copy link

The following is the coverage report on the affected files.
Say /test pull-knative-eventing-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/broker/broker.go 80.5% 81.2% 0.6
pkg/reconciler/broker/controller.go 87.5% 88.5% 1.0

@n3wscott
Copy link
Contributor

/lgtm
/approve

@knative-prow-robot knative-prow-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 17, 2020
@grantr
Copy link
Contributor

grantr commented Mar 17, 2020

Today the fact that the Broker lifecycle stuff propagates Deployment status for filter/ingress leaks implementation details into the API types.

Is this an API purity concern or a user concern? Can you give more details on the user impact of leaking the implementation detail? The Go type hasn't changed here, so I'm not sure which API type you're referring to.

Does the switch to using endpoint availability from deployment availability change the meaning of Broker readiness? I.e. from a user perspective, if the Broker is ready, can I make the same assumptions about its state as before? If not, is the new set of assumptions more accurate or better for the user?

@mattmoor
Copy link
Member Author

/retest

(this test seems to be flaky on testgrid)

The Go type hasn't changed here, so I'm not sure which API type you're referring to.

Many levels of API 🐢 . Specifically I'm talking about the fact that in a class-based world the API exposed for folks to manipulate Broker status is perhaps overly fitted to our current implementation. Frankly even the notion of distinct ingress and filters is somewhat leaky of implementation details.

Does the switch to using endpoint availability from deployment availability change the meaning of Broker readiness?

In practice, this should be effectively the same. In theory, this is more correct as I would interpret readiness to mean that if I POST to the .status.address I don't get 500s; in that sense, it's more important that the Endpoints have gotten the ready pods than the Deployment.

Happy to discuss more in whatever medium makes the most sense, this is just something I noticed looking over #2760 and wanted to lend a hand.

@vaikas
Copy link
Contributor

vaikas commented Mar 17, 2020

As far as the Conditions, the user should only rely on the status.Ready as whether the Broker is good to go or not,

Overall, the Broker Spec does not talk about any of these, nor should it.
https://github.com/knative/eventing/blob/master/docs/spec/broker.md

And from that spec:

Broker objects SHOULD include a Ready condition in their status.

The Broker SHOULD indicate Ready=True when its ingress is available to receive events.

While a Broker is Ready, it SHOULD be a valid Addressable and its status.address.url field SHOULD indicate the address of its ingress.

So this is definitely a more better change.

@mattmoor
Copy link
Member Author

Well, what I said about the sub-conditions applies indirectly to how it manifests in Ready too.

@grantr do you have [other] concerns you'd like to discuss?

@grantr
Copy link
Contributor

grantr commented Mar 17, 2020

in a class-based world the API exposed for folks to manipulate Broker status

Ok, IIUC you're talking about the Go API (for contributors), not the content of the object's status (for users).

it's more important that the Endpoints have gotten the ready pods than the Deployment.

I'm not super familiar with the relationship between Service and Endpoints. Does a pod ip only make it into Endpoints if it's confirmed ready via liveness/readiness?

Concretely it seems like you're saying that the sequence is normally

Deployment available, then Deployments pods show up in Endpoints

And never

Deployment pods show up in Endpoints, then Deployment available

It's this second one that would be a behavior change, because it moves the readiness earlier than before. If, after this change, the user can be more sure that a Ready Broker is available to receive requests, then 👍.

@vaikas
Copy link
Contributor

vaikas commented Mar 17, 2020

/lgtm
/approve

@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, n3wscott, vaikas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mattmoor
Copy link
Member Author

Does a pod ip only make it into Endpoints if it's confirmed ready via liveness/readiness?

There are two sections notReadyAddresses and addresses. Pods only land in addresses once they have passed their readiness probe (and optional readiness gates, if folks are using that).

Deployment pods show up in Endpoints, then Deployment available

Just to be super nit picky / pedantic the flow of readiness is:

Pods -> ReplicaSet -> Deployment
Pods -> Service/Endpoints

It is possible that an Endpoints is ready before the Deployment has reported ready, but this is fine because the Deployment isn't actually what matters.

It is possible that a Deployment is ready before the Endpoints has ready addresses, which is a potential bug with what is at HEAD today.

If, after this change, the user can be more sure that a Ready Broker is available to receive requests, then 👍.

Yeah that's exactly what I'm saying. In practice this is likely a small change, but in theory a race exists today.

@mattmoor
Copy link
Member Author

This has been out for a while, and I believe we've resolved concerns. Going to land to unblock @vaikas's PR.

/unhold

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 18, 2020
@vaikas
Copy link
Contributor

vaikas commented Mar 18, 2020

/test pull-knative-eventing-integration-tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants