
feat: add v2 endpoints controller #2883

Merged
merged 1 commit into main from zalimeni/net-5314-v2-endpoints-controller on Sep 11, 2023

Conversation

zalimeni
Member

@zalimeni zalimeni commented Aug 31, 2023

Follows the pattern of iterative review in #2868, and currently targets that branch due to its dependence on common code added there.


This PR is part of a series of PRs that add functionality to the V2 Endpoints Controller. For the MVP task to be complete, there will be a few additional changes to resolve TODOs, debounce upsert writes if needed, and add further tests.

Changes proposed in this PR:

  • Add the V2 Endpoints Controller
  • Add enough logic in the controller to manage Consul Services
  • Add an Endpoints controller test framework, borrowed from the linked Pod Controller PR (thanks @DanStough!)

How I've tested this PR: Unit tests

How I expect reviewers to test this PR: 👁️‍🗨️👃🏻👁️‍🗨️

Checklist:

@zalimeni zalimeni force-pushed the zalimeni/net-5314-v2-endpoints-controller branch 2 times, most recently from 85db7b9 to 6404ce3 Compare August 31, 2023 21:41
//TODO(zalimeni) Do we need to check service-enable label here on service/deployments/replicasets?
// Should we expect owner is annotated, and if so, is that consistent for non-replicaset workloads?
// If so, update to register and deregister conditionally.
Member Author

This 🧵 discusses maybe not needing explicit exclusion for services. I'm not 100% certain whether this controller should expect any opt-outs that will impact service registration, outside of k8s namespaces disabled via config, which are already checked above.

Input welcome! Lacking a strong opinion, I'll probably convert this to a follow-up ticket as a placeholder for a pre-GA decision.

Comment on lines +183 to +189
// We have to fetch the owner and check its type because pod names cannot be disambiguated from pod owner names due to
// the `-` delimiter and unique ID parts also being valid name components.
Member Author

Feel free to correct me if anyone knows better. 🧵 for context here.
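
For illustration, here's a minimal sketch of the kind of owner lookup the quoted comment describes, assuming controller-runtime's client and the standard metav1.GetControllerOf helper. ownerWorkloadName is a hypothetical name, not code from this PR:

```go
// Hypothetical sketch of resolving a Pod's controlling owner instead of
// parsing the pod name, since names like "web-7d4c9bbf9b-x2k4q" can't be
// split reliably on "-".
package endpoints

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ownerWorkloadName returns the name of the workload that owns the Pod.
func ownerWorkloadName(ctx context.Context, c client.Client, pod *corev1.Pod) (string, error) {
	ref := metav1.GetControllerOf(pod)
	if ref == nil {
		// Bare pod: the pod itself is the workload.
		return pod.Name, nil
	}
	if ref.Kind == "ReplicaSet" {
		// For Deployments, hop one more level: ReplicaSet -> Deployment.
		var rs appsv1.ReplicaSet
		if err := c.Get(ctx, types.NamespacedName{Namespace: pod.Namespace, Name: ref.Name}, &rs); err != nil {
			return "", err
		}
		if rsOwner := metav1.GetControllerOf(&rs); rsOwner != nil {
			return rsOwner.Name, nil
		}
		return rs.Name, nil
	}
	// StatefulSet, DaemonSet, Job, etc.: the direct owner is the workload.
	return ref.Name, nil
}
```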

Comment on lines 211 to 217
//TODO(zalimeni) Read from Consul or some other state to determine whether a write is needed.
// For now, we blindly write state on each reconcile to simplify implementation.
_, err := resourceClient.Write(ctx, &pbresource.WriteRequest{Resource: serviceResource})
Member Author

My thinking on this r/n is that we should just try to read the existing Service, and if we find it, check for total equality minus resource version. If there's a diff, rewrite.

If we move to an EndpointSlices/Pods-based approach in the future, we'll need to do some sort of checking for stale workload selector values, but since Endpoints gives us a full snapshot up to the 1K pod limit, we should be able to just upsert today w/o being concerned with drift.
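
To make that concrete, here's a minimal sketch of the read-compare-write idea, assuming the pbresource client's Read/Write calls and a gRPC NotFound code for missing resources. ensureService is a hypothetical helper, not the follow-up implementation:

```go
// Hypothetical sketch: write the desired Service resource only when it
// differs from what Consul already has, to avoid needless writes on every
// reconcile.
package endpoints

import (
	"context"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/protobuf/proto"

	"github.com/hashicorp/consul/proto-public/pbresource"
)

func ensureService(ctx context.Context, rc pbresource.ResourceServiceClient, desired *pbresource.Resource) error {
	rsp, err := rc.Read(ctx, &pbresource.ReadRequest{Id: desired.Id})
	switch {
	case status.Code(err) == codes.NotFound:
		// Nothing registered yet; fall through and write.
	case err != nil:
		return err
	case proto.Equal(rsp.Resource.Data, desired.Data):
		// Payloads match (ignoring the server-assigned version); skip the
		// write. A fuller check would also compare metadata and owner.
		return nil
	}
	_, err = rc.Write(ctx, &pbresource.WriteRequest{Resource: desired})
	return err
}
```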

Contributor

Was there a specific case you were trying to address by taking the diff and rewriting it? There was some discussion of running multiple versions of the controller concurrently in the future, and I wonder if this could create flapping with different workers fighting over the same service.

I think I naively assumed that if the write doesn't error, the resource service has persisted the service. Some future work is planned for later releases to sync the status of Consul resources back to Kubernetes resources. Do you think this would address any of your concerns?

Member Author

Was there a specific case you were trying to address by taking the diff and rewriting it?

Mainly just avoiding needless writes - TBF I'm not fully up to speed on the resource API and state store changes, but my basic understanding is reads are cheaper, and we are theoretically capped in the 2-3K/s range for writes. This seemed like an easy way to avoid needless clobbering, esp. when potentially many reconcile loops attempt to write at once w/o versions and the baked-in CAS retries compete w/ each other.

There was some discussion of running multiple versions of the controller concurrently in the future, and I wonder if this could create flapping with different workers fighting over the same service.

I think the flapping would occur regardless of pre-reads? But that's a good callout. I'd hope we'd come up w/ some one-way version tagging similar or identical to the managed-by tag if needed to avoid issues from that. (Do you have a link to that conversation by chance?)

I think I naively assumed that if the write doesn't error, the resource service has persisted the service. Some future work is planned for later releases to sync the status of Consul resources back to Kubernetes resources. Do you think this would address any of your concerns?

That's my assumption too 👍🏻 ATM I'm only thinking of debouncing, though bidirectional sync would add complexity to that.

Contributor

I think it was in a casual Zoom, so I don't happen to have a link to the convo.

Member Author

Asked 🧵 here to seek wisdom

Member Author

Summary from Slack: not entirely conclusive, but we agree early optimization should be generally avoided until we get a better idea of write performance in the wild.

I think it's fair to argue that debouncing rapid identical writes falls under low-hanging fruit to reduce needless load, but this is leading me to additional questions about how we'll handle state in resources that may be mutated by multiple sources/data flows (not sure how much of this there will be - could resource Meta tagging be an example?), since we also can't just blindly rewrite records in that scenario.

Regardless, this is tagged for follow-up outside this PR.

Comment on lines 252 to 258
VirtualPort: uint32(p.Port),
TargetPort: p.TargetPort.String(),
Protocol: common.GetPortProtocol(p.AppProtocol),
Member Author

I'd appreciate a spot check on this mapping JIC I'm missing some nuance - cc @ishustava in case you have input

Contributor

I don't think targetPort will work in the case it is a number but the pod defines the port with a name. We will need to look up the corresponding port name in the pod. I'm open to changing how this works on the Pod side, but I'm not sure how you get around the name being optional and targetPort working as either.

Member Author
@zalimeni zalimeni Sep 1, 2023

I don't think targetPort will work in the case it is a number but the pod defines the port with a name

(This might be wrong or obvious, feel free to correct me) It seems to me from the k8s docs that there's an inherent tradeoff that applies generally: you need to pick whether to use a name consistently across service and pods (which enables mixed pod port numbers under one name), or you need to pick a consistent pod port number and use that in the service. Either way, it seems selected pods need to adhere to the service's chosen int-or-string value for targetPort.

Our docs for Consul Service.TargetPort say target_port is the name of the workload port, but I took that to mean we defer totally to the value used by the k8s service per the typical scheme above.

It seems wrong to me to reach into the pods to infer anything beyond the workload selector, since the targetPort mapping isn't really a consul-k8s-specific limitation... But I might be missing a reason we'd want to do that.
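
To illustrate the tradeoff with core/v1 types (hypothetical values, not from this PR):

```go
// Illustration only: the two valid ways to wire a Kubernetes Service
// targetPort to pod ports, per the tradeoff described above.
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

var (
	// Option 1: target a named port; selected pods may listen on different
	// numbers as long as each exposes a containerPort named "http".
	byName = corev1.ServicePort{Port: 80, TargetPort: intstr.FromString("http")}

	// Option 2: target a fixed number; every selected pod must listen on 8080.
	byNumber = corev1.ServicePort{Port: 80, TargetPort: intstr.FromInt(8080)}
)
```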

I'm open to changing how this works on the Pod side

What did you have in mind here?

Contributor
@DanStough DanStough Sep 1, 2023

It seems wrong to me to reach into the pods to infer anything beyond the workload selector, since the targetPort mapping isn't really a consul-k8s-specific limitation

Yeah, I don't like the idea of reaching in either. The pods may be slightly different, and the names don't align, making the lookup challenging. But I think I disagree that there isn't something K8s-specific here. We generate a Consul port name per port, even if one wasn't defined in the Kube spec, and Kubernetes doesn't necessarily require you to use any name at all, so there is a mismatch of strong (consul) vs. weak (k8s) references. I think I had the opposite reaction that since the VM case is straightforward, we'd have to "fix it" in consul-k8s.

I'm open to changing how this works on the Pod side

No specific ideas here. I'm just trying not to look like I'm suggesting a singular solution.

Member Author

Started 🧵 here to figure out how we want this to work

Member Author

This conversation was ongoing as of EOD Friday, so I'm leaving it for resolution outside this PR, but I'll try to keep things linked in Slack so the chain of 🧵s isn't lost from here.

For the short-term, the plan ATM is to use Consul Service TargetPort strictly as a string name. In the case where the K8s service targetPort is a number, we'll rely on best-effort inference of the best Consul TargetPort value based on the name used by the most prevalent endpoint subset for that port (e.g. none, or a specific name that isn't being directly targeted in K8s).
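
A rough sketch of what that best-effort inference could look like, assuming a corev1.Endpoints input; inferTargetPortName is hypothetical, not this PR's code:

```go
// Hypothetical sketch: when the K8s Service targetPort is a number, pick the
// port name most prevalent across the Endpoints subsets for that number.
package endpoints

import corev1 "k8s.io/api/core/v1"

// inferTargetPortName returns the most common name attached to portNumber
// across the Endpoints subsets, or "" if no subset names that port.
func inferTargetPortName(eps *corev1.Endpoints, portNumber int32) string {
	counts := make(map[string]int)
	for _, subset := range eps.Subsets {
		for _, p := range subset.Ports {
			if p.Port == portNumber && p.Name != "" {
				counts[p.Name]++
			}
		}
	}
	var best string
	for name, n := range counts {
		if n > counts[best] {
			best = name
		}
	}
	return best
}
```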

@zalimeni zalimeni requested review from DanStough, ishustava and thisisnotashwin and removed request for DanStough August 31, 2023 21:42
@zalimeni zalimeni added the pr/no-changelog (PR does not need a corresponding .changelog entry) and pr/no-backport (signals that a PR will not contain a backport label) labels Aug 31, 2023
@zalimeni zalimeni force-pushed the zalimeni/net-5314-v2-endpoints-controller branch from 6404ce3 to 0780abf Compare August 31, 2023 21:54
Contributor
@DanStough DanStough left a comment

I didn't get a chance to review the test case yet, but everything else looks great! I wanted to send you my open-ended comments in advance, like the ones on how targetPort works and on appProtocol.


Comment on lines +151 to +158
case "http2":
return pbcatalog.Protocol_PROTOCOL_HTTP2
case "grpc":
return pbcatalog.Protocol_PROTOCOL_GRPC
Contributor

I wonder if we technically need to include prefixes with some of these protocols. I was reading the API docs for appProtocol and it mentions only IANA names can be unprefixed. Only my first time seeing this so I could be mistaken.

I also think an annotation on the service or pod might be easier for customers to support. e.g. the Bitnami Postgres helmchart exposes annotations for services and certain pods but not a change of appProtocol.

Member Author

I wonder if we technically need to include prefixes with some of these protocols. I was reading the API docs for appProtocol and it mentions only IANA names can be unprefixed. Only my first time seeing this so I could be mistaken.

Yeah, I was wondering the same - we may need to iterate on this part. FWIW, Istio's solution (appProtocol and detection via port name) uses unprefixed values from what I can see, and Open Service Mesh also includes non-standard tcp and gRPC. I wonder whether the intent in the k8s docs was valid Service Name Syntax rather than a registered name - that seems arbitrarily limiting for new protocols, to me, when the purpose appears to be open-ended extension.

I also think an annotation on the service or pod might be easier for customers to support. e.g. the Bitnami Postgres helmchart exposes annotations for services and certain pods but not a change of appProtocol.

Agreed, it seems like annotations could help ease operator pain here rather than plumbing appProtocol through templates - though maybe better for us to enforce "standard" k8s concepts and push that concern up the tooling chain rather than accommodate it ourselves? I can start a 🧵 in the multi-port channel and see whether a follow-up makes sense.
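
For the prefix question specifically, here's a hypothetical sketch of how prefixed appProtocol values could be tolerated alongside the bare values in the diff above. getPortProtocol is an illustrative stand-in, not the PR's common.GetPortProtocol, and the pbcatalog import path is an assumption:

```go
// Hypothetical: map an appProtocol value (bare or domain-prefixed) to a
// Consul catalog protocol, falling back to unspecified.
package common

import (
	"strings"

	pbcatalog "github.com/hashicorp/consul/proto-public/pbcatalog/v2beta1" // version path assumed
)

func getPortProtocol(appProtocol *string) pbcatalog.Protocol {
	if appProtocol == nil {
		return pbcatalog.Protocol_PROTOCOL_UNSPECIFIED
	}
	// Accept either a bare value ("grpc") or a domain-prefixed one
	// ("example.com/grpc"), since only IANA names may be unprefixed per the
	// appProtocol docs.
	v := *appProtocol
	if i := strings.LastIndex(v, "/"); i >= 0 {
		v = v[i+1:]
	}
	switch strings.ToLower(v) {
	case "tcp":
		return pbcatalog.Protocol_PROTOCOL_TCP
	case "http":
		return pbcatalog.Protocol_PROTOCOL_HTTP
	case "http2":
		return pbcatalog.Protocol_PROTOCOL_HTTP2
	case "grpc":
		return pbcatalog.Protocol_PROTOCOL_GRPC
	default:
		return pbcatalog.Protocol_PROTOCOL_UNSPECIFIED
	}
}
```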

Member Author

Added this as a bullet in the follow-up ticket so we don't lose track, but we probably don't need to figure it out right this second.

Member Author

Circling back to this one to tie up the loose end: @ishustava, do you happen to have perspective on either of Dan's questions above?

My gut is to leave the unprefixed protocols in place for now since there seems to be precedent in other meshes, and wait to see if we need annotations in addition to just appProtocol. If that's not ideal, we can fix now or plan to follow up in 1.18.


Base automatically changed from dans/NET-5315/v2-pods-controller to main September 5, 2023 17:43
@zalimeni zalimeni force-pushed the zalimeni/net-5314-v2-endpoints-controller branch from 0780abf to 2e50816 Compare September 6, 2023 13:33
Member Author
@zalimeni zalimeni left a comment

@DanStough @thisisnotashwin ok, I think this is ready for a next pass. I've added more tests around the essential CRUD bits as well as the pod fetching behavior, since that seemed valuable to lock in.

Assuming no major changes are needed, I'll plan to merge this to unblock testing while I follow up on remaining TODOs and Slack 🧵s.



@zalimeni zalimeni marked this pull request as ready for review September 6, 2023 13:37
Contributor
@DanStough DanStough left a comment

I think based on all the tradeoffs we discussed to get a version of this ready to test, LGTM.

@zalimeni zalimeni force-pushed the zalimeni/net-5314-v2-endpoints-controller branch from 2e50816 to b6daa27 Compare September 6, 2023 21:51
Contributor
@thisisnotashwin thisisnotashwin left a comment

BEAUTIFUL!!!

@zalimeni zalimeni force-pushed the zalimeni/net-5314-v2-endpoints-controller branch from 395da7b to 31fa5c0 Compare September 11, 2023 13:12
@zalimeni zalimeni enabled auto-merge (squash) September 11, 2023 13:13
Implement the basic requirements of a new Endpoints controller that
registers Services via Consul's V2 API.

Further tests and TODOs will be addressed in follow-up changes.
@zalimeni zalimeni force-pushed the zalimeni/net-5314-v2-endpoints-controller branch from 31fa5c0 to 202959d Compare September 11, 2023 13:34
@zalimeni zalimeni merged commit 2f84fa0 into main Sep 11, 2023
3 checks passed
@zalimeni zalimeni deleted the zalimeni/net-5314-v2-endpoints-controller branch September 11, 2023 14:03
Labels
pr/no-backport — signals that a PR will not contain a backport label
pr/no-changelog — PR does not need a corresponding .changelog entry
3 participants