Discussion: crd.projectcalico.org/v1 vs projectcalico.org/v3 #6412
Comments
Cross-referencing an older, tangential discussion: #2923
Why is a webhook less desirable than an apiserver? Putting the validation logic in a webhook would remove the need for the apiservice (assuming defaulting could be done in the CRD).
It's not that it's less desirable, per se; it's mostly that it suffers from many of the same problems as a separate apiserver does - i.e., running another pod on the cluster that needs its own networking, etc., in order to provide defaulting and validation, rather than performing that within the Kubernetes API server natively.
The nice thing about a validating webhook is that it has a built-in toggle for skirting around it when one is in the early stages of installing Calico (assuming the webhook ran in the pod network). However, my bet is that a webhook with
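For anyone curious about the "built-in toggle" mentioned above: admission webhooks carry a failurePolicy, so a validating webhook can be made to fail open while networking (or Calico itself) is still coming up. A minimal sketch of what such a configuration could look like - the webhook name, backing service, and rules are purely illustrative and do not correspond to any existing Calico component:

```sh
# Hypothetical ValidatingWebhookConfiguration illustrating the fail-open "toggle".
# failurePolicy: Ignore lets requests through when the webhook is unreachable,
# e.g. while pod networking is still being established during install.
kubectl apply -f - <<'EOF'
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: calico-crd-validation            # illustrative name
webhooks:
  - name: validate.crd.projectcalico.org
    failurePolicy: Ignore                # the "toggle": fail open instead of blocking
    sideEffects: None
    admissionReviewVersions: ["v1"]
    rules:
      - apiGroups: ["crd.projectcalico.org"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["networkpolicies"]
    clientConfig:
      service:
        name: calico-crd-validator       # hypothetical backing service
        namespace: calico-system
        path: /validate
EOF
```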
We'd like to extend an already-mentioned pain point: the requirement for Calico / networking to work before the k8s API server can serve. Because the CRD is registered using a k8s service IP, this essentially also requires that routing for the k8s service CIDR works properly on k8s controller nodes. This is particularly difficult if you are running isolated k8s controller nodes and rely on BGP & Calico to announce the k8s service CIDR routes to the controller nodes. During our experiments we could end up in a situation where no k8s service routes were present (anymore), and the k8s controller node / API server component was not able to properly set up
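To make that dependency concrete: the v3 group is registered as an APIService whose spec.service points at an in-cluster Service, so the kube-apiserver itself must be able to route to the service CIDR in order to serve it. A quick way to inspect that registration (a sketch; the APIService name assumes a standard Calico API server install):

```sh
# Show the aggregated API registration; spec.service.{namespace,name} is the
# in-cluster Service that the kube-apiserver must be able to reach.
kubectl get apiservice v3.projectcalico.org -o yaml

# The AVAILABLE column drops to False when the kube-apiserver cannot reach
# that Service (e.g. when service CIDR routes are missing on controller nodes).
kubectl get apiservice v3.projectcalico.org
```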
It should be documented loud and clear that if the API server is not installed, the /v3 API will NOT work. I spent over half a day struggling to figure this out, as it's not noted anywhere in the install documentation.
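For anyone else hitting this, a quick way to check whether anything is actually serving /v3 before applying manifests (a sketch; the tigerastatus check assumes an operator-managed install):

```sh
# Is any resource being served under projectcalico.org/v3?
kubectl api-resources --api-group=projectcalico.org

# On operator-managed installs, the API server component reports its status here.
kubectl get tigerastatus apiserver
```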
@caseydavenport I installed the latest version of calico using this helm chart. The Also after each random
These CRDs are automatically installed using the helm chart stated above.
Do I understand it correctly that these
Or maybe someone else that can help me with this issue?
I'm hitting the exact same issue:
Looking into the issue I have hit, there appears to be a failure in the apiserver. Looking at the logs there, I'm seeing the following error(s):
Any chance you are seeing the same errors? When I look for the configmap in kube-system:
As you can see it is there and there is content. I took a look at each of the ClusterRoles and RoleBindings as they are laid down in my cluster, and it looks like the default service account calico-apiserver has been granted access to the resource above. All of the calico-apiserver ClusterRoleBindings point to the same subject:
That all looks fine so far and the pod is running with the same service account:
There is one secret:
I'm still trying to fumble through this but perhaps you see the same messages in the apiserver? I'm thinking that is the root cause but not quite sure how to address it yet. Cheers 🍻
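One way to sanity-check the RBAC side described above without reading every binding is to impersonate the service account (a sketch; the calico-apiserver namespace and service-account name are taken from the comment above and may differ per install):

```sh
# Can the calico-apiserver service account read configmaps in kube-system?
kubectl auth can-i get configmaps \
  --namespace kube-system \
  --as system:serviceaccount:calico-apiserver:calico-apiserver

# List the cluster role bindings that reference that service account.
kubectl get clusterrolebindings -o wide | grep calico-apiserver
```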
Hmm I don't have any errors in the calico-apiserver container. These are my logs in the container:
@lucasscheepers / @chet-tuttle-3 please raise a separate issue for that and ping me on it - this is a high-level tracking issue for discussing general strategy, not for individual diagnosis.
@caseydavenport @chet-tuttle-3 I've created a separate issue |
How to extend api-version projectcalico.org/v3
```yaml
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
---
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec: {}
```

`kubectl apply -f ./apiserver.yaml`

Trigger the operator to start a migration by creating an

`kubectl delete ippools.projectcalico.org default-ipv4-ippool`
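If it helps anyone following the steps above: once the operator has reconciled the APIServer resource, the /v3 group should become discoverable, and that can be verified before switching any manifests over (a sketch; assumes the operator-based install shown above):

```sh
# Wait for the operator to report the API server component as Available
# (assumes the TigeraStatus condition type is "Available").
kubectl wait --for=condition=Available tigerastatus/apiserver --timeout=2m

# Confirm the v3 resources are now discoverable, then read a pool via /v3.
kubectl api-resources --api-group=projectcalico.org
kubectl get ippools.projectcalico.org
```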
Please could I get some further clarification. I'm using Azure AKS and I use their turnkey enablement of Calico Network Policy. This results in a number of 'crd.projectcalico.org' CRDs being installed into my clusters, which are v1. I've subsequently been deploying/using Calico [Global]NetworkPolicies and [Global]NetworkSets for quite some time.

If I alter my manifests to target apiVersion /v3 then I get the error cited in this issue: "no matches for kind "X" in version "projectcalico.org/v3" when attempting to apply a resource".

I was about to raise a ticket with Microsoft to question why they are deploying v1 and not v3, while pointing them to this particular GitHub issue since it states in here that v1 is not supported - obviously concerning. However, before doing so, I also came across these manifests this morning which are provided for the current latest version of Calico (3.26.4). I assume these manifests are provided by the Tigera/Calico developers, and I note that they too are versioned as v1.

I'm now confused. The guidance at the top of this issue states that I should not be using v1, and yet these manifests only make v1 available. And an AKS cluster with Calico Network Policy enabled also deploys v1 definitions.
@MarkTopping the v1 CRDs are installed on each cluster, but are just used internally by Calico. The reason v3 is recommended over v1 is that the v3 APIs are implemented using an extension API server (as described in this document: https://docs.tigera.io/calico/latest/operations/install-apiserver) which provides defaulting and validation as a layer on top of the crd.projectcalico.org resources. The summary is:
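One way to see that relationship on a live cluster is that the same policy objects are visible under both groups, but only the v3 view goes through the API server's validation and defaulting (a sketch; resource names assume a default install):

```sh
# v1: raw CRD objects, consumed internally by Calico; no Calico API server needed.
kubectl get globalnetworkpolicies.crd.projectcalico.org

# v3: the supported, validated/defaulted view, served by the Calico API server.
kubectl get globalnetworkpolicies.projectcalico.org
```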
100% agree this is confusing, and it's why I raised this issue. The fact that |
Thanks for replying. Appreciated! To be clear then: does the recommendation of installing the Calico API Server also apply to users of Azure AKS who offload the Calico installation to Microsoft when they enable the Calico NetPol feature? I wonder, could there be any implications to this? For example, when Microsoft push down a Calico upgrade to customers, the API Server would not get updated at the same time and would thus fall out of sync... is that safe for production clusters? If the recommendation is for all Calico OS users to deploy the API Server, then I believe this detail is missing from both Calico and Microsoft AKS documentation :(
There is potential here that if a new API field is introduced, the old apiserver would not be aware of it. But the risk is low so long as you're not running in that state for an extended period of time.
Really, AKS should be including the Calico API server as part of its offering and managing it as it does for other components. To be honest, I wasn't aware they weren't already. I think the "best" way in your scenario would be to use |
We need == here instead of =. As it turns out there's no validation on the apparently internal-only crd.projectcalico.org/v1 NetworkPolicy objects: projectcalico/calico#6412
Bug: T365855
Change-Id: I8908681369eb52b3c8bada3b0ec36805b50a6290
This issue comes up frequently enough that I think it warrants its own parent issue to explain and discuss. I'll try to keep this up-to-date with the latest thinking and status should it change.
The problem generally manifests itself as one of the following:

1. Errors like `no matches for kind "X" in version "projectcalico.org/v3"` when attempting to apply a resource.
2. Resources applied with `apiVersion: crd.projectcalico.org/v1` and Calico not behaving as expected.

TL;DR
Don't touch `crd.projectcalico.org/v1` resources. They are not currently supported for end-users and the entire API group is only used internally within Calico. Using any API within that group means you will bypass API validation and defaulting, which is bad and can result in symptoms like #2 above. You should use `projectcalico.org/v3` instead. Note that `projectcalico.org/v3` requires that you install the Calico API server in your cluster, and will result in errors similar to #1 above if the Calico API server is not running.
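As a concrete example of the TL;DR, a manifest targeting the supported group looks like this (a minimal sketch; the policy name and selector are illustrative):

```sh
# Supported path: apiVersion projectcalico.org/v3 is handled by the Calico API
# server, which validates and defaults the object before it is stored.
kubectl apply -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-example-egress      # illustrative name
spec:
  selector: app == 'example'     # illustrative selector
  types:
    - Egress
EOF
```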
Ok, but why do it that way?

Well, it's partly because of limitations in CRDs, and partly due to historical reasons. CRDs provide some validation out of the box on their own, but can't do some of the more complex cross-field and cross-object API validation that the Calico API server can perform. For example, making sure that IP pools are consistent with the IPAM block resources within the cluster is a complex validation process that just can't be expressed in an OpenAPI schema. Same goes for some of the defaulting operations (e.g., conditional defaulting based on other fields).
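To illustrate the conditional-defaulting point with a hedged example (based on documented IPPool behaviour; worth verifying against your version): creating a minimal IPPool through the v3 API fills in fields like blockSize differently depending on whether the CIDR is IPv4 or IPv6, which a static OpenAPI default cannot express:

```sh
# Minimal v3 IPPool; the Calico API server defaults the remaining fields on
# admission, e.g. blockSize 26 for an IPv4 CIDR vs 122 for IPv6 (per the docs).
# The name and CIDR are illustrative - don't apply to a real cluster unchanged.
kubectl apply -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: example-v4-pool
spec:
  cidr: 10.123.0.0/16
EOF

# Read the object back to see the defaulted fields.
kubectl get ippool.projectcalico.org example-v4-pool -o yaml
```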
As a result, Calico uses an aggregation API server to perform these complex tasks against `projectcalico.org/v3` APIs, and stores the resulting validated and defaulted resources in the "backend" as CRDs within the `crd.projectcalico.org/v1` group. Prior to the introduction of said API server, all of that validation and defaulting had to be performed client-side via `calicoctl`, but data was still stored in the "backend" as CRDs, for Calico itself to consume.

CRD validation has come a long way since Calico initially started using them way back in beta when they were actually called ThirdPartyResources. However, they still don't (and probably won't ever) support the types of validation that Calico currently enforces via its API server.
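For comparison with the pre-API-server workflow described above, calicoctl still performs that validation and defaulting client-side when pointed directly at the Kubernetes datastore (a sketch; assumes calicoctl is installed and a kubeconfig is available):

```sh
# Client-side path: calicoctl validates/defaults resources itself and talks to
# the crd.projectcalico.org/v1 objects directly, without the Calico API server.
export DATASTORE_TYPE=kubernetes
export KUBECONFIG=~/.kube/config
calicoctl get ippools -o yaml
```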
Pain points
Yes, this model is not perfect and has a few known (non-trivial) pain points that I would love to resolve.
Can we make it better?
Maybe. I hope so! But the solutions are not simple. We'd need to do at least some combination of the following, based on my current best guesses.
- Add validation and defaulting to `crd.projectcalico.org/v1` and make it supported. We can't do this without introducing a webhook, which is not really desirable. We can do maybe 25% of our validation via CRD server-side validation, but we'd be losing a lot of our current validation and defaulting if we go this route.
- Stop using `crd.projectcalico.org/v1` altogether, and instead back the aggregation layer in Kubernetes' etcd instance. This solves the "two APIs" problem, but would be a rather cumbersome data migration project and doesn't remove the need for a Calico-specific API server.
- Introduce a new API group, e.g. `policy.projectcalico.org/v3`, and write all new CRD-based APIs within it, with a focus on making the syntax and semantics 100% compatible with what CRD validation and defaulting provides.