Kubewarden controller potentially doing too many requests to apiserver #645
Comments
I tried to reproduce the issue on a test cluster. I deployed the latest version of Kubewarden and enabled the default policies. I didn't see any suspicious activity going on. There were regular WATCH and LIST events done by the controller to monitor the status of the watched resources, but nothing that invasive. Aside from that, the audit-scanner feature was generating some noise, but only when the CronJob was started. How many policies do you have deployed, and are they all active? Is there by any chance some policy that is in a pending state, maybe because of a wrong configuration or because it points to a wasm module that cannot be fetched? Is there something making changes to the policy resources or to the PolicyServer resource? This seems like the controller is trying to reconcile a state, something goes wrong, it does a couple of retries and then sleeps for X minutes. Then the cycle starts again 🤔 |
We have 22 active policies at the moment, each on its own policy server, so 22 policy servers |
Before continuing with this I'll check if it can be something from our side too |
To be honest, looking at a 1-minute search with just a single policy server, it does not seem like a lot, but I'm not sure whether there is still room for improvement |
Untitled discover search (1).csv.zip: here are 15 minutes with all policy servers |
Seeing the same behavior with ~50 policies in a single policy-server instance. All requests originate from the
I now have a scenario where I can reproduce the issue: with 11 policies loaded I see 0-10 reqs/sec; when the 12th policy is loaded, requests surge to 100+ per second. When that 12th policy is deleted, requests drop back to 0-10 per second. The downside is that this seems to be caused by multiple possible combinations of policies. In this case, the policy triggering the issue is:
apiVersion: policies.kubewarden.io/v1
kind: ClusterAdmissionPolicy
metadata:
annotations:
io.kubewarden.policy.category: PSP
io.kubewarden.policy.description: Pod Security Policy that controls usage of volumes
io.kubewarden.policy.severity: medium
io.kubewarden.policy.title: cluster-volumes policy in monitor for namespaces
meta.helm.sh/release-name: kubewarden-policies
meta.helm.sh/release-namespace: cattle-kubewarden-system
creationTimestamp: "2024-04-02T07:47:03Z"
finalizers:
- kubewarden
generation: 1
labels:
app.kubernetes.io/component: policy
app.kubernetes.io/instance: kubewarden-policies
name: cluster-volumes-monitor-namespaces
resourceVersion: "23286973"
uid: 3771d03b-59fa-4df8-9c38-611455909336
spec:
backgroundAudit: true
mode: monitor
module: example.com/kubewarden/policies/volumes-psp:v0.1.11
mutating: false
namespaceSelector:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: NotIn
values:
- cattle-kubewarden-system
- cattle-system
- kube-node-lease
- kube-system
- key: kubernetes.io/metadata.name
operator: In
values:
- foo
- bar
policyServer: default
rules:
- apiGroups:
- ""
apiVersions:
- v1
operations:
- CREATE
- UPDATE
resources:
- pods
settings:
allowedTypes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- secret
- projected
timeoutSeconds: 10 |
Thanks, this is definitely one of the issues we will look into during the next month |
Over the course of one hour we see a constant stream of MODIFIED events for these policies. A full audit event for one of the MODIFIED events:
{
"type": "MODIFIED",
"object": {
"kind": "ClusterAdmissionPolicy",
"metadata": {
"uid": "dde481c0-9207-4916-9ede-782dba6c823e",
"annotations": {
"io.kubewarden.policy.title": "cluster-disallow-secret-in-env policy in monitor for namespaces",
"meta.helm.sh/release-name": "kubewarden-policies",
"meta.helm.sh/release-namespace": "cattle-kubewarden-system",
"io.kubewarden.policy.category": "PSP",
"io.kubewarden.policy.description": "Policy that inspects env vars and rejects a request if a secret was found",
"io.kubewarden.policy.severity": "medium"
},
"labels": {
"app.kubernetes.io/component": "policy",
"app.kubernetes.io/instance": "kubewarden-policies",
"app.kubernetes.io/managed-by": "Helm"
},
"name": "cluster-disallow-secret-in-env-monitor-namespaces"
},
"status": {
"mode": "monitor",
"policyStatus": "active",
"conditions": [
{
"status": "False",
"type": "PolicyActive",
"lastTransitionTime": "2024-05-06T11:08:51Z",
"message": "The policy webhook has not been created",
"reason": "PolicyActive"
},
{
"lastTransitionTime": "2024-05-06T11:08:51Z",
"message": "The latest replica set is not uniquely reachable",
"reason": "LatestReplicaSetIsNotUniquelyReachable",
"status": "False",
"type": "PolicyUniquelyReachable"
},
{
"message": "Configuration for this policy is up to date",
"reason": "ConfigurationVersionMatch",
"status": "True",
"type": "PolicyServerConfigurationUpToDate",
"lastTransitionTime": "2024-04-23T18:30:45Z"
}
]
},
"apiVersion": "policies.kubewarden.io/v1"
}
}
It seems something is modifying the admission policies once every 10 seconds. Hope this provides some clues for finding the cause. |
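If the controller rewrites the policy status on every reconcile loop, each write can show up as a MODIFIED event even when nothing meaningful changed. Below is a minimal controller-runtime sketch of one common mitigation: compare the object before and after the status mutation and skip the write when nothing changed. The helper name and the generic client.Object signature are illustrative and are not taken from the actual kubewarden-controller code.

```go
package reconcilehelpers

import (
	"context"

	"k8s.io/apimachinery/pkg/api/equality"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// patchStatusIfChanged applies the caller's status mutation and only talks to
// the API server when the mutation actually changed something, so reconciles
// that recompute an identical status do not generate extra write traffic or
// MODIFIED watch events.
func patchStatusIfChanged(ctx context.Context, c client.Client, obj client.Object, mutateStatus func()) error {
	before := obj.DeepCopyObject().(client.Object)
	mutateStatus()
	if equality.Semantic.DeepEqual(before, obj) {
		return nil // status is identical: skip the write entirely
	}
	return c.Status().Patch(ctx, obj, client.MergeFrom(before))
}
```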
I'm thinking about this information about the policy status updates. My gut feeling tells me that we indeed have an issue in the policy reconciliation, and not only because of the information shared here: when we run the integration tests we can see a lot of errors like this:
I'm not sure if this is the only cause of the excessive requests sent to the API server, but it looks to me like the controller has a bug in the policy reconciliation, because many times when the controller tries to update the status, the resource is not at the latest version. I still do not know why that happens; maybe too many reconciliation requests are re-enqueued? I do not know, I have not spent time on this yet. Just sharing a thought here so as not to lose it. ;) |
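The "resource is not at the latest version" failures described above are the classic optimistic-concurrency conflict (HTTP 409) on status updates. A common pattern, sketched below with client-go's retry helper, is to re-fetch the object and retry the write on conflict instead of failing the reconcile and re-enqueueing it. This is only an illustration of the pattern with a made-up helper name, not the actual kubewarden-controller code.

```go
package reconcilehelpers

import (
	"context"

	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateStatusWithRetry re-fetches the object and re-applies the status
// mutation whenever the API server rejects the update with a 409 Conflict
// ("the object has been modified"), instead of failing the reconcile and
// re-enqueueing it.
func updateStatusWithRetry(ctx context.Context, c client.Client, obj client.Object, mutateStatus func()) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Always work on the latest version of the resource.
		if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); err != nil {
			return err
		}
		mutateStatus()
		return c.Status().Update(ctx, obj)
	})
}
```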
Okay, let me share some thoughts I had while I was working on this issue during the past day. First of all, sorry for taking so long to give initial feedback; busy days with personal stuff. But let's get back to business... I think the comments in this issue include 3 possible improvements in our controller. They are:
|
Fixed with the 1.13 release |
Is there an existing issue for this?
Current Behavior
I noticed that, compared to other operators, the kubewarden controller seems to send a large amount of requests to the API server, probably to support reconciliation of the policy server. Based on information from my ElasticSearch instance, here are some stats:
- create requests for services, deployments and secrets called policy-server-default, which always fail with 409 because those resources already exist. They are spread evenly, so it seems the kubewarden controller is constantly trying to create these 3 resources. Each of these (services, deployments and secrets) has 5 log entries per second, all with 409 status.
I think there should be some room for improvement by slowing down some of these cycles because, in my opinion, it should be enough to send these requests every 2-3 seconds, which would dramatically decrease the load on the api-server. Again, I don't have deep knowledge about the reason for this cadence of requests, but maybe some of these request categories can be streamlined to remove some of the load on the apiserver (see the create-or-update sketch below).
Thanks
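Regarding the constant 409s on create: a controller built with controller-runtime would typically avoid blind Create calls by using controllerutil.CreateOrUpdate, which first fetches the object and only creates it when it is missing. The sketch below is purely illustrative (the function name, Service name and port are placeholders) and is not a statement about how kubewarden-controller is actually implemented.

```go
package reconcilehelpers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// reconcileService shows the create-or-update pattern: the object is fetched
// first and a Create is only issued when it does not exist yet, so the API
// server never has to answer with 409 "already exists".
func reconcileService(ctx context.Context, c client.Client, name, namespace string) error {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
	}
	_, err := controllerutil.CreateOrUpdate(ctx, c, svc, func() error {
		// Set (or refresh) the desired spec; this callback runs for both the
		// create and the update path. The port value is a placeholder.
		svc.Spec.Ports = []corev1.ServicePort{{Name: "https", Port: 8443}}
		return nil
	})
	return err
}
```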
Expected Behavior
I would expect a lower load of requests on the apiserver.
Steps To Reproduce
Analyse the logs from the apiserver collected in ElasticSearch (usually called audit logs).
Environment
Anything else?
Not really. Thanks