The bpfman operator seems to be "vulnerable" to OOMKills due to watching configmaps cluster-wide.
Steps to reproduce:
install bpfman (just the operator is needed)
run the following:
kubectl create namespace test
for i in {0..400}; do kubectl create cm test-cm-$i -n test --from-file=(put large file here); done
I used a file of ~ 500KB
Monitor pod metrics: ingress bandwidth and memory
We see that the created configmaps impact the operator's memory, even though they are apparently unrelated to the operator and deployed in a separate namespace. Eventually the pod gets OOM-killed.
I think this is due to missing configuration in the manager cache, which leads to watching configmaps cluster-wide (the informers used under the hood have no restriction of scope).
This page touches on the problem: https://master.sdk.operatorframework.io/docs/best-practices/designing-lean-operators/
Here, the config controller sets up a watch on configmaps. Note that the Predicate function provided does not reduce the informer's scope (it only reduces the generated reconcile requests), so despite this predicate, all configmaps in the cluster are still stored in the cache under the hood.
My guess is that it should be OK to add this configuration to the manager:
(I haven't tested)
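The snippet itself did not survive in the text above, so what follows is only a sketch of what such a cache restriction might look like, not the configuration originally proposed. It assumes controller-runtime v0.16 or later (where `cache.ByObject` has a `Namespaces` field) and that the operator's configmap lives in a namespace named `bpfman`; both are assumptions and should be checked against the versions and namespace bpfman actually uses.

```go
package main

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// operatorNamespace is an assumption: the namespace that holds the
// operator's own configmap.
const operatorNamespace = "bpfman"

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			// Restrict only the ConfigMap informer. Other types keep the
			// default (cluster-wide) cache behavior.
			ByObject: map[client.Object]cache.ByObject{
				&corev1.ConfigMap{}: {
					// Only configmaps in this namespace are listed,
					// watched, and held in memory.
					Namespaces: map[string]cache.Config{
						operatorNamespace: {},
					},
				},
			},
		},
	})
	if err != nil {
		ctrl.Log.Error(err, "unable to create manager")
		os.Exit(1)
	}

	// Controllers would be registered with mgr here.

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		ctrl.Log.Error(err, "manager exited with error")
		os.Exit(1)
	}
}
```

Unlike a Predicate, which only filters events after the objects are already cached, this narrows what the informer lists and watches in the first place, so the configmaps created in the `test` namespace by the reproduction steps should never reach the operator's memory.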