Capsule restarts often, probably due to GlobalTenantResources #1263
Comments
@sandert-k8s thanks for the report, I will look into it. What's the pod exit status (is there an OOMKill, or just exit 1)?
Nobody expects the concurrent map writes!
The first thing I noticed from there: we need to fix the concurrency. Definitely a bug.
Thanks for the fast replies!
So, exitCode 2.
We will use way more memory since the replication relies on caching objects to avoid putting pressure on the API Server; that's by design. The issue is not related to memory: as the shared logs show, we had concurrent map writes. I already opened a PR that fixes this issue, thanks for reporting!
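For context on the crash itself: Go maps are not safe for concurrent writers, and an unguarded write from two goroutines aborts the process with a `fatal error: concurrent map writes` (which surfaces as exit code 2, matching the report above). A minimal sketch of the usual fix, guarding the map with a `sync.RWMutex`; the type and field names here are illustrative, not Capsule's actual code or the contents of the linked PR:

```go
package main

import (
	"fmt"
	"sync"
)

// processedItems mimics a status map that several reconcile goroutines
// update concurrently. Names are hypothetical, for illustration only.
type processedItems struct {
	mu    sync.RWMutex
	items map[string]bool
}

// set records an item, serializing writers. Without the lock, concurrent
// calls would crash the runtime with "concurrent map writes".
func (p *processedItems) set(key string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.items[key] = true
}

// size takes the read lock so it can run concurrently with other readers.
func (p *processedItems) size() int {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return len(p.items)
}

func main() {
	p := &processedItems{items: make(map[string]bool)}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			p.set(fmt.Sprintf("secret-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(p.size()) // prints 100
}
```

For a write-heavy map touched from many goroutines, `sync.Map` is an alternative, but a plain map plus a mutex is usually simpler and faster for this access pattern.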
Bug description
The capsule controller pod has restarted 300 times in the last 14 days. It seems to fail suddenly while reconciling the GlobalTenantResource, even though we don't change anything in it. We experience this issue in 2 different Kubernetes clusters.
How to reproduce
Not completely sure how to reproduce, but our tenant setup is quite basic (I deleted some options, but I don't think they are relevant for this bug):
And the globalTenantResources:
(The .status.processedItems were way more secrets, but I left 2 in for reference)
Expected behavior
Capsule should not crash.
Logs
I've attached Capsule's logs from a pod at the moment it crashes, with a few lines of context above.
capsule.log
Additional context