Possible race condition with Kustomize, cron jobs and config maps #18837
Comments
@nickjj Are you using a Pre or Post sync hook for the CronJob or is it part of the normal Sync? |
It's part of a normal sync. |
It may be worth pointing out the config map has a sync-wave value of -30. I wanted to make sure the config map gets updated before everything else since other resources (jobs, deployments, etc.) depend on it. After that SealedSecrets has -20 and everything else has the default value. |
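For readers unfamiliar with sync waves, this is roughly how such an ordering is expressed. A minimal sketch, assuming the wave is set with the `argocd.argoproj.io/sync-wave` annotation directly on the ConfigMap; the resource name, hash suffix and data key are made up for illustration and are not taken from the reporter's repo.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hello-world-app-abc123        # hypothetical generated name
  annotations:
    # Lower waves are applied first; resources without the annotation are in wave 0.
    argocd.argoproj.io/sync-wave: "-30"
data:
  EXAMPLE_KEY: "example-value"        # placeholder value
```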
How is it possible for the CronJob to still reference |
I'm not sure how it's possible. Maybe it has to do with how it's scheduled? We have around 12 cron jobs and this one cron job is the only one where this happens but it's configured the same way as the others. The only difference is this cron job happens to run for ~15 minutes and runs every 30 minutes. The time gap is quite small. It usually takes 1-3 seconds to progress to the point where new pods based on deployments and jobs are coming up. |
How does the CronJob reference the ConfigMap? |
In the same way as the deployment and other jobs:

    envFrom:
      - configMapRef:
          name: "hello-world-app"

In the configMapGenerator:

    - behavior: "merge"
      envs:
        - ".env"
      name: "hello-world-app"

In this case the |
So Kustomize is responsible for making sure the generated names match. I think we need a clearer picture of the exact scenario that's producing the mismatch between the Job spec and the existing ConfigMap. Argo CD's responsibility ends at applying the specified manifests. The conditions which produce the mismatch are what we need to understand and resolve. |
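To make the name matching concrete, here is a sketch of what `kustomize build` would render for a generator like the one quoted above. The hash suffix, CronJob name, container name and image are assumptions for illustration; only the ConfigMap base name and the 30 minute schedule come from the thread.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hello-world-app-abc123          # base name plus a content hash (made up here)
data:
  EXAMPLE_KEY: "example-value"          # placeholder value
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-world-cron                # hypothetical
spec:
  schedule: "*/30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: worker              # hypothetical
              image: curlimages/curl    # assumed image
              envFrom:
                - configMapRef:
                    # Kustomize rewrites this reference to the generated name.
                    name: hello-world-app-abc123
```

A Job that the CronJob has already spawned keeps a copy of this pod template, including the old generated name, so later Pods created for that Job still look up the old ConfigMap.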
Beyond what's in the bullet points of the workflow causing this, is there anything I can add to this issue? |
Here's a few screenshots in order of when it happened recently. Sorry for the small font size on some of them. Let's just say Outlook and OneDrive are bad and resized them when I emailed them from work to my personal machine. Fortunately the text is still readable (barely) if you zoom in.

But this demonstrates how the The new deployment has a new config map suffix starting with

The only things I've partially redacted are names of certain resources since it would show my employer's company name and some of the things we're doing based on the cron job name.

Screenshots, in order:
- ConfigMap: new
- Pod: config map error
- Pod: summary error
- Pod: events error
- CronJob: manifest
- Job: manifest
- Job: stuck progressing due to failing pod
- Pod: manifest |
The system as designed is stateful. The ConfigMap, because of the randomized name, is effectively ephemeral. So the Job (which may persist after the ephemeral ConfigMap is deleted) depends on state which may change. You'll need to either make the Job not depend on the ephemeral resource, or build something to clean up the problematic Jobs. Is it actually necessary to use a generated name for the ConfigMap? Could the name instead be static? |
The job requires the config map because it spawns a pod with environment variables that are necessary for the workload to be completed. The generated config map name was helpful for letting Kubernetes know its values were changed. For example if we changed an environment variable and pushed that, Argo CD picks up the change and any new resources that reference it automatically got re-deployed. If we use a static name, from what I remember that re-deployment doesn't occur automatically because Kubernetes doesn't know it changed. Any suggestions on an event based automated way to auto-delete failing jobs? |
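One option that is time based rather than event based: Kubernetes can bound how long a Job may run and remove it once it has finished. A minimal sketch, assuming a Job that normally takes about 15 minutes; the names, image and all numeric values are illustrative and would need tuning.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-world-cron                # hypothetical
spec:
  schedule: "*/30 * * * *"
  concurrencyPolicy: Forbid             # do not start a run while one is still active
  jobTemplate:
    spec:
      activeDeadlineSeconds: 1500       # mark the Job Failed if it is still active after 25 minutes
      backoffLimit: 3
      ttlSecondsAfterFinished: 300      # delete the finished (Complete or Failed) Job after 5 minutes
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: worker              # hypothetical
              image: curlimages/curl    # assumed image
              envFrom:
                - configMapRef:
                    name: hello-world-app
```

With settings like these a Job stuck on a missing ConfigMap should be failed and cleaned up without manual deletion, at the cost of losing that scheduled run.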
#13642 was a suggestion to allow for static configs and have auto-rollouts but it got closed with a recommendation to use a third party tool. This feels like a pretty common use case to have a cron job that references a config map. Any possibility of this issue being investigated again given some of the other suggestions in that thread say to use Kustomize's config map generator which doesn't work which is demonstrated here? I actually use the checksum approach for secrets because we have a custom CLI tool to interact with the secrets which updates the checksum values as needed but the ConfigMap story is much different because it's just a plain text file on disk that exists in the repo. Ideally you'd want to update that file, commit and push it. Having a developer have to go in and manually calculate and then update a dozen checksum annotations wouldn't be a good outcome IMO. |
Do you need already-running Jobs to pick up the new config, or just new Jobs spawned by the CronJob after syncing? If the latter, you should be able to use a static ConfigMap name. New Jobs will pick up the newly-synced ConfigMap as soon as they're spawned. Handling config updates for Deployments as discussed in #13642 is a significantly different problem. |
The same config map used by the cron jobs is used by our main deployment which is a web app. In our case it's kind of related. I suppose the outcome I'm looking for is it to work in a way where it doesn't get itself into a failure state. If a cron job triggered a job which triggered a pod and it's currently running, I'm not sure why deploying a new version of our app will cause the pod to fail with the config map error when it's already running with the old one. In theory shouldn't the pod that's already running have everything it needs to continue running even if the old config map got pruned by Argo CD? |
I'd recommend using something like Reloader for the Deployment. Or you could split the config into two ConfigMaps: one with name randomization to trigger the Deployment restart, and the other without name randomization for the CronJob to use. Right now, you have two resources with very different runtime behavior depending on the same ConfigMap. In one case, an ephemeral ConfigMap is fine (and even preferable if you want a quick-and-easy restart mechanism), and in the other case, an ephemeral ConfigMap is unacceptable because stateful resources (Jobs) depend on the ConfigMap persisting.
Because the Job doesn't load the ConfigMap, the Pod does. If the Job needs to retry for some reason, it will spawn a new Pod, and the Pod will not be able to load the now-deleted ephemeral ConfigMap. You'd hit the same problem if a new Pod belonging to an old ReplicaSet depended on an ephemeral ConfigMap. Imagine you deploy a new image, and the Deployment spins up a new ReplicaSet which fails with an ImagePullBackOff. If a Pod belonging to the old ReplicaSet were evicted, a new one would be spun up, and the new Pod would fail to load the now-deleted ephemeral ConfigMap. Randomized ConfigMap names are just a bad way to force workload restarts. Other tools exist because a more elegant solution is actually necessary. |
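The two-ConfigMap split suggested above could look something like this in a kustomization. A sketch only: the second generator's name is hypothetical, and it assumes both ConfigMaps are built from the same `.env` file.

```yaml
configMapGenerator:
  # Hashed name: a content change produces a new ConfigMap name,
  # which changes the Deployment's pod template and triggers a rollout.
  - name: hello-world-app
    envs:
      - .env
  # Stable name for the CronJob: updated in place, never renamed,
  # so there is no old copy for Argo CD to prune.
  - name: hello-world-app-cron          # hypothetical
    envs:
      - .env
    options:
      disableNameSuffixHash: true
```

The CronJob's `configMapRef` would then point at `hello-world-app-cron`, and new Jobs read whatever values are current when their Pods start.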
That's the interesting part. The job didn't need to retry, it was happily running before the new deployment.
Thanks. I'll check into this and only have it enabled for the deployment. |
That is weird. The Pod events don't make sense to me. Why was the image pulled so many times? And why was it so long from the initial pull attempt until the first successful pull? Was the referenced ConfigMap deleted in that ~10min window? |
It's hard to say why it was pulled that much. Maybe each time K8s re-tried, it re-pulled? Right now a job is running successfully and the pod shows 1 pull with 1 container start. This job pretty much never fails, it runs a lot to pick up new work. Initial pull vs successful pull is likely due to it being in a stuck state for a while. After I took all of the screenshots I manually deleted the job which fixed everything. I didn't manually delete the ConfigMap, Argo CD must have pruned it after the app was synced successfully. |
It's also odd that there are more container failure events than ConfigMap error events. I wonder if there was an initial failure after the image was finally pulled and if the container restart triggered the ConfigMap reload which failed due to the ConfigMap having been pruned. It's difficult to reconstruct the events. But either way, it's dangerous to leave dangling references to an ephemeral resource. |
Is there anything at the K8s level that would trigger the job / pod to restart even if it didn't fail when a new deployment went out? |
Nothing I can think of off-hand... but glancing at the Job spec in the screenshot, I realize it's got a lot of functionality I've never used. |
I do think the 10min gap between first pull and first run is plenty of time to trigger the race condition, even without a container restart. |
Most of the options are the default. I've set We've never had this fail other than during a deployment, for like 2 years. I'm not sure what would cause the 10min gap between first pull and first run. The image being pulled is the official'ish curl image. We have other cron jobs using the same image and they finish in a few seconds end to end. |
With the 3 retries configured, it's possible the container fails sometimes, but has never exceeded the 3 retry threshold. A failed run/failure/retry isn't the only thing that may introduce sufficient time to trigger the race condition. The slow pull is another. Basically anything that adds time between "reference broken by resource prune" and "failed use of dangling reference" can trigger the race condition. I'm going to go ahead and close the issue since Argo CD is behaving correctly. Eliminating the possibility of dangling references is outside the scope of Argo's purpose, which is to apply declarative config. Occasional failures are an inherent part of a declarative, eventually-consistent system. If the failures caused by dangling references to the ephemeral ConfigMap are unacceptable, then you'll need to rework how that config is referenced to make the dangling reference less likely. Happy to keep brainstorming here, just closing for bookkeeping purposes. |
Thanks. I may just configure Argo CD not to prune the config maps and have a mental note to clean them up every quarter. That would at least bypass needing to run another tool (Reloader) in our cluster and be responsible for its configuration and updates. |
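For anyone wanting to try the same thing, pruning can be switched off per resource with an annotation. A minimal sketch, assuming the annotations are attached through the generator's `options`; the `compare-options` line is an optional addition that the Argo CD docs describe for extraneous resources, and whether it fully avoids an OutOfSync status here is untested.

```yaml
configMapGenerator:
  - name: hello-world-app
    envs:
      - .env
    options:
      annotations:
        # Argo CD skips this resource when pruning.
        argocd.argoproj.io/sync-options: Prune=false
        # Optional: ask Argo CD to ignore the leftover copies when computing sync status.
        argocd.argoproj.io/compare-options: IgnoreExtraneous
```

Old ConfigMaps then stay in the cluster until something else deletes them.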
Yeah, that's probably easiest. And I bet there are generic ttl controllers that can clean them up for you and avoid the memory item. |
You mean something like this, https://github.com/TwiN/k8s-ttl-controller? Looks like 1 more thing to install and configure tho. I wonder how much memory will be used if ~50-75 old config maps were sitting around. Most of them have about 10kb of data. Do you know offhand if they all sit in memory somewhere? |
Yep, exactly like that. Indeed, another thing to maintain. They'll all sit around in etcd. 50 ConfigMaps should be no big deal. The most annoying part will be your app being forever OutOfSync. |
Hmm, I didn't know it would be forever OutOfSync. It looks like there's also #1636 which has been open for almost 5 years. |
Ah that's a good idea. Not sure I love #1636, we'd have to be in the business of understanding every way in which one resource can reference another resource. Seems like a lot of work for low value, and fairly risky. |
Describe the bug
I've noticed that about 10% of our app deployments end up with an outcome where we have a cron job that gets stuck due to "configmap not found" with an error of CreateContainerConfigError. The solution for now is to delete the job that gets spawned by Kubernetes which then kicks off a new job with the correct config map.

I'm not 100% on the root cause but I believe it's related to this:
myapp-abc123
prune: true
as well as PruneLast=true
myapp-abc123
myapp-xyz123
myapp-abc123
myapp-abc123
and that doesn't exist anymore so we get the error

Expected behavior
The cron job gets updated to spawn jobs with the new config map so this error doesn't happen.
Version
I'm opening this issue on a different device from where I ran the version command, but this has been happening with many different versions since I started using Argo CD. I'm currently running Argo CD v2.10.5 with Kustomize v5.2.1.

Logs
Beyond the error messages posted above, Kubernetes does throw TooManyMissedTimes error events on the cron job which makes sense since the job fails in a loop until we delete it.

Workarounds
What would need to be configured or adjusted to fix this in a bullet proof way? Thanks.