Job end in Failed without error when it had large logs #1675
Comments
This sounds similar to an issue I had in AWX where the job output would be truncated and the job would end with an "error" even though there was no final output and the job seemed to have done everything it needed to do successfully. We solved it in my case by increasing the container log size parameter. We are on RKE, and this was a setting we had to add to a YAML file on the OS. ansible/awx#10366 (comment) I think this link describes it; we just set container_log_max_size_mb to something like 500 MB. k8s garbage-collects the containers afterward, so you don't need to be too concerned about those big logs staying around for a while, but of course keep an eye on it to be safe.
I'm not as familiar with Tanzu, so I don't know how you'd configure an equivalent there, but it's my understanding that a code fix pushed a while ago to this component of the AWX ecosystem fixes the issue on the AWX side without any Kubernetes/container log size changes: ansible/receptor#683 I believe you need to set RECEPTOR_KUBE_SUPPORT_RECONNECT as an environment variable in the AWX operator CRD as noted here #1203 (comment), provided you're on versions of k8s and awx-operator that honor this parameter and the accompanying code changes. I haven't been able to test this myself, but the folks working on this project are way smarter than I am, so it probably works. Hope this is helpful, and sorry if I misstated anything.
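For anyone who can edit the kubelet configuration directly, here is a rough sketch of what raising the log rotation size could look like. It is only an illustration: `containerLogMaxSize` and `containerLogMaxFiles` are the upstream kubelet configuration fields, but how (or whether) your distribution exposes them is an assumption you need to check, and the helper name `kubelet_log_settings` is made up for this example.

```python
import json


def kubelet_log_settings(max_size_mib: int = 500, max_files: int = 5) -> dict:
    """Build a KubeletConfiguration fragment with a larger log rotation size.

    The 500 MiB default mirrors the value mentioned in the comment above;
    it is not a recommendation.
    """
    if max_size_mib <= 0:
        raise ValueError("max_size_mib must be positive")
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        # Size at which the kubelet rotates a container's log file.
        "containerLogMaxSize": f"{max_size_mib}Mi",
        # Number of rotated log files kept per container.
        "containerLogMaxFiles": max_files,
    }


if __name__ == "__main__":
    # JSON is also valid YAML, so the output can be merged into a kubelet config file.
    print(json.dumps(kubelet_log_settings(), indent=2))
```

On RKE and similar distributions the same setting is usually passed through the distribution's own config file rather than a raw kubelet file, so treat the output as a reference for the field names only.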
Hi @mcen1! Thank you very much for the info. My error looks like the one you describe. I tried setting RECEPTOR_KUBE_SUPPORT_RECONNECT but it didn't work, and the bad news for me is that the Tanzu product I have doesn't let me configure containerLogMaxSize. Do you think configuring a log aggregator would fix the issue? Edit:
For now I need to stay on Tanzu; the idea is to keep working with that tool.
I can't say for sure whether a log aggregator would receive all the logs; I'm not really clear on the internals of how it all works. I can say that setting up a log aggregator won't solve the issue of being unable to see the full job output inside AWX itself. I have no idea how Tanzu works, but a Google search brings up this: https://kb.vmware.com/s/article/87107 and it resembles what you might have to do. Maybe someone with more experience than me can say whether anything more needs to be done with that RECEPTOR_KUBE_SUPPORT_RECONNECT environment variable, because it was my understanding that it's supposed to be the "right" way to fix this issue.
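In case it helps to double-check how the variable was set, here is a minimal sketch that prints a JSON merge patch for the AWX custom resource. It assumes the variable goes into the operator's `ee_extra_env` spec field with the value `enabled`, which is my reading of the linked comment and should be verified there; the function name `reconnect_patch` is just for this example.

```python
import json


def reconnect_patch(value: str = "enabled") -> str:
    """Return a JSON merge patch that sets RECEPTOR_KUBE_SUPPORT_RECONNECT.

    Assumption: the AWX operator reads extra execution-environment variables
    from spec.ee_extra_env, given as a YAML list inside a string.
    """
    ee_extra_env = (
        "- name: RECEPTOR_KUBE_SUPPORT_RECONNECT\n"
        f"  value: {value}\n"
    )
    return json.dumps({"spec": {"ee_extra_env": ee_extra_env}})


if __name__ == "__main__":
    print(reconnect_patch())
```

The output could then be applied with something like `kubectl patch awx <name> -n <namespace> --type merge -p '<patch>'`, where the name and namespace are placeholders for your own deployment. Note that a merge patch replaces any existing `ee_extra_env` value, so add the entry by hand instead if you already have other variables there.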
Thank you very much for the help @mcen1 ! I will talk with the owner of Tanzu and see if we can fix it.
Bug Summary
Hi!
We are having an issue where jobs with large logs suddenly end without an error.
I checked the logs on K8s, but they look normal until the connection is lost when the job ends. I also looked for a setting that controls the log size or something similar, but I did not find one.
For one of the projects, we worked around it by running the long playbook from inside a small one. Maybe it is a small configuration problem or something I'm not seeing, but I haven't found a way to make it work.
AWX Operator version
1.0.0
AWX version
21.8.0
Kubernetes platform
kubernetes
Kubernetes/Platform version
Tanzu Cluster v1.23.8+vmware.3
Modifications
no
Steps to reproduce
Any playbook that produces a very long log, around 15k lines
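A throwaway playbook along these lines might reproduce it. This is only a sketch: the loop count comes from the 15k figure above, the task and host choices are assumptions, and each loop item prints more than one line of output, so the real log will be longer than the count suggests.

```python
def make_playbook(lines: int = 15000) -> str:
    """Return the text of a playbook whose only job is to produce a lot of output."""
    return (
        "- hosts: localhost\n"
        "  connection: local\n"
        "  gather_facts: false\n"
        "  tasks:\n"
        "    - name: Emit many log lines\n"
        "      ansible.builtin.debug:\n"
        '        msg: "line {{ item }}"\n'
        # Doubled braces keep the Jinja2 delimiters literal inside the f-string.
        f'      loop: "{{{{ range(0, {lines}) | list }}}}"\n'
    )


if __name__ == "__main__":
    # Redirect the output to a .yml file and add it to a project in AWX.
    print(make_playbook(), end="")
```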
Expected results
The job finishes normally regardless of how large the log is
Actual results
the job ends without any error and with the log incomplete
Additional information
No response
Operator Logs
No response