when changing storage size for Postgres error in playbook causes loop #1642
Comments
@tylergmuir I am confused about your description above. If we changed the operator so that it deleted the PostgreSQL StatefulSet when the storage size was changed, the PVC that the data is stored in would not be deleted. So when the new StatefulSet was created, it would not enter the running state, because the existing PVC would have the same name as the one the new StatefulSet would be dynamically trying to create. So I think the StatefulSet would try to use the existing PVC and would try to change resources.requests.storage on the PVC, which is only allowed if the StorageClass specified supports volume expansion and has it enabled.
The problem is that not all users will have StorageClasses that support dynamic expansion. So, if I am following correctly, we could potentially add logic to support PVC expansion for the db pvc by doing the following:
What do you think, @tylergmuir? Can you think of any other considerations? Does what I said above make sense and align with what you've seen experimentally? Also, if you or anyone else has a good idea of how this could work and has time, a PR would be welcome.
@rooftopcellist I believe you have it all right. In my case, I had a PVC that used a storage class that did support being expanded. So all I had to do to get back to a working state was delete the StatefulSet, and the rest of the existing code in the Operator handled resizing the PVC, creating the StatefulSet using that resized PVC, and building the pods on top of that. The main issue that I ran into was that by changing the [...]
But like you mentioned, in the case where the user has a storage class that isn't resizable, we would need some way of cleanly stopping the process before it starts, so that the service is not taken down to wait for a resize of the PVC that will never happen.
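The guard discussed above could look roughly like the sketch below. It is not taken from the AWX Operator source: `parse_quantity` and `plan_resize` are hypothetical helper names, the StorageClass is assumed to be available as a parsed manifest (a plain dict), and only whole-number sizes with binary suffixes such as `8Gi` are handled. The idea is to decide up front whether a requested size change can succeed, before anything is scaled down.

```python
# Hypothetical pre-check; not the operator's actual implementation.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(value: str) -> int:
    """Convert a storage quantity such as '8Gi' to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # plain byte count, e.g. '1073741824'


def plan_resize(current: str, requested: str, storage_class: dict) -> str:
    """Return 'noop', 'expand', or 'refuse' for a requested PVC size change."""
    old, new = parse_quantity(current), parse_quantity(requested)
    if new == old:
        return "noop"
    if new < old:
        return "refuse"  # Kubernetes does not support shrinking a PVC
    # allowVolumeExpansion is a top-level StorageClass field; absent means false
    if storage_class.get("allowVolumeExpansion") is True:
        return "expand"
    return "refuse"
```

With a check like this, a `refuse` result could be surfaced as an error on the AWX resource instead of scaling the deployment to 0 and waiting for a resize that cannot happen.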
Great description @tylergmuir. I had the same issue of my postgres statefulset not spinning up and the same symptoms with [...] I was worried about deleting the statefulset and losing the PVC, but since there's no explicit retention policy defined, the PVC remained after I deleted my statefulset, and then a new statefulset spun up. Thank you!
Bug Summary
If you change the value in `postgres_storage_requirements`, it causes an error to occur. This is because the StatefulSet isn't able to change that value. The task `Create Database if no database is specified` in `database_configuration.yml` fails. This drops it down to the rescue, which scales everything down to 0.
Then the task `Remove PostgreSQL statefulset for upgrade` (which in this case should be run) fails to evaluate its `when` statement because `create_statefulset_result.error` does not exist. But in this case, removing the StatefulSet is what is required.
AWX Operator version
2.7.2
AWX version
23.4.0
Kubernetes platform
kubernetes
Kubernetes/Platform version
1.26.7
Modifications
no
Steps to reproduce
Have a functioning AWX environment using a managed Postgres pod.
Change the kustomization for the AWX environment to change the value of `postgres_storage_requirements`. This can be done either by adding it where it wasn't previously used and setting the values to something other than the default, or by increasing the current allocation.
Expected results
The statefulset should be deleted and recreated with the new PVC size as defined.
Actual results
The playbook fails, causing the AWX environment to be scaled to 0 for all pods, and then gets stuck in a loop attempting to update the statefulset.
Additional information
Once you are stuck in this state, you can manually delete the statefulset and then allow the operator to see the statefulset is missing and have it re-create it. After that is done, the deployment continues and the environment is brought back up.
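As a rough illustration of the manual workaround, the sketch below only builds and prints the `kubectl delete statefulset` command rather than running it. The function name `build_delete_command` and the placeholder values are hypothetical; the actual StatefulSet name and namespace depend on your deployment.

```python
def build_delete_command(statefulset: str, namespace: str) -> list[str]:
    """Build the kubectl invocation that deletes one StatefulSet."""
    return ["kubectl", "delete", "statefulset", statefulset, "--namespace", namespace]


# Placeholder values; substitute the names from your own deployment.
cmd = build_delete_command("<postgres-statefulset-name>", "<namespace>")
print(" ".join(cmd))
```

Deleting the StatefulSet this way leaves the PVC in place unless a retention policy says otherwise, which matches what was reported in the comments.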
Operator Logs
The conditional check 'create_statefulset_result.error == 422' failed. The error was: error while evaluating conditional (create_statefulset_result.error == 422): 'dict object' has no attribute 'error'. 'dict object' has no attribute 'error'.
The error appears to be in '/opt/ansible/roles/installer/tasks/database_configuration.yml': line 175, column 7, but may be elsewhere in the file depending on the exact syntax problem.
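The failure in the log can be pictured with a small Python analogy. It is not the operator's code, and the result shapes below are assumptions. Comparing a field that is absent raises an error, whereas a tolerant lookup simply evaluates to false. In a playbook, the analogous guard would typically use Jinja2's `is defined` test or the `default` filter.

```python
def strict_check(result: dict) -> bool:
    # Mirrors the failing conditional: raises when 'error' is absent.
    return result["error"] == 422


def tolerant_check(result: dict) -> bool:
    # .get() yields None for a missing key, so the comparison is simply False.
    return result.get("error") == 422


no_error_key = {"changed": False}  # hypothetical result shape
rejected = {"error": 422}          # hypothetical result shape

try:
    strict_check(no_error_key)
except KeyError as exc:
    print(f"strict check failed: missing key {exc}")

print(tolerant_check(no_error_key))
print(tolerant_check(rejected))
```

Note that a tolerant check only avoids the evaluation error. In the situation described in this issue the StatefulSet still needs to be removed, so the condition would also have to recognize the failed storage change by some other signal.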