On fresh deployments, the first (or more) reconciliation loop fails on the task "Verify the resource pod name is populated" #1784

Closed
3 tasks done
kurokobo opened this issue Mar 21, 2024 · 0 comments · Fixed by #1787

Comments

@kurokobo
Contributor

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

On fresh deployments of an AWX CR, the first one or more reconciliation loops sometimes fail on the task "Verify the resource pod name is populated".

If we do nothing and wait for a later reconciliation loop, this task succeeds and the deployment completes.

The first reconciliation:

...
--------------------------- Ansible Task StdOut -------------------------------

TASK [installer : Verify the resource pod name is populated.] ******************
task path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:284

-------------------------------------------------------------------------------

--------------------------- Ansible Task StdOut -------------------------------

 TASK [Verify the resource pod name is populated.] ******************************** 
fatal: [localhost]: FAILED! => {
    "assertion": "awx_web_pod_name != ''",
    "changed": false,
    "evaluated_to": false,
    "msg": "Could not find the tower pod's name."
}

-------------------------------------------------------------------------------
{"level":"error","ts":"2024-03-21T00:38:10Z","logger":"logging_event_handler","msg":"","name":"awx-demo","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"runner_on_failed","job":"3324091116751141590","EventData.Task":"Verify the resource pod name is populated.","EventData.TaskArgs":"","EventData.FailedTaskPath":"/opt/ansible/roles/installer/tasks/resources_configuration.yml:284","error":"[playbook task failed]","stacktrace":"github.com/operator-framework/ansible-operator-plugins/internal/ansible/events.loggingEventHandler.Handle\n\tansible-operator-plugins/internal/ansible/events/log_events.go:111"}
...

A later reconciliation, after simply waiting and doing nothing:

...
--------------------------- Ansible Task StdOut -------------------------------

TASK [installer : Verify the resource pod name is populated.] ******************
task path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:284

-------------------------------------------------------------------------------
{"level":"info","ts":"2024-03-21T00:39:09Z","logger":"logging_event_handler","msg":"[playbook task start]","name":"awx-demo","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"7413164896366316510","EventData.Name":"installer : Migrate database to the latest schema"}

--------------------------- Ansible Task StdOut -------------------------------

TASK [installer : Migrate database to the latest schema] ***********************
task path: /opt/ansible/roles/installer/tasks/install.yml:97

-------------------------------------------------------------------------------
{"level":"info","ts":"2024-03-21T00:39:09Z","logger":"logging_event_handler","msg":"[playbook task start]","name":"awx-demo","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"7413164896366316510","EventData.Name":"installer : Check for pending migrations"}
...
$ kubectl -n awx logs deployments/awx-operator-controller-manager | grep -E "^PLAY RECAP" -A 1
PLAY RECAP *********************************************************************
localhost                  : ok=91   changed=19   unreachable=0    failed=1    skipped=46   rescued=0    ignored=0   👈👈👈
--
PLAY RECAP *********************************************************************
localhost                  : ok=63   changed=0    unreachable=0    failed=1    skipped=72   rescued=0    ignored=0   👈👈👈
--
PLAY RECAP *********************************************************************
localhost                  : ok=90   changed=14   unreachable=0    failed=0    skipped=81   rescued=0    ignored=2   
--
PLAY RECAP *********************************************************************
localhost                  : ok=88   changed=0    unreachable=0    failed=0    skipped=83   rescued=0    ignored=1   
--
PLAY RECAP *********************************************************************
localhost                  : ok=88   changed=0    unreachable=0    failed=0    skipped=83   rescued=0    ignored=1 

AWX Operator version

2.13.1

AWX version

24.0.0

Kubernetes platform

kubernetes

Kubernetes/Platform version

k3s version v1.28.7+k3s1

Modifications

no

Steps to reproduce

Deploy AWX Operator 2.13.1 and the following minimal AWX CR:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  namespace: awx
  name: awx-demo
spec:
  service_type: nodeport

Expected results

The first reconciliation loop completes successfully without any failed tasks.

Actual results

The first reconciliation loop fails, and we have to wait for one or more further loops before the deployment completes.

Additional information

Related to #1777 (comment)

As commented by @LukWe99, the wait for the web pod was removed by #1674, but we should wait for the web pod to be up and running before proceeding.

Of course, I understand that simply adding the wait back can't work anymore, since the init container that waits for migrations to complete can't finish at this point. So we should keep the wait removed. Alternatively, adding retries to the task that finds the running web pod would be the ideal solution:

- name: Get the new resource pod information after updating resource.
  k8s_info:
    kind: Pod
    namespace: '{{ ansible_operator_meta.namespace }}'
    label_selectors:
      - "app.kubernetes.io/name={{ ansible_operator_meta.name }}-web"
      - "app.kubernetes.io/managed-by={{ deployment_type }}-operator"
      - "app.kubernetes.io/component={{ deployment_type }}"
    field_selectors:
      - status.phase=Running
  register: _new_pod
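
For illustration, here is a rough sketch of what adding retries to that task could look like, using Ansible's standard `until` / `retries` / `delay` task keywords. The retry count and delay are placeholder values I picked as assumptions, not tested numbers, and the `until` condition assumes that `k8s_info` returns its matches under `resources`:

```yaml
- name: Get the new resource pod information after updating resource.
  k8s_info:
    kind: Pod
    namespace: '{{ ansible_operator_meta.namespace }}'
    label_selectors:
      - "app.kubernetes.io/name={{ ansible_operator_meta.name }}-web"
      - "app.kubernetes.io/managed-by={{ deployment_type }}-operator"
      - "app.kubernetes.io/component={{ deployment_type }}"
    field_selectors:
      - status.phase=Running
  register: _new_pod
  # Retry until at least one Running web pod is returned.
  until:
    - _new_pod['resources'] is defined
    - _new_pod['resources'] | length > 0
  retries: 60  # assumption: placeholder value, tune as needed
  delay: 5     # assumption: seconds between attempts
```

With something like this, the task keeps polling within the same reconciliation loop instead of passing an empty result on to "Verify the resource pod name is populated", so the web pod gets time to reach the Running phase without reintroducing the removed wait.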

Operator Logs

No response
