Skip to content

Commit

Permalink
Add -ness checks and refactor migrations
Browse files Browse the repository at this point in the history
  • Loading branch information
dhageman committed Mar 2, 2024
1 parent d0827ba commit 259017a
Show file tree
Hide file tree
Showing 11 changed files with 352 additions and 77 deletions.
80 changes: 80 additions & 0 deletions config/crd/bases/awx.ansible.com_awxs.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1571,6 +1571,86 @@ spec:
description: Number of task instance replicas
type: integer
format: int32
web_liveness_initial_delay:
description: Initial delay before starting liveness checks on web pod
type: integer
default: 5
format: int32
task_liveness_initial_delay:
description: Initial delay before starting liveness checks on task pod
type: integer
default: 5
format: int32
web_liveness_period:
description: Time period in seconds between each liveness check for the web pod
type: integer
default: 0
format: int32
task_liveness_period:
description: Time period in seconds between each liveness check for the task pod
type: integer
default: 0
format: int32
web_liveness_failure_threshold:
description: Number of consecutive failure events to identify failure of web pod
type: integer
default: 3
format: int32
task_liveness_failure_threshold:
description: Number of consecutive failure events to identify failure of task pod
type: integer
default: 3
format: int32
web_liveness_timeout:
description: Number of seconds to wait for a probe response from web pod
type: integer
default: 1
format: int32
task_liveness_timeout:
description: Number of seconds to wait for a probe response from task pod
type: integer
default: 1
format: int32
web_readiness_initial_delay:
description: Initial delay before starting readiness checks on web pod
type: integer
default: 20
format: int32
task_readiness_initial_delay:
description: Initial delay before starting readiness checks on task pod
type: integer
default: 20
format: int32
web_readiness_period:
description: Time period in seconds between each readiness check for the web pod
type: integer
default: 0
format: int32
task_readiness_period:
description: Time period in seconds between each readiness check for the task pod
type: integer
default: 0
format: int32
web_readiness_failure_threshold:
description: Number of consecutive failure events to identify failure of web pod
type: integer
default: 3
format: int32
task_readiness_failure_threshold:
description: Number of consecutive failure events to identify failure of task pod
type: integer
default: 3
format: int32
web_readiness_timeout:
description: Number of seconds to wait for a probe response from web pod
type: integer
default: 1
format: int32
task_readiness_timeout:
description: Number of seconds to wait for a probe response from task pod
type: integer
default: 1
format: int32
garbage_collect_secrets:
description: Whether or not to remove secrets upon instance removal
default: false
Expand Down
11 changes: 11 additions & 0 deletions config/rbac/role.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,17 @@ rules:
- patch
- update
- watch
- apiGroups:
- batch
resources:
- jobs
verbs:
- get
- list
- create
- patch
- update
- watch
- apiGroups:
- monitoring.coreos.com
resources:
Expand Down
40 changes: 40 additions & 0 deletions docs/user-guide/advanced-configuration/container-probes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
#### Container Probes
These parameters control the usage of liveness and readiness container probes for
the web and task containers.

#### Web / Task Container Liveness Check

The liveness probe queries the status of the supervisor daemon of the container. The probe will fail if it
detects one of the services in a state other than "RUNNING".

| Name | Description | Default |
| web_liveness_period | Time period in seconds between each probe check. The value of 0 disables the probe. | 0 |
| web_liveness_initial_delay | Initial delay before starting probes in seconds | 5 |
| web_liveness_failure_threshold| Number of consecutive failure events to identify failure of container | 3 |
| web_liveness_timeout | Number of seconds to wait for a probe response from container | 1 |
| task_liveness_period | Time period in seconds between each probe check. The value of 0 disables the probe. | 0 |
| task_liveness_initial_delay | Initial delay before starting probes in seconds | 5 |
| task_liveness_failure_threshold| Number of consecutive failure events to identify failure of container | 3 |
| task_liveness_timeout | Number of seconds to wait for a probe response from container | 1 |

#### Web Container Readiness Check

This is a HTTP check against the status endpoint to confirm the system is still able to respond to web requests.

| Name | Description | Default |
| -------------| ---------------------------------- | ------- |
| web_readiness_period | Time period in seconds between each probe check. The value of 0 disables the probe. | 0 |
| web_readiness_initial_delay | Initial delay before starting probes in seconds | 5 |
| web_readiness_failure_threshold| Number of consecutive failure events to identify failure of container | 3 |
| web_readiness_timeout | Number of seconds to wait for a probe response from container | 1 |

#### Task Container Readiness Check

This is a command probe using the builtin check command of the awx-manage utility.

| Name | Description | Default |
| -------------| ---------------------------------- | ------- |
| task_readiness_period | Time period in seconds between each probe check. The value of 0 disables the probe. | 0 |
| task_readiness_initial_delay | Initial delay before starting probes in seconds | 5 |
| task_readiness_failure_threshold| Number of consecutive failure events to identify failure of container | 3 |
| task_readiness_timeout | Number of seconds to wait for a probe response from container | 1 |
32 changes: 16 additions & 16 deletions roles/installer/tasks/initialize_django.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
- name: Check if there are any super users defined.
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: >-
bash -c "echo 'from django.contrib.auth.models import User;
nsu = User.objects.filter(is_superuser=True, username=\"{{ admin_user }}\").count();
Expand All @@ -16,8 +16,8 @@
- name: Create super user via Django if it doesn't exist.
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: awx-manage createsuperuser --username={{ admin_user | quote }} --email={{ admin_email | quote }} --noinput
register: result
changed_when: "'That username is already taken' not in result.stderr"
Expand All @@ -28,8 +28,8 @@
- name: Update Django super user password
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: awx-manage update_password --username='{{ admin_user }}' --password='{{ admin_password }}'
register: result
changed_when: "'Password updated' in result.stdout"
Expand All @@ -39,8 +39,8 @@
- name: Check if legacy queue is present
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: >-
bash -c "awx-manage list_instances | grep '^\[tower capacity=[0-9]*\]'"
register: legacy_queue
Expand All @@ -50,8 +50,8 @@
- name: Unregister legacy queue
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: >-
bash -c "awx-manage unregister_queue --queuename=tower"
when: "'[tower capacity=' in legacy_queue.stdout"
Expand All @@ -74,8 +74,8 @@
- name: Register default execution environments (without authentication)
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: >-
bash -c "awx-manage register_default_execution_environments"
register: ree
Expand All @@ -95,8 +95,8 @@
- name: Register default execution environments (with authentication)
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: >-
bash -c "awx-manage register_default_execution_environments
--registry-username='{{ default_execution_environment_pull_credentials_user }}'
Expand All @@ -111,8 +111,8 @@
- name: Create preload data if necessary. # noqa 305
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: >-
bash -c "awx-manage create_preload_data"
register: cdo
Expand Down
46 changes: 4 additions & 42 deletions roles/installer/tasks/install.yml
Original file line number Diff line number Diff line change
Expand Up @@ -94,51 +94,13 @@
- name: Include resources configuration tasks
include_tasks: resources_configuration.yml

- name: Check for pending migrations
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
command: >-
bash -c "awx-manage showmigrations | grep -v '[X]' | grep '[ ]' | wc -l"
changed_when: false
when: awx_task_pod_name != ''
register: database_check

- name: Migrate the database if the K8s resources were updated # noqa 305
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_task_pod_name }}"
container: "{{ ansible_operator_meta.name }}-task"
command: |
bash -c "
function end_keepalive {
rc=$?
rm -f \"$1\"
kill $(cat /proc/$2/task/$2/children 2>/dev/null) 2>/dev/null || true
wait $2 || true
exit $rc
}
keepalive_file=\"$(mktemp)\"
while [[ -f \"$keepalive_file\" ]]; do
echo 'Database schema migration in progress...'
sleep 60
done &
keepalive_pid=$!
trap 'end_keepalive \"$keepalive_file\" \"$keepalive_pid\"' EXIT SIGINT SIGTERM
echo keepalive_pid: $keepalive_pid
awx-manage migrate --noinput
echo 'Successful'
"
register: migrate_result
when:
- awx_task_pod_name != ''
- database_check is defined
- (database_check.stdout|trim) != '0'
- name: Migrate database to the latest schema
include_tasks: migrate_schema.yml
when: awx_web_pod_name != ''

- name: Initialize Django
include_tasks: initialize_django.yml
when: awx_task_pod_name != ''
when: awx_web_pod_name != ''

- name: Update status variables
include_tasks: update_status.yml
Expand Down
57 changes: 57 additions & 0 deletions roles/installer/tasks/migrate_schema.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
---

- name: Check for pending migrations
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: >-
bash -c "awx-manage showmigrations | grep -v '[X]' | grep '[ ]' | wc -l"
changed_when: false
when: awx_web_pod_name != ''
register: database_check

- block:
- name: Get version of controller for tracking
k8s_exec:
namespace: "{{ ansible_operator_meta.namespace }}"
pod: "{{ awx_web_pod_name }}"
container: "{{ ansible_operator_meta.name }}-web"
command: >-
bash -c "awx-manage --version"
changed_when: false
register: version_check

- name: Update instance version
set_fact:
version: "{{ version_check.stdout | trim }}"

# It is possible to do a wait on this task to create the job and wait
# until it completes. Unfortunately, if the job doesn't wait finish within
# the timeout period that is considered an error. We only want this to
# error if there is an issue with creating the job.
- name: Create kubernetes job to perform the migration
k8s:
apply: yes
definition: "{{ lookup('template', 'jobs/migration.yaml.j2') }}"
register: migrate_result

# This task is really only necessary for new installations. We need to
# ensure the database has a schema loaded before continuing with the
# initialization of admin user, etc.
- name: Watch for the migration job to finish
k8s_info:
kind: Job
namespace: "{{ ansible_operator_meta.namespace }}"
name: "{{ ansible_operator_meta.name }}-migration-{{ version }}"
register: result
until:
- result.resources[0].status.succeeded is defined
- result.resources[0].status.succeeded == 1
retries: 180
delay: 5
ignore_errors: true

when:
- database_check is defined
- (database_check.stdout|trim) != '0'
Loading

0 comments on commit 259017a

Please sign in to comment.