Avoid error when JTs deleted while task manager running #6084

Closed
wants to merge 2 commits into from

AlanCoding
Member

SUMMARY

This fixes a case that caused the task manager to error out: a job template was deleted while the task manager was running.

The task manager code should never throw errors.

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME
  • API
AWX VERSION
9.2.0
ADDITIONAL INFORMATION

addresses:

2020-02-26 02:11:48,729 ERROR    awx.main.dispatch PID:12916 Worker failed to run task awx.main.scheduler.tasks.run_task_manager(*[], **{}
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/fields/related_descriptors.py", line 164, in __get__
    rel_obj = self.field.get_cached_value(instance)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/fields/mixins.py", line 13, in get_cached_value
    return instance._state.fields_cache[cache_name]
KeyError: 'unified_job_template'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 86, in perform_work
    result = self.run_callable(body)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 62, in run_callable
    return _call(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/tasks.py", line 15, in run_task_manager
    TaskManager().schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 583, in schedule
    self._schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 557, in _schedule
    finished_wfjs = self.process_finished_workflow_jobs(running_workflow_tasks)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 167, in process_finished_workflow_jobs
    has_failed, reason = dag.has_workflow_failed()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/dag_workflow.py", line 153, in has_workflow_failed
    if obj.do_not_run is False and obj.unified_job_template is None:
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/fields/related_descriptors.py", line 178, in __get__
    rel_obj = self.get_object(instance)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/fields/related_descriptors.py", line 145, in get_object
    return qs.get(self.field.get_reverse_related_filter(instance))
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 408, in get
    self.model._meta.object_name
awx.main.models.unified_jobs.UnifiedJobTemplate.DoesNotExist: UnifiedJobTemplate matching query does not exist.
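
The failure mode in the traceback above can be reproduced in miniature without Django. The following is a minimal sketch, assuming a lazily loaded relation that checks a per-instance cache first and only then queries; `LazyRelation`, `WorkflowNode`, `TEMPLATES`, and the local `DoesNotExist` class are hypothetical stand-ins, not AWX or Django code.

```python
class DoesNotExist(Exception):
    """Stand-in for Django's Model.DoesNotExist (hypothetical, for illustration)."""


class LazyRelation:
    """Descriptor mimicking a lazily loaded foreign key: cache first, then 'query'."""

    def __init__(self, name, table):
        self.name = name
        self.table = table  # a dict acting as the related database table

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cache = instance.__dict__.setdefault("_fields_cache", {})
        try:
            return cache[self.name]  # cache miss raises KeyError first
        except KeyError:
            pk = getattr(instance, self.name + "_id")
            if pk is None:
                return None
            if pk not in self.table:
                # The row vanished between loading the node and touching the relation.
                raise DoesNotExist(f"{self.name} matching query does not exist.")
            cache[self.name] = self.table[pk]
            return cache[self.name]


TEMPLATES = {1: "demo job template"}


class WorkflowNode:
    unified_job_template = LazyRelation("unified_job_template", TEMPLATES)

    def __init__(self, template_id):
        self.unified_job_template_id = template_id


node = WorkflowNode(template_id=1)  # node loaded while the template still exists
del TEMPLATES[1]                    # template deleted by a concurrent request

try:
    node.unified_job_template       # first access happens after the delete
    outcome = "loaded"
except DoesNotExist as exc:
    outcome = f"DoesNotExist: {exc}"

print(outcome)
```

Because the lookup is deferred until first access, the node still carries the old foreign key value while the row it points to is gone, which matches the `KeyError` followed by `DoesNotExist` sequence in the log.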

@AlanCoding
Member Author

Shoot, I should have realized how incomplete this was. It's hard to cover all cases, and fixing one error might just postpone the failure until the next one. If I delete the job instead, I get:

Traceback (most recent call last):
  File "/venv/awx/lib/python3.6/site-packages/django/db/models/fields/related_descriptors.py", line 164, in __get__
    rel_obj = self.field.get_cached_value(instance)
  File "/venv/awx/lib/python3.6/site-packages/django/db/models/fields/mixins.py", line 13, in get_cached_value
    return instance._state.fields_cache[cache_name]
KeyError: 'job'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/awx_devel/awx/main/tests/functional/models/test_workflow.py", line 439, in test_deleted_related_race_condition
    has_failed, reason = dag.has_workflow_failed()
  File "/awx_devel/awx/main/scheduler/dag_workflow.py", line 164, in has_workflow_failed
    elif obj.job and obj.job.status in ['failed', 'canceled', 'error']:
  File "/venv/awx/lib/python3.6/site-packages/django/db/models/fields/related_descriptors.py", line 178, in __get__
    rel_obj = self.get_object(instance)
  File "/venv/awx/lib/python3.6/site-packages/django/db/models/fields/related_descriptors.py", line 298, in get_object
    return super().get_object(instance)
  File "/venv/awx/lib/python3.6/site-packages/django/db/models/fields/related_descriptors.py", line 145, in get_object
    return qs.get(self.field.get_reverse_related_filter(instance))
  File "/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 408, in get
    self.model._meta.object_name
awx.main.models.unified_jobs.UnifiedJob.DoesNotExist: UnifiedJob matching query does not exist.

So I might try to push a fix that handles these a little better.

- for workflow_node in workflow_nodes.all():
+ # Intentionally prefetch related jobs and templates so that if they
+ # are deleted while task manager runs, it will not cause an error
+ for workflow_node in workflow_nodes.prefetch_related('job', 'unified_job_template').all():
Contributor

Yea @AlanCoding I like this better. Good idea.
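
The idea behind the prefetch change discussed above can be illustrated with a plain-Python simulation. This is a sketch only, assuming that eager loading resolves every relation in one pass and stores `None` for rows that are already missing; `Node`, `prefetch`, and `JOBS` are hypothetical names, and the code does not use or reproduce Django's `prefetch_related` implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical workflow node holding only a foreign key and a relation cache."""
    job_id: Optional[int]
    cache: dict = field(default_factory=dict)

    def related(self, name):
        # Reads only from the cache, so no lookup happens after prefetching.
        return self.cache[name]


def prefetch(nodes, relation, table):
    """Resolve `relation` for every node up front; missing rows become None."""
    for node in nodes:
        pk = getattr(node, f"{relation}_id")
        node.cache[relation] = table.get(pk) if pk is not None else None
    return nodes


JOBS = {10: {"status": "failed"}, 11: {"status": "successful"}}

# Job 12 was deleted before the scheduler started its pass.
nodes = prefetch([Node(job_id=10), Node(job_id=11), Node(job_id=12)], "job", JOBS)

del JOBS[10]  # deleted while the scheduler pass is still running

statuses = []
for n in nodes:
    job = n.related("job")
    statuses.append(job["status"] if job else None)

print(statuses)
```

The trade-off in this sketch is that the scheduler works from a snapshot: a job deleted mid-pass is still seen with its last known status, and a job deleted beforehand shows up as `None` rather than raising.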

@softwarefactory-project-zuul
Contributor

Build failed.

@softwarefactory-project-zuul
Contributor

Build succeeded.

@softwarefactory-project-zuul
Contributor

Build failed.

@AlanCoding
Member Author

This is getting a little harder to work out than I have time for at the moment.

@AlanCoding AlanCoding closed this Mar 18, 2020
AlanCoding pushed a commit to AlanCoding/awx that referenced this pull request Jan 4, 2023
We noticed here that openldap was getting downgraded and caused our test suite to blow up https://github.com/ansible/awx/runs/8118323342?check_suite_focus=true

Co-authored-by: Shane McDonald <me@shanemcd.com>