Consume job_explanation we sometimes set in ansible-runner #12089

shanemcd · 2022-04-22T11:45:55Z

Replaces / updates #11376

AlanCoding · 2022-04-22T13:19:51Z

Regarding the part of this that I wrote, what data would this get us?

https://github.com/ansible/ansible-runner/blob/d01304455c3c25669fba70d9983487ffdb85f1c5/ansible_runner/streaming.py#L101

Failed to JSON parse a line from transmit stream

Failed to extract private data directory on worker

These are pretty vague messages. If I got them by themselves, I wouldn't know what to do. I'm not against surfacing them to the user; indeed, I think that's a very good idea. However, this save will overwrite any job_explanation that was set in another part of the code or by another process. This case, in particular, does not rule out getting additional error details from elsewhere. The most valuable error details tend to come from AWXReceptorJob, from reading details from the work unit, or from the work unit output.

Because of that, I have cold feet about doing this without stacking it on top of #11832, which fixed tests that were actively failing (we just forget because they get set to skip). If we can do this in a way that stacks on top of any other error details, that would be ideal.
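
To make the "stack instead of overwrite" idea concrete, here is a minimal sketch. The helper name `stack_job_explanation` is hypothetical and is not part of AWX or ansible-runner; it only illustrates appending a new message to whatever explanation is already stored, using the messages quoted in this thread.

```python
def stack_job_explanation(existing, new, sep="\n"):
    """Append `new` to `existing` instead of overwriting it.

    Hypothetical helper for illustration only; not AWX code.
    """
    existing = (existing or "").strip()
    new = (new or "").strip()
    # Nothing to add, or the message is already recorded: keep what we have.
    if not new or new in existing:
        return existing
    # No earlier explanation: the new message stands alone.
    if not existing:
        return new
    return f"{existing}{sep}{new}"


# An explanation set elsewhere is kept, and the runner message is appended.
combined = stack_job_explanation(
    "Job terminated due to error",
    "Failed to JSON parse a line from transmit stream",
)
print(combined)
```

With this approach a second save of the same runner message leaves the field unchanged, so repeated calls from different code paths would not duplicate text.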

shanemcd · 2022-04-22T13:32:29Z

Right now jobs fail with an even more vague explanation: Job terminated due to error.

This at least tells us the last thing that went wrong.

AlanCoding · 2022-05-03T15:33:19Z

awx/main/tasks/jobs.py

+            # We call this to get the current values from the database, in case update_model was called
+            # within the threadpools inside of AWXReceptorJob. We use update_model instead of
+            # refresh_from_db here because it contains retry logic that is resilient to database failures.
+            self.instance = self.update_model(self.instance.pk)


Ping @sarabrajsingh, as he and I discussed error logs that came from post_run_hook due to not doing this.

Summarizing here...

Each thread has its own connection object. When connections are dropped, that object goes stale. Unless it hits one particular corner case, it will correct itself after one failed query. So post_run_hook errors, but the code after it starts working again, which makes the problem hard to observe unless you check the logs.
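
The behavior described above can be simulated without a database. This is a rough sketch under stated assumptions: `StaleConnectionError`, `FlakyStore`, and `fetch_with_retry` are hypothetical stand-ins, and this is not how `update_model` is implemented in AWX. It only shows why a fetch with retries survives the one failed query that a single plain read would hit.

```python
import time


class StaleConnectionError(Exception):
    """Hypothetical stand-in for the error raised on a dropped connection."""


class FlakyStore:
    """Simulates a per-thread connection that fails once, then recovers."""

    def __init__(self, rows):
        self.rows = rows
        self.stale = True  # the connection was dropped earlier
        self.calls = 0

    def get(self, pk):
        self.calls += 1
        if self.stale:
            # The first query after the drop fails and resets the connection.
            self.stale = False
            raise StaleConnectionError("connection dropped")
        return self.rows[pk]


def fetch_with_retry(fetch, pk, retries=3, delay=0.0):
    """Call fetch(pk), retrying when the connection turns out to be stale."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):
        try:
            return fetch(pk)
        except StaleConnectionError as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error


store = FlakyStore({1: {"status": "failed"}})
instance = fetch_with_retry(store.get, 1)
print(instance, store.calls)
```

A single unguarded read at the same point would raise on the first call, which matches the post_run_hook errors described above.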

AlanCoding · 2022-05-03T19:06:23Z

I've modified adjacent code, so here's what I think a rebase should look like:

https://github.com/ansible/awx/compare/devel...AlanCoding:capture-runner-error?expand=1

EDIT: to resolve conflicts.

AlanCoding · 2023-04-13T17:15:15Z

A replacement PR exists

github-actions bot added the component:api label Apr 22, 2022

shanemcd mentioned this pull request Apr 22, 2022: Consume job_explanation we sometimes set in ansible-runner #11376 (Closed)

AlanCoding reviewed Apr 22, 2022

shanemcd force-pushed the capture-runner-error branch from 17f0627 to aad5fb1 Compare April 22, 2022 14:28

AlanCoding approved these changes Apr 22, 2022

shanemcd and others added 2 commits April 22, 2022 10:51

567ca0e Fetch job status from database to respect updates made in threads

f4e8beb Consume job_explanation we sometimes set in ansible-runner

shanemcd force-pushed the capture-runner-error branch from aad5fb1 to f4e8beb Compare April 22, 2022 14:51

jbradberry approved these changes Apr 29, 2022

AlanCoding reviewed May 3, 2022

AlanCoding mentioned this pull request Sep 2, 2022: Job error without log #12297 (Open)

AlanCoding mentioned this pull request Jan 27, 2023: Consume job_explanation from runner, fix error reporting error #13482 (Merged)

AlanCoding closed this Apr 13, 2023
