Job stuck in AWAITING_COMPLETION #4177

Closed
pintify opened this issue Jan 22, 2025 · 1 comment

pintify commented Jan 22, 2025

Describe the bug
I was having issues with some devices that took too long to process a batch job request: they answered around 40 seconds after receiving it. To work around this, I increased the step timeout to 60 seconds. Now the response reaches Kapua within the timeout, so the step avoids the PROCESS_FAILURE status, but it never reaches PROCESS_OK. The step gets stuck in AWAITING_COMPLETION.

To Reproduce
Steps to reproduce the behavior:

  1. Create a batch job and step with a large timeout (40 or 60 seconds)
  2. Execute the job
  3. The step enters PROCESS_AWAITING status
  4. Delay the device's response past the 30-second mark
  5. The batch job goes to AWAITING_COMPLETION but never reaches PROCESS_OK

Expected behavior
The job and its step should progress to PROCESS_OK.

Version of Kapua
1.6.12

Type of deployment
[x] Docker

Additional context
According to the documentation in the source code, the statuses seem to have been working incorrectly all along:

Image

If I understand correctly, the status sequence should be:

  1. Start the job
  2. PROCESS_AWAITING
  3. Send request to device
  4. AWAITING_COMPLETION
  5. Receive response from device
  6. NOTIFIED_COMPLETION
  7. Process response and select the next status
  8. PROCESS_OK or PROCESS_FAILURE

Alternatively, if the device never answers, the status changes from AWAITING_COMPLETION to PROCESS_FAILURE.
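
As a rough sketch of the expected sequence above, the transitions can be written as a small table and checked against a status path. This is only my reading of the sequence, not Kapua code: the `Status` enum, `EXPECTED_TRANSITIONS`, and `is_expected_path` are hypothetical names, and the status names are copied from this issue rather than from the Kapua source.

```python
from enum import Enum


class Status(Enum):
    # Status names as written in this issue (not verified against Kapua).
    PROCESS_AWAITING = "PROCESS_AWAITING"
    AWAITING_COMPLETION = "AWAITING_COMPLETION"
    NOTIFIED_COMPLETION = "NOTIFIED_COMPLETION"
    PROCESS_OK = "PROCESS_OK"
    PROCESS_FAILURE = "PROCESS_FAILURE"


# Assumed transitions, taken from the expected sequence described above.
EXPECTED_TRANSITIONS = {
    # Request sent to the device.
    Status.PROCESS_AWAITING: {Status.AWAITING_COMPLETION},
    # Response received, or the device never answers (timeout).
    Status.AWAITING_COMPLETION: {Status.NOTIFIED_COMPLETION, Status.PROCESS_FAILURE},
    # Response processed and the final status selected.
    Status.NOTIFIED_COMPLETION: {Status.PROCESS_OK, Status.PROCESS_FAILURE},
    # Terminal statuses have no outgoing transitions.
    Status.PROCESS_OK: set(),
    Status.PROCESS_FAILURE: set(),
}


def is_expected_path(path):
    """Return True if every consecutive pair in path is an allowed transition."""
    return all(nxt in EXPECTED_TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))
```

Under these assumptions, a path that jumps from AWAITING_COMPLETION straight to PROCESS_OK would be rejected, because it skips NOTIFIED_COMPLETION.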

However, the sequence actually seems to be:

  1. Start the job
  2. PROCESS_AWAITING
  3. Send request to device
  4. Still PROCESS_AWAITING
  5. Receive response from device
  6. AWAITING_COMPLETION
  7. Process response and select the next status
  8. PROCESS_OK or PROCESS_FAILURE

Or something similar. Confusingly, the behavior also seems to be affected by changing the timeout.
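
To make the difference between the two sequences concrete, here is a small sketch that pairs each event with the status reported after it and finds the first point where the traces disagree. The traces are transcribed from the two lists in this issue; the event labels and the `first_divergence` helper are hypothetical and only illustrate the comparison.

```python
# (event, status after the event) pairs, transcribed from the lists above.
EXPECTED = [
    ("job started", "PROCESS_AWAITING"),
    ("request sent", "AWAITING_COMPLETION"),
    ("response received", "NOTIFIED_COMPLETION"),
    ("response processed", "PROCESS_OK"),
]

OBSERVED = [
    ("job started", "PROCESS_AWAITING"),
    ("request sent", "PROCESS_AWAITING"),
    ("response received", "AWAITING_COMPLETION"),
    ("response processed", "PROCESS_OK"),
]


def first_divergence(expected, observed):
    """Return (event, expected_status, observed_status) for the first mismatch.

    Only the common prefix of the two traces is compared; returns None if
    no mismatch is found there.
    """
    for (event, exp_status), (_, obs_status) in zip(expected, observed):
        if exp_status != obs_status:
            return event, exp_status, obs_status
    return None


print(first_divergence(EXPECTED, OBSERVED))
```

With these traces, the first mismatch is right after the request is sent: the expected status is AWAITING_COMPLETION, while the observed one is still PROCESS_AWAITING.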

Is there any place where the workflow is clearly documented and defined?


pintify commented Jan 22, 2025

Never mind the initial error. It seems it happened because the request was a deploy/download request, which waits for the device to execute that additional task. Some documentation about this would be useful, though.

pintify closed this as completed Jan 22, 2025