Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ict 4483 - add job attach to wait for job execution #123

Merged
merged 11 commits into from
Jun 14, 2024
Merged

ict 4483 - add job attach to wait for job execution #123

merged 11 commits into from
Jun 14, 2024

Conversation

davixcky
Copy link
Contributor

@davixcky davixcky commented Jun 5, 2024

Copy link
Contributor

@jmsktm jmsktm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me. But the it seemed connected / attached for a while after the job was completed before disconnecting with the below message:

./signadot --config ~/.signadot/staging.yaml job submit -f control-backend-e2e.yaml --attach
Job control-backend-e2e-ptc0dxq queued on Job Runner Group: control-e2e

Dashboard page: https://app.signadot.com/testing/jobs/control-backend-e2e-ptc0dxq

Waiting for job execution
Cloning signadot repo
Running backend e2e tests
newman

Latest-Operator-Version

→ latest-operator-version (member user)
  GET https://api.staging.signadot.com/api/v1/vars/latest-operator-version
  200 OK ★ 521ms time ★ 1.13kB↑ 306B↓ size ★ 9↑ 7↓ headers ★ 0 cookies
  ┌ ↓ application/json ★ text ★ json ★ utf8 ★ 94B
...
logs
...

&{%!s(*models.JobsCanceledState=<nil>) %!s(*models.JobsCompletedState=&{0 }) %!s(*models.JobsQueuedState=<nil>) %!s(*models.JobsRunningState=<nil>)}

jmsktm
jmsktm previously approved these changes Jun 5, 2024
Copy link
Contributor

@jmsktm jmsktm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the connection persist beyond job completion with the logs command too.

./signadot --config ~/.signadot/staging.yaml logs --job control-backend-e2e-wmmx22i

Even in this case, the connection stayed intact for a while (maybe 20/30 seconds) after the job completed.

Since this seems to be external to the scope of this PR, I will go ahead and approve it.
This could be directly or indirectly related to https://github.com/signadot/signadot/issues/4484 as well. But let's create a separate ticket for the cli so that it doesn't get forgotten.

@jmsktm jmsktm dismissed their stale review June 5, 2024 20:52

Dismissing review based on some comments by Daniel.

@daniel-de-vera
Copy link
Contributor

daniel-de-vera commented Jun 5, 2024

I think the --attach feature should be more robust (at least if we want to use it in the context of CI). Something that in case of errors it retries based on exp backoff (or luby), and only after some time it fails. Otherwise we may end up generating false negatives (claiming that a job failed when it didn't).

For example, imagine the case when we restart the apiserver (e.g. we perform a new control release), all connected clients will error out, but that doesn't mean the Job failed.

@davixcky davixcky changed the title ict 4483 ict 4483 - add job attach to wait for job execution Jun 11, 2024
@davixcky davixcky requested a review from daniel-de-vera June 14, 2024 15:59
Copy link
Contributor

@daniel-de-vera daniel-de-vera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@davixcky davixcky merged commit 715bfcf into main Jun 14, 2024
@davixcky davixcky deleted the ict-4483 branch June 14, 2024 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants