Handle Jobs with ttl_seconds_after_finished = 0 correctly #2596

JaylonmcShan03 · 2024-10-01T15:08:21Z

Description

Fixes #2531
When a Kubernetes Job has ttl_seconds_after_finished set to 0, Kubernetes deletes the Job immediately after it completes. Terraform then plans to recreate the Job on subsequent runs, which is undesirable. The expected behavior is for Terraform to recognize that the Job was deleted intentionally and neither recreate it nor show any diff.

Changes made
- Read function:
  - When the Read CRUD function encounters a NotFound error for a Job, it now checks whether ttl_seconds_after_finished is set to 0. If so, it keeps the resource in state without marking it as destroyed, which prevents the Job from being recreated. If it is not 0, it behaves as before and removes the resource from state.
- CustomizeDiff function introduced:
  - Suppresses diffs when the Job has been deleted because ttl_seconds_after_finished = 0.
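
The two rules above can be sketched as plain decision functions. This is a minimal standalone model, not the provider's code: the names onJobNotFound and suppressDiff are hypothetical, and the TTL is assumed to be held as a string, as it appears in state.

```go
package main

import "fmt"

// readOutcome describes what a Read does with the resource held in state.
type readOutcome string

const (
	keepInState     readOutcome = "keep in state"
	removeFromState readOutcome = "remove from state"
)

// onJobNotFound models the Read rule described in this PR (hypothetical
// helper): a Job that no longer exists in the cluster stays in state only
// when its ttl_seconds_after_finished is "0".
func onJobNotFound(ttl string) readOutcome {
	if ttl == "0" {
		return keepInState
	}
	return removeFromState
}

// suppressDiff models the CustomizeDiff rule (hypothetical helper): the diff
// is suppressed only when the Job is gone and the TTL is "0".
func suppressDiff(jobExists bool, ttl string) bool {
	return !jobExists && ttl == "0"
}

func main() {
	for _, ttl := range []string{"0", "5", ""} {
		fmt.Printf("ttl=%q notFound -> %s, suppress diff: %v\n",
			ttl, onJobNotFound(ttl), suppressDiff(false, ttl))
	}
}
```

In the real provider these checks would live inside the resource's Read and CustomizeDiff callbacks; the model only shows the branching.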

Acceptance tests

Have you added an acceptance test for the functionality being added?
Have you run the acceptance tests on this branch?

Output from acceptance testing:

└─(10:13:50 on fix-job-ttl-zero-handling)──> make test TESTARGS="-run TestAccKubernetesJobV1_customizeDiff_ttlZero"
==> Checking that code complies with gofmt requirements...
go vet ./...
go test "/Users/mau/Dev/terraform-provider-kubernetes/kubernetes" -vet=off -run TestAccKubernetesJobV1_customizeDiff_ttlZero -parallel 8 -timeout=30s
ok      github.com/hashicorp/terraform-provider-kubernetes/kubernetes   0.924s
...

Release Note

Release note for CHANGELOG:

Properly handle Kubernetes Jobs with ttl_seconds_after_finished = 0 to prevent unnecessary recreation.

References

Fixes #2531

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

BBBmau · 2024-10-10T17:31:57Z

kubernetes/resource_kubernetes_job_v1_test.go

@@ -516,3 +550,28 @@ func testAccKubernetesJobV1Config_modified(name, imageName string) string {
  wait_for_completion = false
 }`, name, imageName)
 }
+
+func testAccKubernetesJobV1Config_customizeDiff_ttlZero(name, imageName string) string {


When applying the tfconfig manually it works as expected. The apply goes through with no diff occurring.

I did however attempt to update ttl_seconds_after_finished from 0 to 5 and got the following error:

(base) ┌─(~/Dev/Scratch/ttl_test)────────────────────────────────────(mau@mau-JKDT676NCP:s017)─┐
└─(10:16:32)──> tfa ──(Thu,Oct10)─┘
kubernetes_job_v1.test: Refreshing state... [id=default/ttl-test]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # kubernetes_job_v1.test will be updated in-place
  ~ resource "kubernetes_job_v1" "test" {
        id = "default/ttl-test"
        # (1 unchanged attribute hidden)

      ~ spec {
          ~ ttl_seconds_after_finished = "0" -> "5"
            # (8 unchanged attributes hidden)

            # (2 unchanged blocks hidden)
        }

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
kubernetes_job_v1.test: Modifying... [id=default/ttl-test]
╷
│ Error: Failed to update Job! API error: jobs.batch "ttl-test" not found
│
│   with kubernetes_job_v1.test,
│   on main.tf line 1, in resource "kubernetes_job_v1" "test":
│    1: resource "kubernetes_job_v1" "test" {

From my understanding, we want to prevent this error from happening when ttl_seconds_after_finished is set to 0, which has been achieved. But in doing so, we can no longer update a Job that is still in tfstate once Kubernetes has deleted it because of its TTL of 0.

We'll want to consider how to solve this. If it is left as is, users will need to destroy every Job that has ttl_seconds_after_finished = 0 before they can apply an update to a Job that already exists in state.

JaylonmcShan03 · 2024-10-11

Correct. Initially the issue was that when ttl_seconds_after_finished = 0, the Job is deleted, and we don't want Terraform to recreate it on the next apply.

I modified the update function. It first attempts to get the Job. If a NotFound error occurs, it checks whether the previous ttl_seconds_after_finished was 0. If so, it sets the resource ID to "" to remove the resource from state; my understanding was that Terraform would then recreate it.

But when attempting that solution, I get this error: produced an unexpected new value: Root object was present, but now absent.
I believe that is because, during an update, Terraform expects the resource to remain in state unless it is explicitly destroyed by the config.

I will give it some more thought today. That said, do you have another idea in mind? Is this TTL solution viable, given that solving one issue gives rise to another edge case?
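
One way to reason about the edge case discussed above is to decide the action at plan time instead of clearing the ID during update. The following is a minimal standalone sketch under that assumption, not the provider's code and not necessarily what this PR ended up doing: planJobChange and its action values are hypothetical, and the TTL is assumed to be a string as in the plan output.

```go
package main

import "fmt"

// action is the plan-time decision for the Job resource.
type action string

const (
	noChange      action = "no-op"
	updateInPlace action = "update in-place"
	replace       action = "replace"
)

// planJobChange models a possible plan-time rule (hypothetical helper).
// jobExists reports whether the Job is still present in the cluster;
// oldTTL and newTTL are ttl_seconds_after_finished in state and in config.
func planJobChange(jobExists bool, oldTTL, newTTL string) action {
	switch {
	case !jobExists && oldTTL == "0" && newTTL == "0":
		// Deleted by Kubernetes as intended: nothing to do.
		return noChange
	case !jobExists:
		// Nothing left to update in place, so plan a replacement
		// rather than an update that fails with "not found".
		return replace
	case oldTTL != newTTL:
		return updateInPlace
	default:
		return noChange
	}
}

func main() {
	fmt.Println(planJobChange(false, "0", "0"))
	fmt.Println(planJobChange(false, "0", "5"))
	fmt.Println(planJobChange(true, "0", "5"))
}
```

In the Terraform plugin SDK, a replacement of this kind is normally requested from CustomizeDiff rather than from the update function, which avoids the "Root object was present, but now absent" error quoted above.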

Handle Jobs with ttl_seconds_after_finished = 0 correctly

1fef571

JaylonmcShan03 added enhancement size/S labels Oct 1, 2024

JaylonmcShan03 requested a review from a team as a code owner October 1, 2024 15:08

github-actions bot added size/M and removed size/S labels Oct 1, 2024

Adding changelog

49c28bd

BBBmau reviewed Oct 10, 2024

Fixed edge case to where we update ttl from 0 to another value

ff6c89b

github-actions bot added size/L and removed size/M labels Oct 11, 2024

Handle TTL value changes for Job recreation in Kubernetes

bc239a2
