control-service: data job synchronizer error handling #2742
Conversation
8964d74 to 4f9ef0e
Why
As part of VEP-2272, we need to introduce a process for synchronizing data jobs from the database to Kubernetes. The initial implementation depends on the data_job_deployment table for both read and write operations. Because the deployment process runs asynchronously with unpredictable duration, users may believe that their deployment has completed when it has not. To address this, we are introducing a new table called "desired_data_job_deployment" and renaming the current one to "actual_data_job_deployment". Read operations will read from "actual_data_job_deployment", while write operations will write to "desired_data_job_deployment".
What
We've implemented the new deployment tables and modified the deployment logic to work with them. Note that this is only the second phase of the implementation. The following enhancements are planned for future PRs:
- Annotate the method DataJobsSynchronizer.synchronizeDataJobs() with @Scheduled.
- Integrate improved exception handling.
- Add more tests in subsequent updates.
- Tune the ThreadPool configuration and expose it through application.properties.
Testing Done
Integration tests
Signed-off-by: Miroslav Ivanov miroslavi@vmware.com
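To illustrate the read/write split described above, here is a minimal sketch assuming hypothetical Spring Data repositories and entities (DesiredDataJobDeployment / ActualDataJobDeployment and their repositories); the class names and method signatures are illustrative, not taken from the PR.

// Minimal sketch of the read/write split: writes record only the desired state,
// reads return what is actually deployed. Names are assumptions, not the PR's code.
import java.util.Optional;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;

interface DesiredDataJobDeploymentRepository
    extends JpaRepository<DesiredDataJobDeployment, String> {}

interface ActualDataJobDeploymentRepository
    extends JpaRepository<ActualDataJobDeployment, String> {}

@Service
public class DeploymentService {

  private final DesiredDataJobDeploymentRepository desiredRepo;
  private final ActualDataJobDeploymentRepository actualRepo;

  public DeploymentService(
      DesiredDataJobDeploymentRepository desiredRepo,
      ActualDataJobDeploymentRepository actualRepo) {
    this.desiredRepo = desiredRepo;
    this.actualRepo = actualRepo;
  }

  // Write path: user-initiated deployments persist only the desired deployment state.
  public void saveDesiredDeployment(DesiredDataJobDeployment deployment) {
    desiredRepo.save(deployment);
  }

  // Read path: API reads return the deployment state actually applied to Kubernetes.
  public Optional<ActualDataJobDeployment> readActualDeployment(String jobName) {
    return actualRepo.findById(jobName);
  }
}

The asynchronous synchronizer is then the only component that reconciles the two tables, which is what makes the eventual @Scheduled annotation on DataJobsSynchronizer.synchronizeDataJobs() a natural follow-up.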
3094e98 to 5f90aa1
Why
As part of VEP-2272, we need to introduce a process for synchronizing data jobs from the database to Kubernetes. In the event of a data job deployment failure (due to user or platform errors), the current implementation attempts to deploy the data job during every synchronization cycle.
What
We have added a data job deployment status field to the desired_data_job_deployment table. This status is used to determine whether the deployment failed in the previous cycle and should be skipped in the current one.
Testing done
Unit tests
Signed-off-by: Miroslav Ivanov miroslavi@vmware.com
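As a rough illustration of the new status field, here is a hedged JPA sketch; the actual column name, enum values, and entity mapping in the PR may differ.

// Sketch of the status column on the desired deployment entity.
// Assumes an existing DeploymentStatus enum with USER_ERROR and PLATFORM_ERROR
// values; uses javax.persistence (jakarta.persistence on newer Spring Boot).
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "desired_data_job_deployment")
public class DesiredDataJobDeployment {

  @Id
  private String dataJobName;

  // Result of the last deployment attempt; the synchronizer skips jobs whose
  // previous deployment failed with a user or platform error.
  @Enumerated(EnumType.STRING)
  @Column(name = "deployment_status") // assumed column name
  private DeploymentStatus status;

  public DeploymentStatus getStatus() {
    return status;
  }

  public void setStatus(DeploymentStatus status) {
    this.status = status;
  }
}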
1c40bb8 to ab3c4b2
LGTM
if (DeploymentStatus.USER_ERROR.equals(desiredJobDeployment.getStatus())
    || DeploymentStatus.PLATFORM_ERROR.equals(desiredJobDeployment.getStatus())) {
  log.debug(
      "Skipping the data job [job_name={}] deployment due to the previously failed deployment"
Is this to prevent re-tries?
Yes
Why
As part of VEP-2272, we need to introduce a process for synchronizing data jobs from the database to Kubernetes. In the event of a data job deployment failure (due to user or platform errors), the current implementation attempts to deploy the data job during every synchronization cycle.
What
We have added a data job deployment status field to the desired_data_job_deployment table. This status is used to determine whether the deployment failed in the previous cycle and should be skipped in the current one. The failed deployment status can be reset by invoking the updateDeployment API, even without a job code change (this will be implemented as part of https://github.com/vmware/versatile-data-kit/pull/2731/files).
This is just the third phase of the implementation; further enhancements are planned for future PRs.
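A hypothetical sketch of the reset described above: when the update-deployment API is invoked, a previously failed status is cleared so that the next synchronization cycle retries the deployment. The method name, the NONE enum value, and desiredRepo (the DesiredDataJobDeploymentRepository from the earlier sketch) are illustrative assumptions, not taken from PR #2731.

// Hypothetical reset on update: clear a failed status so the synchronizer
// retries the deployment in the next cycle.
public void updateDesiredDeployment(DesiredDataJobDeployment desired) {
  if (DeploymentStatus.USER_ERROR.equals(desired.getStatus())
      || DeploymentStatus.PLATFORM_ERROR.equals(desired.getStatus())) {
    desired.setStatus(DeploymentStatus.NONE); // assumed "no result yet" value
  }
  desiredRepo.save(desired);
}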
Testing done
Unit tests
Signed-off-by: Miroslav Ivanov miroslavi@vmware.com