Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve-#13045] after a submit failure, stop the processInstance to avoid an endless loop #13051

Merged
merged 10 commits into from
Feb 20, 2023

Conversation

fuchanghai
Copy link
Member

Purpose of the pull request

Brief change log

Verify this pull request

This pull request is code cleanup without any test coverage.

this close: #13045

@SbloodyS SbloodyS added improvement make more easy to user or prompt friendly 3.2.0 for 3.2.0 version labels Nov 30, 2022
@SbloodyS SbloodyS added this to the 3.2.0 milestone Nov 30, 2022
@codecov-commenter
Copy link

codecov-commenter commented Dec 15, 2022

Codecov Report

Merging #13051 (b6aaf01) into dev (133d173) will decrease coverage by 0.09%.
The diff coverage is 10.28%.

❗ Current head b6aaf01 differs from pull request most recent head 0ce6b70. Consider uploading reports for the commit 0ce6b70 to get more accurate results

@@             Coverage Diff              @@
##                dev   #13051      +/-   ##
============================================
- Coverage     39.57%   39.48%   -0.09%     
- Complexity     4375     4381       +6     
============================================
  Files          1097     1102       +5     
  Lines         41288    41453     +165     
  Branches       4723     4740      +17     
============================================
+ Hits          16340    16368      +28     
- Misses        23133    23265     +132     
- Partials       1815     1820       +5     
Impacted Files Coverage Δ
...e/dolphinscheduler/common/constants/Constants.java 75.00% <ø> (ø)
.../dolphinscheduler/common/enums/StateEventType.java 0.00% <0.00%> (ø)
...r/common/log/remote/RemoteLogHandleThreadPool.java 0.00% <0.00%> (ø)
...ler/common/log/remote/RemoteLogHandlerFactory.java 0.00% <0.00%> (ø)
...nscheduler/common/log/remote/RemoteLogService.java 0.00% <0.00%> (ø)
...hinscheduler/common/log/remote/RemoteLogUtils.java 0.00% <0.00%> (ø)
...ler/server/master/event/TaskStateEventHandler.java 0.00% <0.00%> (ø)
...server/master/event/WorkflowStartEventHandler.java 0.00% <0.00%> (ø)
...server/master/event/WorkflowStateEventHandler.java 0.00% <0.00%> (ø)
.../server/master/runner/WorkflowExecuteRunnable.java 10.09% <0.00%> (-0.06%) ⬇️
... and 12 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@fuchanghai fuchanghai closed this Dec 15, 2022
@fuchanghai fuchanghai reopened this Dec 15, 2022
@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@fuchanghai
Copy link
Member Author

cc @ruanwenjun

@davidzollo
Copy link
Contributor

@caishunfeng please take a look , thx

Comment on lines 745 to 749
if (e instanceof SQLException) {
return WorkflowSubmitStatue.FAILED;
} else {
return WorkflowSubmitStatue.ERROR;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest not to distinguish these two types of error, rather, DB failure will keep writing logs.

@fuchanghai
Copy link
Member Author

fuchanghai commented Feb 9, 2023 via email

@Radeity
Copy link
Member

Radeity commented Feb 9, 2023

if exception is not type of sqlexception。master will print log until have not enough space

I mean you don't have to add Error state, whether there's a db error or other types of error, they can both update the count number.

@fuchanghai
Copy link
Member Author

fuchanghai commented Feb 9, 2023 via email

@Radeity
Copy link
Member

Radeity commented Feb 9, 2023

ruanwenjun recommend that if exception is not type of sqlexception, we should reduce the count, because we can not to deal with it.

I've noticed that, i think he just reminds that if you update db, it will throw exception, rather recommend you ignore sql exception.

@fuchanghai
Copy link
Member Author

I've noticed that, i think he just reminds that if you update db, it will throw exception, rather recommend you ignore sql exception.

sql exception maybe caused by db crash which is can be resumed ,so I have to keep trying again ,and don't need reduce count

@Radeity
Copy link
Member

Radeity commented Feb 10, 2023

sql exception maybe caused by db crash which is can be resumed ,so I have to keep trying again ,and don't need reduce count

I don't think so. Now that we allow task to retry, we just assume that they are possible to return to normal, db crash or network failure, etc. If the task can not recover, we can directly stop it and don't even have to keep this count. SO, we set this count number for two reasons:

  • If the task is unrecoverable, we stop it because more retries are meaningless.
  • If the task is recoverable, although it will recover one day with more retries, we can not wait for it because of huge amount of log output.

@fuchanghai fuchanghai closed this Feb 16, 2023
@fuchanghai fuchanghai reopened this Feb 16, 2023
@fuchanghai fuchanghai requested review from caishunfeng and removed request for SbloodyS and ruanwenjun February 20, 2023 01:38
@fuchanghai
Copy link
Member Author

@caishunfeng @ruanwenjun PTAL

@fuchanghai fuchanghai changed the title [improve-#13045] add max submit number of workflow [improve-#13045] after a submit failure, stop the processInstance to avoid an endless loop Feb 20, 2023
Co-authored-by: Wenjun Ruan <wenjun@apache.org>
Copy link
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

Copy link
Contributor

@caishunfeng caishunfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@caishunfeng caishunfeng merged commit 701d67c into apache:dev Feb 20, 2023
1 check passed
zhuangchong pushed a commit to zhuangchong/incubator-dolphinscheduler that referenced this pull request Sep 19, 2023
@zhuangchong zhuangchong modified the milestones: 3.2.0, 3.1.9 Sep 19, 2023
@zhuangchong zhuangchong added release cherry-pick 3.1.x 3.1.x for 3.1.x version and removed 3.2.0 for 3.2.0 version labels Sep 19, 2023
zhuangchong pushed a commit that referenced this pull request Dec 6, 2023
@zhuangchong zhuangchong added the release cherry-pick Mark this issue/PR had cherry-pick for release version label Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.1.x for 3.1.x version backend improvement make more easy to user or prompt friendly release cherry-pick Mark this issue/PR had cherry-pick for release version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement][Master] add max submit times of workflow
8 participants