
add restart policy & scheduler name for workflow pods #1109

Closed
wants to merge 5 commits into from

Conversation

@houz42 commented Dec 1, 2018

RestartPolicy and SchedulerName are useful for controlling how workflow pods run.
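
A minimal sketch of what the added fields might look like in pkg/apis/workflow/v1alpha1/types.go (field names and placement here are assumptions, not the exact diff):

```go
package v1alpha1

import (
	apiv1 "k8s.io/api/core/v1"
)

// WorkflowSpecSketch sketches the two proposed knobs; in the real change
// they would be fields on the existing WorkflowSpec struct.
type WorkflowSpecSketch struct {
	// SchedulerName is copied into each pod's spec.schedulerName,
	// so a custom Kubernetes scheduler can place workflow pods.
	SchedulerName string `json:"schedulerName,omitempty"`

	// RestartPolicy is copied into each pod's spec.restartPolicy.
	RestartPolicy apiv1.RestartPolicy `json:"restartPolicy,omitempty"`
}
```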

@alexmt (Contributor) left a comment

I agree we should provide the ability to specify the scheduler name and restart policy, but at the step level rather than for the whole workflow. Users would have to repeat the settings, but that problem should be solved as part of #799.

@jessesuen, I need your opinion about the restart policy. At first I thought we don't need it since RetryStrategy is available, but after some thought I've decided it is useful: a user might want to choose the pod restart policy to make sure the retry happens on the same node (kubelet restarts the failed container in place, whereas a RetryStrategy retry schedules a fresh pod that may land on a different node).

Two review comments on pkg/apis/workflow/v1alpha1/types.go (outdated, resolved)
@houz42 (Author) commented Jan 12, 2019

@alexmt

  1. The scheduler name and restart policy have been moved to the step level.
  2. The restart policy has been restricted to Never and OnFailure (sketched below).
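
A minimal sketch of how that restriction might be validated (the helper name is hypothetical, not the actual validation code):

```go
package validate

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
)

// validateRestartPolicy is a hypothetical helper: only Never and
// OnFailure (or unset) are accepted at the step level, since Always
// would keep restarting the main container even after success.
func validateRestartPolicy(policy apiv1.RestartPolicy) error {
	switch policy {
	case "", apiv1.RestartPolicyNever, apiv1.RestartPolicyOnFailure:
		return nil
	default:
		return fmt.Errorf("restartPolicy must be Never or OnFailure, got %q", policy)
	}
}
```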

@jessesuen (Member) commented Jan 16, 2019

@houz42 I think the scheduler is a fine addition.

However, a restartPolicy of OnFailure is problematic to set because restartPolicy is a pod-spec-level setting, so it applies to the wait sidecar as well. The current design relies on the wait sidecar exiting non-zero in many situations so that the controller can understand the status of the step. For example, the wait logic returns non-zero if any of the following goes wrong:

  • artifact loading
  • log retrieval
  • output parameter retrieval
  • artifact saving
  • output annotation
  • waiting on a k8s resource that reaches a failure condition

In order to support a restartPolicy of OnFailure, we would need to modify the executor to always exit zero and communicate a step failure back to the controller in some other way. It's unclear what this mechanism would be.
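
To make the coupling concrete, here is a minimal sketch (not Argo's actual pod-construction code) of why the setting leaks into the sidecar:

```go
package controller

import (
	apiv1 "k8s.io/api/core/v1"
)

// buildStepPod sketches why restartPolicy cannot target one container:
// it is a PodSpec-level field, so kubelet applies it to the main
// container AND the wait sidecar alike.
func buildStepPod(mainImage string) *apiv1.Pod {
	return &apiv1.Pod{
		Spec: apiv1.PodSpec{
			// Applies to every container below. With OnFailure, kubelet
			// would restart a wait sidecar that exited non-zero on
			// purpose to report a failed step.
			RestartPolicy: apiv1.RestartPolicyOnFailure,
			Containers: []apiv1.Container{
				{Name: "main", Image: mainImage},
				{Name: "wait", Image: "argoproj/argoexec:latest"},
			},
		},
	}
}
```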

One thought: we already use pod annotations to communicate error messages to the controller. The controller could be modified so that it always expects a pod annotation to be set (even on success). If the pod completed without setting the annotation, something went wrong and the controller could fail the step.
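
A rough sketch of that controller-side check (the annotation key and outcome values here are assumptions, not the real Argo annotation names):

```go
package controller

import (
	apiv1 "k8s.io/api/core/v1"
)

// Hypothetical annotation the executor would always set before exiting 0.
const outcomeAnnotation = "workflows.argoproj.io/outcome"

// assessStep sketches the proposed protocol: with argoexec always
// exiting 0, a completed pod with no outcome annotation means the
// executor died before it could report, so the step is failed.
func assessStep(pod *apiv1.Pod) string {
	if pod.Status.Phase != apiv1.PodSucceeded && pod.Status.Phase != apiv1.PodFailed {
		return "Running"
	}
	outcome, ok := pod.Annotations[outcomeAnnotation]
	if !ok {
		return "Failed"
	}
	return outcome // e.g. "Succeeded" or "Failed" as reported by argoexec
}
```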

So as it stands, supporting restartPolicy: OnFailure can't go in without:

  1. changing the behavior of argoexec to always exit 0, and
  2. replacing the current exit-1 error communication mechanism with something else.

@houz42 (Author) commented Jan 17, 2019

@jessesuen maybe I should submit the schedulerName changes first and consider restartPolicy later.

@jessesuen (Member) commented:

Yes, scheduler-only changes would be fine.

@houz42 mentioned this pull request Jan 20, 2019
@houz42 (Author) commented Jan 23, 2019

Added the scheduler name change only, in #1184.

@houz42 closed this Jan 23, 2019
icecoffee531 pushed a commit to icecoffee531/argo-workflows that referenced this pull request Jan 5, 2022
* chore: deprecate in v1.5 comments

Signed-off-by: Derek Wang <whynowy@gmail.com>