
Workflow is stuck in pending state if K8S API connection issue #3600

Closed
sarabala1979 opened this issue Jul 26, 2020 · 1 comment · Fixed by #4253

@sarabala1979 (Member)

Checklist:

  • I've included the version.
  • I've included reproduction steps.
  • I've included the workflow YAML.
  • I've included the logs.

What happened:
If the entry template hits a K8S API error (here, the pod-creation request fails with an unexpected EOF), the node is marked Error but the workflow stays in the Pending state.

canary-bnchp   Pending     3d    0s         0
time="2020-07-23T18:12:09Z" level=info msg="Failed to create pod canary-bnchp (canary-bnchp): Post https://100.64.0.1:443/api/v1/namespaces/argo/pods: unexpected EOF" namespace=argo workflow=canary-bnchp
time="2020-07-23T18:12:09Z" level=error msg="Mark error node" error="Post https://100.64.0.1:443/api/v1/namespaces/argo/pods: unexpected EOF" namespace=argo nodeName=canary-bnchp workflow=canary-bnchp
time="2020-07-23T18:12:09Z" level=info msg="node canary-bnchp phase Pending -> Error" namespace=argo workflow=canary-bnchp
time="2020-07-23T18:12:09Z" level=info msg="node canary-bnchp message: Post https://100.64.0.1:443/api/v1/namespaces/argo/pods: unexpected EOF" namespace=argo workflow=canary-bnchp
time="2020-07-23T18:12:09Z" level=info msg="node canary-bnchp finished: 2020-07-23 18:12:09.778181234 +0000 UTC" namespace=argo workflow=canary-bnchp
time="2020-07-23T18:12:09Z" level=error msg="error in entry template execution" error="Post https://100.64.0.1:443/api/v1/namespaces/argo/pods: unexpected EOF" namespace=argo workflow=canary-bnchp
E0723 18:12:09.778845       1 reflector.go:283] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to watch *unstructured.Unstructured: Get https://100.64.0.1:443/apis/argoproj.io/v1alpha1/workflows?labelSelector=%21workflows.argoproj.io%2Fcontroller-instanceid%2Cworkflows.argoproj.io%2Fcron-workflow&resourceVersion=4352375166&timeoutSeconds=456&watch=true: dial tcp 100.64.0.1:443: connect: connection refused
E0723 18:12:09.778856       1 reflector.go:283] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to watch *v1alpha1.ClusterWorkflowTemplate: Get https://100.64.0.1:443/apis/argoproj.io/v1alpha1/clusterworkflowtemplates?resourceVersion=4338389279&timeout=5m22s&timeoutSeconds=322&watch=true: dial tcp 100.64.0.1:443: connect: connection refused
E0723 18:12:09.778863       1 reflector.go:283] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to watch *v1.Pod: Get https://100.64.0.1:443/api/v1/pods?labelSelector=workflows.argoproj.io%2Fcompleted%3Dfalse%2C%21workflows.argoproj.io%2Fcontroller-instanceid&resourceVersion=4352369138&timeoutSeconds=337&watch=true: dial tcp 100.64.0.1:443: connect: connection refused
E0723 18:12:09.778866       1 reflector.go:283] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to watch *unstructured.Unstructured: Get https://100.64.0.1:443/apis/argoproj.io/v1alpha1/workflows?labelSelector=%21workflows.argoproj.io%2Fcontroller-instanceid&resourceVersion=4352375166&timeoutSeconds=541&watch=true: dial tcp 100.64.0.1:443: connect: connection refused
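The controller log shows the pod Create call failing with a transient API-server error (unexpected EOF, connection refused) and the node being marked Error right away. For illustration only, here is a minimal sketch of the behavior one would expect instead: retrying pod creation with backoff while the error still looks transient. This is not Argo's actual controller code, and the names isTransient and createPodWithRetry are hypothetical; it only uses client-go/apimachinery helpers.

package podretry

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierr "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	utilnet "k8s.io/apimachinery/pkg/util/net"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// isTransient reports whether the error looks like a temporary API-server
// problem (connection refused, unexpected EOF, throttling) rather than a
// permanent rejection of the pod spec.
func isTransient(err error) bool {
	return utilnet.IsConnectionRefused(err) ||
		utilnet.IsProbableEOF(err) ||
		apierr.IsServerTimeout(err) ||
		apierr.IsTooManyRequests(err)
}

// createPodWithRetry retries the create with exponential backoff while the
// error is transient; only a persistent error should surface as a node Error.
func createPodWithRetry(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod) (*corev1.Pod, error) {
	var created *corev1.Pod
	err := retry.OnError(retry.DefaultBackoff, isTransient, func() error {
		var cerr error
		created, cerr = client.CoreV1().Pods(pod.Namespace).Create(ctx, pod, metav1.CreateOptions{})
		return cerr
	})
	return created, err
}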

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Argo version:
$ argo version
  • Kubernetes version:
$ kubectl version -o yaml

Other debugging information (if applicable):

  • workflow result:
argo --loglevel DEBUG get <workflowname>
  • executor logs:
kubectl logs <failedpodname> -c init
kubectl logs <failedpodname> -c wait
  • workflow-controller logs:
kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)

Logs

argo get <workflowname>
kubectl logs <failedpodname> -c init
kubectl logs <failedpodname> -c wait
kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)

Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@sarabala1979 self-assigned this on Jul 26, 2020
@sarabala1979 changed the title from "Workflow is stuck in pending state if entry template is not able to create pod" to "Workflow is stuck in pending state if K8S API connection issue" on Jul 26, 2020
@sarabala1979 removed their assignment on Jul 27, 2020
@jessesuen (Member)

Possibly related to the resourceVersion equality check in the Update handler.
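For context, a hedged illustration of the pattern being referred to. This is not Argo's actual source; newWorkflowEventHandler and the workqueue wiring are hypothetical. It shows how an informer UpdateFunc that drops events whose resourceVersions are equal also drops periodic resyncs, so a workflow whose previous reconciliation died on an API error is never requeued and stays Pending.

package controller

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// newWorkflowEventHandler enqueues workflow keys for reconciliation.
func newWorkflowEventHandler(queue workqueue.RateLimitingInterface) cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			oldWf, ok1 := oldObj.(*unstructured.Unstructured)
			newWf, ok2 := newObj.(*unstructured.Unstructured)
			if !ok1 || !ok2 {
				return
			}
			// Periodic informer resyncs re-deliver the same object with an
			// unchanged resourceVersion. Returning early here means a workflow
			// whose last reconciliation failed part-way (e.g. on a transient
			// API error) is never requeued until something else mutates it.
			if oldWf.GetResourceVersion() == newWf.GetResourceVersion() {
				return
			}
			key, err := cache.MetaNamespaceKeyFunc(newObj)
			if err != nil {
				return
			}
			queue.Add(key)
		},
	}
}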
