Surface image pull errors in service status #4192

Closed
jonjohnsonjr opened this issue May 30, 2019 · 2 comments · Fixed by #4195
Labels
area/API API objects and controllers kind/bug Categorizes issue or PR as related to a bug.

Comments

@jonjohnsonjr
Contributor

jonjohnsonjr commented May 30, 2019

In what area(s)?

/area API

What version of Knative?

HEAD

Expected Behavior

We should provide some indication that ImagePullBackOff is happening when trying to deploy a service.

Actual Behavior

In the revision, everything is Unknown because we are deploying.

status:
  conditions:
  - lastTransitionTime: "2019-05-29T22:48:43Z"
    message: Requests to the target are being buffered as resources are provisioned.
    reason: Queued
    severity: Info
    status: Unknown
    type: Active
  - lastTransitionTime: "2019-05-29T22:48:43Z"
    status: "True"
    type: BuildSucceeded
  - lastTransitionTime: "2019-05-29T22:48:43Z"
    reason: Deploying
    status: Unknown
    type: ContainerHealthy
  - lastTransitionTime: "2019-05-29T22:48:43Z"
    reason: Deploying
    status: Unknown
    type: Ready
  - lastTransitionTime: "2019-05-29T22:48:43Z"
    reason: Deploying
    status: Unknown
    type: ResourcesAvailable

In the service, we are just waiting for the revision to become Ready.

Interestingly, we know that the deployment is not Progressing:

deployment:
  status:
    conditions:
    - lastTransitionTime: "2019-05-29T22:48:43Z"
      lastUpdateTime: "2019-05-29T22:48:43Z"
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2019-05-29T22:50:44Z"
      lastUpdateTime: "2019-05-29T22:50:44Z"
      message: ReplicaSet "autoscale-go-d8flr-deployment-7bc5c75ff4" has timed out progressing.
      reason: ProgressDeadlineExceeded
      status: "False"
      type: Progressing
    observedGeneration: 1
    replicas: 1
    unavailableReplicas: 1
    updatedReplicas: 1
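
For illustration, the stalled state shown above can be detected by scanning the deployment conditions for a `Progressing` condition that is `False` with reason `ProgressDeadlineExceeded`. The sketch below is a minimal, self-contained approximation: `DeploymentCondition` and `progressDeadlineExceeded` are hypothetical stand-ins, not the Kubernetes API types and not necessarily how Knative's `hasDeploymentTimedOut` is written.

```go
package main

// DeploymentCondition is a hypothetical, simplified stand-in for the
// condition entries in the deployment status above. It is not the real
// Kubernetes API type.
type DeploymentCondition struct {
	Type   string
	Status string
	Reason string
}

// progressDeadlineExceeded reports whether any condition says the
// deployment stopped progressing because its deadline was exceeded.
// This is an assumed check for illustration only.
func progressDeadlineExceeded(conds []DeploymentCondition) bool {
	for _, c := range conds {
		if c.Type == "Progressing" && c.Status == "False" && c.Reason == "ProgressDeadlineExceeded" {
			return true
		}
	}
	return false
}
```

Applied to the two conditions in the status above, this check would report a timeout, because the `Progressing` entry matches all three fields.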

But because the revision's Active condition is Unknown:

func (rs *RevisionStatus) IsActivationRequired() bool {
	// Any Active status other than True (including Unknown) means activation is required.
	if c := revCondSet.Manage(rs).GetCondition(RevisionConditionActive); c != nil {
		return c.Status != corev1.ConditionTrue
	}
	return false
}

We assume activation is required, so we don't propagate the deployment ProgressDeadlineExceeded condition:

if hasDeploymentTimedOut(deployment) && !rev.Status.IsActivationRequired() {

That seems like it might be a separate bug?
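
To make the interaction concrete, here is a minimal sketch of the two checks quoted above, reduced to plain values. `ConditionStatus`, `isActivationRequired`, and `shouldPropagateTimeout` are local stand-ins invented for this example; they mirror the quoted logic but are not the Knative types.

```go
package main

// ConditionStatus is a local stand-in for corev1.ConditionStatus so the
// sketch compiles without Kubernetes imports.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// isActivationRequired mirrors the quoted method: any Active status other
// than True, including Unknown, counts as "activation required".
// A nil pointer models a revision with no Active condition at all.
func isActivationRequired(active *ConditionStatus) bool {
	if active != nil {
		return *active != ConditionTrue
	}
	return false
}

// shouldPropagateTimeout mirrors the quoted guard: a deployment timeout is
// surfaced only when activation is not required.
func shouldPropagateTimeout(deploymentTimedOut bool, active *ConditionStatus) bool {
	return deploymentTimedOut && !isActivationRequired(active)
}
```

With Active set to Unknown, as in the revision status above, `shouldPropagateTimeout` returns false even though the deployment has timed out, which is why ProgressDeadlineExceeded never reaches the revision.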

Regardless, in this case, we can check to see if the pod's user-container is in state waiting and surface an error if the deployment has also timed out with ProgressDeadlineExceeded:

pod:
  status:
    containerStatuses:
    - image: gcr.io/jonjohnson-test/autoscale-go@sha256:e5e89c5fd57c717b49d41be89faebc526bdcda017e898ae86c2bf20f5cd339b5
      imageID: ""
      lastState: {}
      name: user-container
      ready: false
      restartCount: 0
      state:
        waiting:
          message: Back-off pulling image "gcr.io/jonjohnson-test/autoscale-go@sha256:e5e89c5fd57c717b49d41be89faebc526bdcda017e898ae86c2bf20f5cd339b5"
          reason: ImagePullBackOff
    hostIP: 10.128.0.50
    phase: Pending
    podIP: 10.60.0.9
    qosClass: Burstable
    startTime: "2019-05-29T22:48:43Z"
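
A rough sketch of the proposed check follows: if the deployment has timed out and the pod's user-container is in the waiting state, return the waiting reason and message so they can be surfaced on the revision. The types and the function `userContainerWaitingError` are hypothetical simplifications of the pod status fields shown above, not the Kubernetes API types and not the fix that was eventually merged.

```go
package main

// ContainerStateWaiting and ContainerStatus are hypothetical, simplified
// stand-ins for the pod status fields shown above.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

type ContainerStatus struct {
	Name    string
	Waiting *ContainerStateWaiting // nil when the container is not waiting
}

// userContainerName is the container name that appears in the pod status above.
const userContainerName = "user-container"

// userContainerWaitingError returns the waiting reason and message of the
// user container, but only when the deployment has also timed out.
// ok is false when there is nothing to surface.
func userContainerWaitingError(deploymentTimedOut bool, statuses []ContainerStatus) (reason, message string, ok bool) {
	if !deploymentTimedOut {
		return "", "", false
	}
	for _, s := range statuses {
		if s.Name == userContainerName && s.Waiting != nil {
			return s.Waiting.Reason, s.Waiting.Message, true
		}
	}
	return "", "", false
}
```

For the pod above, this would yield the reason ImagePullBackOff together with the back-off message, which could then be copied into a revision condition.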

Steps to Reproduce the Problem

Deploy a ksvc with a non-existent image referenced by a (valid) digest, so that tag -> digest resolution skips over it. The pod will never become ready because it can't pull the image.

@jonjohnsonjr jonjohnsonjr added the kind/bug Categorizes issue or PR as related to a bug. label May 30, 2019
@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label May 30, 2019
@jonjohnsonjr
Contributor Author

jonjohnsonjr commented May 30, 2019

cc @mattmoor seems like in general we're struggling to surface failed deployments due to the !IsActivationRequired check

@jonjohnsonjr
Contributor Author

/assign
