ImagePullBackOff: Failed to pull image "mcr.microsoft.com/k8s/aad-pod-identity/nmi:1.5.3" #1513
Comments
Hi @sanjeebsarangi, thanks for following up. As noted in email, this is the same issue as #1373: the ImagePullBackOff is triggered by the IOPS load of container image pulls on reboots and node failovers.
@jnoller is there a plan to address this in some way? It seems like a severe limitation if we cannot expect nodes to restart successfully.
Action required from @Azure/aks-pm
Any update on this? I have clusters provisioned using Standard_D4as_v4 nodes and 256 GB Premium SSDs, and I am seeing this issue when provisioning aad-pod-identity with Terraform. @jnoller What is the expected IOPS load of the container image pull?
Could this be related to using remote network disks for the nodes' OS disks? Would Microsoft recommend using VMs in the Kubernetes node pool that support local premium disks, rather than relying on remote storage?
@palma21 I think it's related, as @jnoller noted in #1373, where disk I/O saturation and throttling are described as causing cluster DNS to fail. And yes, I created a ticket for this issue: 120080724005711. I'm testing out Velero before rolling out this change to our additional Azure subscriptions/AKS clusters.
Your case might be related (I don't see your error), but the OP's case is not: the daemon is working fine and is not throttled, yet it cannot resolve the registry's IP. That lookup doesn't even use cluster DNS; 168.63.129.16 is Azure DNS.
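To make that distinction concrete, here is a minimal sketch that pulls the hostname and resolver address out of a kubelet event message like the ones in the issue body. The function and type names (`parse_dns_failure`, `DnsLookupFailure`) are hypothetical, and the regular expression assumes the exact `lookup <host> on <ip>:<port>: no such host` wording seen in this thread; other failure messages may be phrased differently.

```python
import re
from typing import NamedTuple, Optional


class DnsLookupFailure(NamedTuple):
    """Fields extracted from a 'no such host' lookup error (hypothetical type)."""
    host: str
    resolver: str
    port: int


# Assumes the message wording shown in this issue's events.
_LOOKUP_RE = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>[0-9.]+):(?P<port>\d+): no such host"
)


def parse_dns_failure(message: str) -> Optional[DnsLookupFailure]:
    """Return the failed lookup's details, or None if the message is not a DNS failure."""
    match = _LOOKUP_RE.search(message)
    if match is None:
        return None
    return DnsLookupFailure(match["host"], match["resolver"], int(match["port"]))
```

If the resolver reported is 168.63.129.16, the failed lookup went to Azure DNS rather than to the in-cluster DNS service, which is the point made above.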
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
This issue will now be closed because it hasn't had any activity for 15 days after being marked stale. sanjeebsarangi, feel free to comment again within the next 7 days to reopen, or open a new issue after that time if you still have a question, issue, or suggestion.
What happened:
ErrImagePull error for the NMI pod after a node reboot.
What you expected to happen:
All pods should come back to the Running state after rebooting one or more AKS nodes.
How to reproduce it (as minimally and precisely as possible):
Rebooting an AKS node.
Anything else we need to know?:
We have a 7-node cluster, and we rebooted a node from the Azure console. The NMI pod is not coming up, with the error below. This also happened with another pod whose image is in an ACR in our own subscription.
`Events:
Type Reason Age From Message
Warning FailedCreatePodSandBox 45m (x71 over 61m) kubelet, aks-fastcompute-25023122-vmss000002 Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "mcr.microsoft.com/k8s/core/pause:1.2.0": Error response from daemon: Get https://mcr.microsoft.com/v2/: dial tcp: lookup mcr.microsoft.com on 168.63.129.16:53: no such host
Normal SandboxChanged 39m kubelet, aks-fastcompute-25023122-vmss000002 Pod sandbox changed, it will be killed and re-created.
Warning Failed 38m (x3 over 39m) kubelet, aks-fastcompute-25023122-vmss000002 Failed to pull image "mcr.microsoft.com/k8s/aad-pod-identity/nmi:1.5.3": rpc error: code = Unknown desc = Error response from daemon: Get https://mcr.microsoft.com/v2/: dial tcp: lookup mcr.microsoft.com on 168.63.129.16:53: no such host
Warning Failed 38m (x3 over 39m) kubelet, aks-fastcompute-25023122-vmss000002 Error: ErrImagePull
Warning Failed 37m (x7 over 39m) kubelet, aks-fastcompute-25023122-vmss000002 Error: ImagePullBackOff
Normal Pulling 37m (x4 over 39m) kubelet, aks-fastcompute-25023122-vmss000002 Pulling image "mcr.microsoft.com/k8s/aad-pod-identity/nmi:1.5.3"
Normal BackOff 3m52s (x157 over 39m) kubelet, aks-fastcompute-25023122-vmss000002 Back-off pulling image "mcr.microsoft.com/k8s/aad-pod-identity/nmi:1.5.3"
`
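Since the events above all fail at the DNS lookup step, one way to narrow this down is to check whether the registry hostname resolves at all from the affected node. The following is a rough sketch, assuming it is run on that node with Python available; `can_resolve` and its parameters are hypothetical names, and it uses whatever resolver the operating system is configured with, which may or may not be 168.63.129.16.

```python
import socket
import time
from typing import Callable


def can_resolve(
    host: str,
    attempts: int = 3,
    delay_s: float = 1.0,
    resolve: Callable[[str], object] = socket.gethostbyname,
) -> bool:
    """Try to resolve `host` up to `attempts` times; True on the first success."""
    for attempt in range(attempts):
        try:
            resolve(host)
            return True
        except OSError:  # socket.gaierror is a subclass of OSError
            # Wait before retrying, but not after the final attempt.
            if attempt + 1 < attempts:
                time.sleep(delay_s)
    return False


if __name__ == "__main__":
    print(can_resolve("mcr.microsoft.com"))
```

The `resolve` parameter is injectable only so the retry logic can be exercised without network access; by default it calls `socket.gethostbyname`.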
Environment: