Zarf-Seed-Registry Installation Fails on Init with Deployment is not ready: zarf/zarf-docker-registry error #592
Comments
We haven't run into this issue on AKS before, but it looks like the node container runtime is trying to call localhost via https instead of http; plain http for localhost is the standard for containerd and crio. Is there any special config or other detail about this provisioning that might change the container runtime by chance? It would be helpful to run 'zarf destroy --confirm --remove-components' and then 'zarf init -l=trace'. Sorry if markdown is weird, using the GitHub app right now. |
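The clean-up and re-run suggested in the comment above could be scripted roughly as follows. This is only a sketch: the `DRY_RUN` switch and the function names are made up for illustration and are not part of zarf, and the two `kubectl` calls are the ones the issue report itself uses to capture cluster state.

```sh
#!/bin/sh
# Illustrative wrapper; DRY_RUN and both function names are assumptions.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"      # only print what would be executed
  else
    "$@"
  fi
}

collect_zarf_init_diagnostics() {
  # Remove the previous (failed) deployment first.
  run zarf destroy --confirm --remove-components || return 1
  # Re-run init with trace logging; keep going on failure so the
  # cluster state can still be captured below.
  run zarf init -l=trace || echo "zarf init failed; collecting cluster state" >&2
  # Capture the state of the zarf namespace.
  run kubectl -n zarf get events
  run kubectl -n zarf get all
}
```

Running `DRY_RUN=1 collect_zarf_init_diagnostics` only prints the commands, which is a cheap way to review them before touching a cluster.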
Thanks for the response @jeff-mccoy. The strange thing is that I provisioned a standard AKS cluster (v1.22.6) using the default settings from Azure, so nothing custom. Here's the output from
|
Thanks @erikschlegel, this definitely looks like a CRI change on AKS that we'll need to play with a bit. I'll spin up AKS again this weekend to see if we can reproduce it. Are you provisioning AKS with IaC or via the Azure web interface? |
I'm provisioning the cluster directly through the Azure Portal. I confirmed that I was able to successfully initialize zarf using Kubernetes version 1.21. I suspect this is a containerd issue, as containerd is configured slightly differently on AKS version 1.22+. This PR may be worth checking out: Azure/AgentBaker#1369 |
@jeff-mccoy Any update on this? |
Hi @jeff-mccoy - AKS Kubernetes version 1.21 is no longer available for deployment in the portal, and now none of the supported AKS versions appear to work with zarf. Do you happen to have an update? |
Hello, I am experiencing the same issue. Here is my output from kubectl describe pods:
version 21.2 in an Azure US Government AKS cluster |
I can confirm this is still an issue.
|
Yeah, they must be doing something special. Upstream containerd still serves localhost over http and even tests for it. Digging into this more this week: https://github.com/containerd/containerd/blob/main/pkg/cri/server/image_pull_test.go |
While we work on this, a note on a potential workaround: you can also use an external registry as described here: https://docs.zarf.dev/docs/user-guide/the-zarf-cli/cli-commands/zarf_init Under the (you can also specify a separate (also note you will need to be on v0.22.1 or higher) |
Put some notes in a new issue after digging around a bit. Will try to test an older version of AKS later on tonight. |
Also tested on AKS 1.22.11 and seeing the same results. |
Added some new notes at Azure/AKS#3303 (comment). Looks like a bug in containerd that was patched 2 weeks ago. In the interim, Acorn ran into this issue too and did what we've been trying to avoid (modifying the containerd config). The root cause is that the change @erikschlegel identified, which allows containerd registry config overrides in AKS, exposed the underlying containerd issue. |
@cheruvu1 if you have any other data you'd like to drop on this issue, please leave it here. Thanks! |
Hi folks, as a workaround (that includes the patch by iceberg), you can update containerd in your cluster:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: update-cluster
  labels:
    app: update-cluster
spec:
  selector:
    matchLabels:
      app: update-cluster
  template:
    metadata:
      labels:
        app: update-cluster
    spec:
      containers:
        - name: update-cluster
          image: alpine
          imagePullPolicy: IfNotPresent
          # Enter the host's namespaces (PID 1) so apt runs on the node itself.
          command:
            - nsenter
            - --target
            - "1"
            - --mount
            - --uts
            - --ipc
            - --net
            - --pid
            - --
            - sh
            - -c
            - |
              # apt update and upgrade
              # export on its own line; otherwise "apt update" is parsed as
              # arguments to export and never runs
              export DEBIAN_FRONTEND=noninteractive
              apt update && apt upgrade -y
              sleep infinity
          securityContext:
            privileged: true
      dnsPolicy: ClusterFirst
      hostPID: true
With this I was able to initialize zarf. I also deployed Big Bang into AKS, but had a "bump" |
Encountered this problem while attempting to initialize zarf on a Nutanix Kubernetes cluster running |
EKS v1.23 works without issue because it is still using docker instead of containerd |
Tracking EKS AMI containerd update: awslabs/amazon-eks-ami#1162 |
The upstream issue has been closed, and @brianrexrode has successfully tested zarf v0.25.2 with EKS v1.26. |
I've run into this with containerd > 1.6.25. I've been commenting out the following lines in the containerd config.toml to work around it:
|
@AbrohamLincoln thanks for the note! We're looking at exploring other options too, and for others this will affect newer versions of containerd 1.7 (>=1.7.7) (and 2.0 if anyone is on the betas) as well. |
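For anyone trying to tell whether their nodes are in the ranges mentioned so far, here is a rough sketch of a check. It encodes only the lower bounds reported in this thread (containerd > 1.6.25 and >= 1.7.7). The function name is hypothetical, and releases that carry the upstream fix are not excluded, so a match means "worth checking" rather than "definitely broken".

```sh
#!/bin/sh
# Hypothetical helper (not part of zarf or containerd).
# Input: a runtime string as shown in the CONTAINER-RUNTIME column of
# `kubectl get nodes -o wide`, e.g. "containerd://1.7.7", or a bare version.
# Exit status: 0 = in a range reported as affected in this thread,
#              1 = not in those ranges, 2 = could not parse.
containerd_reported_affected() {
  ver=${1#containerd://}
  ver=${ver#v}
  ver=${ver%%[-+~]*}              # drop suffixes such as "-beta.1"
  case "$ver" in
    [0-9]*.[0-9]*.[0-9]*) ;;
    *) return 2 ;;
  esac
  major=${ver%%.*}
  rest=${ver#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  patch=${patch%%.*}
  case "$major.$minor" in
    1.6) [ "$patch" -gt 25 ] ;;   # reported as "containerd > 1.6.25"
    1.7) [ "$patch" -ge 7 ] ;;    # reported as "1.7 (>=1.7.7)"
    *)   return 1 ;;              # 2.0 betas were also mentioned; not encoded here
  esac
}
```

Example: `containerd_reported_affected "containerd://1.6.26" && echo "check this node"`.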
Also having this issue with AKS k8s version 1.28.3. Any updates regarding this? |
Seeing similar behavior on EKS |
Hi guys. Any plans here? Still having a problem with k8s 1.27+ versions with containerd. Any recommendations from the community on how we can "tune" a containerd config to avoid this issue? |
Commenting out the containerd config lines mentioned in this post got things working again for me. |
Also ran into this issue on newer RKE2 versions. It seems linked back to this commit which introduced the Just for reference affected versions of k3s/rke2 appear to be 1.29.1+, 1.28.6+, and 1.27.10+. Definitely curious if there is anything to address this on the zarf side or if this should make its way into the docs as a recommended pre-req/setup for the cluster? |
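The k3s/RKE2 thresholds listed above (1.27.10+, 1.28.6+, 1.29.1+) can be turned into a quick check. This is a sketch only: the function name is hypothetical, it assumes version strings shaped like `v1.28.6+rke2r1`, and it knows nothing about minors outside the three listed or about releases that later picked up a fix.

```sh
#!/bin/sh
# Hypothetical helper: is a k3s/RKE2 version string at or above the first
# patch release reported as affected for its minor version?
# Exit status: 0 = yes, 1 = no or minor not listed, 2 = could not parse.
k3s_rke2_reported_affected() {
  ver=${1#v}
  ver=${ver%%[-+]*}            # drop "+k3s1", "+rke2r1", "-rc1", ...
  case "$ver" in
    1.[0-9]*.[0-9]*) ;;
    *) return 2 ;;
  esac
  rest=${ver#1.}
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$minor" in
    27) min_patch=10 ;;
    28) min_patch=6 ;;
    29) min_patch=1 ;;
    *)  return 1 ;;            # other minors are not mentioned in the thread
  esac
  [ "$patch" -ge "$min_patch" ]
}
```

Example: `k3s_rke2_reported_affected "v1.28.6+rke2r1" && echo "in the reported range"`.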
With all of this coming back around, we may need to look for a way to avoid the localhost/http behavior, since containerd has introduced bugs around it multiple times in the past year or so. containerd/containerd#9188 |
A new issue has been opened against containerd to address this: containerd/containerd#10014 |
Does anyone know if this config line fix is in the default zarf init version of k3s?
Error message observed when it fails
|
A fix has been merged in and backported to I have not tested the fix myself yet, but hoping it resolves this issue 🤞 |
Containerd fix has been released in |
Fixes #592. Signed-off-by: schristoff <28318173+schristoff@users.noreply.github.com>
Fixes #592. Signed-off-by: schristoff <28318173+schristoff@users.noreply.github.com> Signed-off-by: Austin Abro <AustinAbro321@gmail.com>
Environment
Device and OS: Azure AKS Linux Ubuntu 20.04
App version: 0.19.6
Kubernetes distro being used: AKS Kubernetes V 1.22.6
Other:
Steps to reproduce
zarf init --components git-server
Expected result
Command succeeds and Zarf is initialized in the cluster.
Actual Result
output of
kubectl -n zarf get events
output of
kubectl -n zarf get all
Severity/Priority
😕 Blocked on deploying zarf packages to Azure AKS