🐛 Extremely slow boot with "Build Kairos from scratch" on bare metal #1529
Update
I tried a different approach (from https://kairos.io/docs/installation/automated/#iso-remastering):
# Prepare KairOS ISO
$ cat <<'EOF' > cloud_init.yaml
#cloud-config
users:
  - name: "kairos"
    passwd: "kairos"
EOF
$ export IMAGE=quay.io/kairos/core-opensuse-tumbleweed:latest
$ docker pull $IMAGE
$ docker run -v $PWD/cloud_init.yaml:/cloud_init.yaml \
-v $PWD/build:/tmp/auroraboot \
-v /var/run/docker.sock:/var/run/docker.sock \
--rm -ti quay.io/kairos/auroraboot \
--set container_image=docker://$IMAGE \
--set "disable_http_server=true" \
--set "disable_netboot=true" \
--cloud-config /cloud_init.yaml \
--set "state_dir=/tmp/auroraboot"
# Flash to USB
$ fdisk /dev/sdc
g
w
$ sudo dd if=build/iso/kairos.iso of=/dev/sdc bs=4MB
It's just as slow, but it doesn't freeze (perhaps because it has a newer kernel?).
Update 2
If I instead use
After completing the installation, rebooting into the disk, and selecting KairOS from GRUB2, it just shows a black screen for 1m10s before proceeding with the boot process, which takes another minute on a 7 GB/s SN850 2TB Gen4 NVMe SSD 🙁 What is causing KairOS boot to be this slow even after the initial installation?
Hi @GrabbenD ! Are you able to get serial logs of the machine? |
I'd check if it's due to compression of the initramfs. Try disabling it by creating a file in
Just tested this on my bare metal machine and can reproduce it (with the image from the first comment).
You should edit the GRUB cmdline and remove the
Removing that part from the cmdline causes the initrd to load in a couple of seconds, and then it fails to load something NVIDIA-related :D
Testing with the Update 1 approach.
Thanks for the replies guys!
Yes, @Itxaka
Hope this helps! |
umm yeah, kind of slow userspace:
The initrd is expected to be that kind of "slow": immucore runs 2 stages (rootfs and initramfs) that must run serially before we switch root, so that everything is settled for the real root. They expect the environment to be static at that point in order to apply certain configurations, so the stages alone can easily eat between 10 and 20 seconds. And indeed that is the case here: 10 seconds are spent on immucore, which doesn't run anything else in parallel (sysroot.service is waiting for immucore):
There is also a big gap when switching root which I don't understand:
That's 15 seconds for doing the switch_root, which makes no sense 🤔 Everything should be stopped at that point and systemd should have started earlier. Maybe there is some dangling service not killed? Then we get to the cos-setup-boot service, which unfortunately is reloaded in the middle of its run, so that may also affect it:
And of course, that service prevents
We should first check why it is being restarted, and then whether we can run it earlier in the process or make it non-blocking.
In my case it looks like the metadata probing is the one failing, but in the cos-setup-network stage. So there is clearly some kind of mix-up between services, because the boot stage was frozen until the network stage finished. And the network stage was frozen due to the metadata acquisition, which would make sense, but it should not block the others. @GrabbenD can you post your
Thanks @mudler
Dockerfile spoiler
# https://github.com/kairos-io/kairos/blob/master/images/Dockerfile.fedora
ARG BASE_IMAGE=fedora:latest
FROM $BASE_IMAGE
RUN echo "install_weak_deps=False" >> /etc/dnf/dnf.conf
RUN dnf install -y "https://zfsonlinux.org/fedora/zfs-release-2-3$(rpm --eval "%{dist}").noarch.rpm" && dnf clean all
RUN dnf install -y \
NetworkManager \
squashfs-tools \
dracut-live \
livecd-tools \
dracut-squash \
dracut-network \
efibootmgr \
audit \
coreutils \
curl \
device-mapper \
dosfstools \
dracut \
dracut-live \
dracut-network \
dracut-squash \
e2fsprogs \
efibootmgr \
gawk \
gdisk \
grub2 \
grub2-efi-x64 \
grub2-efi-x64-modules \
grub2-pc \
haveged \
kernel \
kernel-modules \
kernel-modules-extra \
livecd-tools \
lvm2 \
nano \
NetworkManager \
openssh-server \
parted \
polkit \
rsync \
shim-x64 \
squashfs-tools \
sudo \
systemd \
systemd-networkd \
systemd-resolved \
tar \
which \
kernel kernel-modules kernel-modules-extra \
zfs \
rsync && dnf clean all
RUN mkdir -p /run/lock && \
touch /usr/libexec/.keep && \
systemctl enable getty@tty1.service && \
systemctl enable getty@tty2.service && \
systemctl enable getty@tty3.service && \
systemctl enable systemd-networkd && \
systemctl enable systemd-resolved && \
systemctl enable sshd
# https://github.com/kairos-io/kairos/blob/master/examples/byoi/fedora/Dockerfile
COPY --from=quay.io/kairos/framework:master_fedora / /
# Activate Kairos services
RUN systemctl enable cos-setup-reconcile.timer && \
systemctl enable cos-setup-fs.service && \
systemctl enable cos-setup-boot.service && \
systemctl enable cos-setup-network.service
# https://github.com/kairos-io/kairos/issues/1529#issuecomment-1598520387
RUN sed -i 's/compress=.*/compress="cat"/' /etc/dracut.conf.d/10-immucore.conf
## Generate initrd
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
RUN kernel=$(ls /boot/vmlinuz-* | head -n1) && \
ln -sf "${kernel#/boot/}" /boot/vmlinuz
RUN kernel=$(ls /lib/modules | head -n1) && \
dracut -v -N -f "/boot/initrd-${kernel}" "${kernel}" && \
ln -sf "initrd-${kernel}" /boot/initrd && depmod -a "${kernel}"
RUN rm -rf /boot/initramfs-*
With this configuration the difference in boot time was unfortunately less than 1 second. I've also verified that the change is actually present in
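A rough sketch of one way to double-check, from the image itself, whether the `compress="cat"` setting took effect (with that setting dracut should write the initrd without compression). The `initrd_format` helper is hypothetical, not something Kairos or dracut ships; it only looks at the leading magic bytes, so if an uncompressed microcode archive is prepended to the image it will report cpio either way.

```shell
#!/usr/bin/env bash
# Sketch: guess how an initrd image is compressed from its first bytes.
# Caveat: only the start of the file is inspected, so a prepended
# (uncompressed) microcode cpio archive hides whatever follows it.
initrd_format() {
  local magic
  # Read the first 6 bytes as hex and squash them into one string.
  magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
  case "$magic" in
    1f8b*)                      echo "gzip" ;;
    fd377a585a00)               echo "xz" ;;
    28b52ffd*)                  echo "zstd" ;;
    303730373031|303730373032)  echo "cpio (uncompressed)" ;;  # ASCII "070701"/"070702"
    *)                          echo "unknown" ;;
  esac
}

# Usage (path taken from the Dockerfile above):
# initrd_format /boot/initrd
```

If the result is "cpio (uncompressed)" on an image built without early microcode, the compression change is in effect and the slowness has to come from somewhere else.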
Of course @Itxaka
Text spoiler
$ sudo systemd-analyze critical-chain
multi-user.target @1min 33.136s
└─getty.target @1min 33.136s
└─cos-setup-boot.service @35.490s +57.645s
└─basic.target @35.394s
└─dbus-broker.service @37.096s +186ms
└─dbus.socket @35.207s
└─sysinit.target @34.445s
└─systemd-update-utmp.service @33.641s +704ms
└─auditd.service @32.718s +731ms
└─systemd-tmpfiles-setup.service @31.311s +1.072s
└─local-fs.target @30.905s
└─run-user-0.mount @1min 28.561s
└─local-fs-pre.target @20.276s
└─systemd-tmpfiles-setup-dev.service @20.078s +97ms
└─systemd-sysusers.service @15.510s +4.464s
└─systemd-remount-fs.service @9.927s +3.837s
└─systemd-fsck-root.service @584542y 2w 2d 20h 1min 36.072s +2.476s
└─dracut-pre-mount.service @584542y 2w 2d 20h 1min 35.203s +397ms
└─cryptsetup.target @2.620s
└─systemd-ask-password-wall.path @2.144s
I can't seem to find how to actually change boot parameters in the documentation @Itxaka |
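To make the chain above easier to read, here is a small sketch that ranks units by the time they took to start (the `+...` column). The `slowest_units` helper is hypothetical, and it only understands `+<n>ms` and `+<n>s` values; lines without such a value, or with longer formats like `+1min 2s`, are skipped.

```shell
#!/usr/bin/env bash
# Sketch: rank units from `systemd-analyze critical-chain` output by the
# time each one took to start, slowest first. Prints "<milliseconds>\t<unit>".
slowest_units() {
  awk '
    $NF ~ /^\+[0-9.]+m?s$/ {
      unit = $1
      sub(/^[^A-Za-z0-9]+/, "", unit)   # drop the tree-drawing prefix
      value = substr($NF, 2)             # strip the leading "+"
      if (value ~ /ms$/) { sub(/ms$/, "", value); ms = value + 0 }
      else               { sub(/s$/,  "", value); ms = value * 1000 }
      printf "%.0f\t%s\n", ms, unit
    }
  ' | sort -rn
}

# Usage with three lines copied from the chain above:
printf '%s\n' \
  '└─cos-setup-boot.service @35.490s +57.645s' \
  '└─dbus-broker.service @37.096s +186ms' \
  '└─systemd-sysusers.service @15.510s +4.464s' | slowest_units
```

For these three lines, cos-setup-boot.service comes out on top at 57645 ms, which matches the impression that it dominates the boot.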
Unfortunately that is a manual step: when presented with the GRUB menu selection to boot from USB, you can press
Pretty sure the delay is due to the metadata providers, specifically the https://deploy.equinix.com/developers/docs/metal/server-metadata/metadata/
The rest of the providers have a 2 second timeout, which should be more than enough. I already solved this issue in the upstream library but it never trickled down to Kairos 🤦 So the upstream fix is: rancher-sandbox/linuxkit@432a87b
There is also a workaround for now in which you just need to remove the "packer" provider from the
With the patch it looks much better:
Still 20 seconds to run, which makes sense as there are 10 providers with a max 2 second timeout each, so they are maxing out the timeout (we can ignore the cdrom provider as that one doesn't have a timeout). I think we could still improve this by shipping only the cdrom provider out of the box and letting users set the datasources themselves in their cloud-config. @mudler @mauromorales @jimmykarily thoughts? Any reason we need to ship all 10 providers? |
umm I guess if booting from cd and providing those datasources via metadata it makes it impossible to override them for the install properly. Then we need to look into making the datasources faster, maybe via parallel processing or having a faster pre-check? |
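The serial-versus-parallel difference can be sketched in shell. This is a hypothetical simulation, not yip's actual code: `probe` is a made-up stand-in for a datasource whose request runs into its timeout, the timeout is scaled down from 2 seconds to 0.2 seconds so the script finishes quickly, and `date +%s%N` assumes GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical simulation: each "provider" blocks until its timeout expires.
# It illustrates why N serial probes cost about N * timeout, while parallel
# probes cost about one timeout in total.

PROVIDERS=10   # number of datasources mentioned in the thread
TIMEOUT=0.2    # scaled-down stand-in for the 2 second timeout

probe() {
  # Stand-in for a metadata request that never gets an answer.
  sleep "$TIMEOUT"
}

probe_serial() {
  local i
  for ((i = 0; i < PROVIDERS; i++)); do
    probe
  done
}

probe_parallel() {
  local i
  for ((i = 0; i < PROVIDERS; i++)); do
    probe &
  done
  wait   # returns once every background probe has finished
}

# Run a command and print the elapsed wall-clock time in milliseconds.
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@"
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

serial_ms=$(elapsed_ms probe_serial)
parallel_ms=$(elapsed_ms probe_parallel)
echo "serial:   ${serial_ms} ms"
echo "parallel: ${parallel_ms} ms"
```

The serial run should take roughly ten times as long as the parallel one, which mirrors the 20 seconds reported above for 10 providers at 2 seconds each.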
Thanks for taking the time to debug this @Itxaka I'm using KairOS to create my own minimal immutable desktop distro and I'm confused about the cos* services. Do I really need them at boot? What purpose do they have outside of k3s/cloud? |
They are facilities to fully configure the system and modify it via config files, cloud-config style. The base system has several of those to enable services based on the system boot selection, store things to make immutability work, and even generate the bind/ephemeral mounts during boot. They run at different times during boot and they differ; for example there is the
For example, when you use the interactive installer, a config file is generated that creates your user on each boot during the initramfs stage (you can check your own file in any installed system under
You can see all of the base system configs under: https://github.com/kairos-io/kairos/tree/master/overlay/files/system/oem |
Patch has landed on yip to run the datasources in parallel mudler/yip#99 |
We have merged all the fixes, and 2.3.0 is about to be released (#1066), so closing this issue for now. Please re-open if it's still present in 2.3.0. |
Issue
Following the instructions from https://kairos.io/docs/reference/build-from-scratch/ + https://kairos.io/docs/getting-started/#booting on bare metal leads to a successful installation which shows a BIOS boot entry for the specified USB device.
However, after booting into KairOS and selecting
Kairos (interactive install)
in GRUB2, it takes an extremely long time to load before it freezes. This message is shown for about 1 minute 20 seconds:
Then after another minute the boot process freezes at:
Reproduce
These are copy and pasted instructions from the documentation:
Dockerfile
Commands:
More info
Kairos version:
CPU architecture, OS, and Version:
x86_64, Fedora 36
Expected behavior
Booting Fedora Workstation & Server from the official ISO takes merely seconds and doesn't freeze
Additional context
CPU: AMD Ryzen 9 5950X
GPU: AMD Radeon RX 6800 XT