Timed out waiting for device /dev/gpt-auto-root. on aarch64-linux VM (systemd stage 1 + zfs + remote unlocking) #293586

Closed
misuzu opened this issue Mar 5, 2024 · 9 comments · Fixed by #282022

Comments

@misuzu
Contributor

misuzu commented Mar 5, 2024

Describe the bug

I have systemd stage 1 + ZFS remote unlocking configured on an Oracle aarch64 VM.
After decrypting the root dataset, boot just hangs for 1.5 minutes before giving up:

<.....................................................................>
Mar 04 23:00:25 localhost systemd[1]: Finished NixOS Activation.
Mar 04 23:00:53 localhost sshd[599]: Accepted publickey for root from x.x.x.x port 49509 ssh2: ED25519 SHA256:<.............>
Mar 04 23:00:53 localhost sshd[601]: lastlog_openseek: Couldn't stat /var/log/lastlog: No such file or directory
Mar 04 23:00:53 localhost sshd[601]: lastlog_openseek: Couldn't stat /var/log/lastlog: No such file or directory
Mar 04 23:00:53 localhost sshd[599]: syslogin_perform_logout: logout() returned an error
Mar 04 23:00:53 localhost sshd[599]: Received disconnect from x.x.x.x port 49509:11: disconnected by user
Mar 04 23:00:53 localhost sshd[599]: Disconnected from user root x.x.x.x port 49509
Mar 04 23:01:39 localhost systemd[1]: dev-gpt\x2dauto\x2droot.device: Job dev-gpt\x2dauto\x2droot.device/start timed out.
Mar 04 23:01:39 localhost systemd[1]: Timed out waiting for device /dev/gpt-auto-root.
Mar 04 23:01:39 localhost systemd[1]: Dependency failed for Initrd Root Device.
Mar 04 23:01:39 localhost systemd[1]: initrd-root-device.target: Job initrd-root-device.target/start failed with result 'dependency'.
Mar 04 23:01:39 localhost systemd[1]: initrd-root-device.target: Triggering OnFailure= dependencies.
Mar 04 23:01:39 localhost systemd[1]: dev-gpt\x2dauto\x2droot.device: Job dev-gpt\x2dauto\x2droot.device/start failed with result 'timeout'.
Mar 04 23:01:39 localhost systemd[1]: Stopping D-Bus System Message Bus...
Mar 04 23:01:39 localhost systemd[1]: Starting Cleaning Up and Shutting Down Daemons...
Mar 04 23:01:39 localhost systemd[1]: panic-on-fail.service was skipped because no trigger condition checks were met.
Mar 04 23:01:39 localhost systemd[1]: Stopping Dispatch Password Requests to Console...
Mar 04 23:01:39 localhost systemd[1]: dbus.service: Deactivated successfully.
<.....................................................................>

The relevant config:

{
  boot.initrd = {
    systemd = {
      enable = true;
      network.enable = true;
      users.root.shell = "/bin/systemd-tty-ask-password-agent";
    };
    network.enable = true;
    network.ssh = {
      enable = true;
      hostKeys = [ "/etc/initrd/network/ssh/initrd/ssh_host_ed25519_key" ];
    };
  };

  networking.useNetworkd = true;
}

hardware-configuration.nix:

# Do not modify this file!  It was generated by ‘nixos-generate-config’
# and may be overwritten by future invocations.  Please make changes
# to /etc/nixos/configuration.nix instead.
{ config, lib, pkgs, modulesPath, ... }:

{
  imports =
    [ (modulesPath + "/profiles/qemu-guest.nix")
    ];

  boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "xhci_pci" "virtio_pci" "sd_mod" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ ];
  boot.extraModulePackages = [ ];

  fileSystems."/" =
    { device = "mia/safe/root";
      fsType = "zfs";
    };

  fileSystems."/nix" =
    { device = "mia/local/nix";
      fsType = "zfs";
    };

  fileSystems."/tmp" =
    { device = "mia/local/tmp";
      fsType = "zfs";
    };

  fileSystems."/boot" =
    { device = "/dev/disk/by-label/NIXOS_BOOT";
      fsType = "vfat";
    };
}

I'm not sure whether the problem is systemd stage 1 itself or the combination of systemd stage 1 + ZFS + remote unlocking. I've had no issues with the same config on an x86_64 bare-metal machine.

Steps To Reproduce

Steps to reproduce the behavior:

  1. An aarch64 NixOS VM
  2. systemd stage 1 + zfs + remote unlocking (probably only systemd stage 1?)
  3. Boot, unlock the root dataset, and observe the hang

Expected behavior

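Boot should continue once the root dataset is unlocked. It's weird that systemd suddenly waits for /dev/gpt-auto-root at all, since / is the ZFS dataset mia/safe/root.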


Additional context

% lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0   120G  0 disk
├─sda1   8:1    0   512M  0 part /boot
└─sda2   8:2    0 119,5G  0 part
zram0  252:0    0   7,8G  0 disk [SWAP]
% sudo fdisk -l /dev/sda
Disk /dev/sda: 120 GiB, 128849018880 bytes, 251658240 sectors
Disk model: BlockVolume
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 1048576 bytes
Disklabel type: gpt
Disk identifier: 26F8B72D-1D98-DA4E-969E-3673CE4222AA

Device       Start       End   Sectors   Size Type
/dev/sda1     2048   1050623   1048576   512M EFI System
/dev/sda2  1050624 251658206 250607583 119.5G Linux filesystem

Notify maintainers

@NixOS/systemd @nikstur

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"aarch64-linux"`
 - host os: `Linux 6.6.19, NixOS, 24.05 (Uakari), 24.05.20240303.b8697e5`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - nixpkgs: `/etc/nix/channels/nixpkgs`


@surfaceflinger
Member

Same thing happens for me on my x86-64 PC since bumping my flake inputs yesterday. I don't have networking in the initrd, though.

@ElvishJerricco
Contributor

So what's weird here is that the dependency on /dev/gpt-auto-root shouldn't be created by systemd-gpt-auto-generator if you don't have the efivarfs kernel module available in your initrd. Now, if you do have efivarfs in your initrd, then this is certainly a bug that I think I can explain; but I'm not sure why you would. We do have a pr (#282022) that ought to fix this though, since it configures root= on the cmdline so that systemd-gpt-auto-generator only does the right thing instead of just assuming it should run when it shouldn't. But again, this shouldn't be necessary because it's very unusual to have efivarfs in your initrd.

@misuzu
Contributor Author

misuzu commented Mar 8, 2024

Now, if you do have efivarfs in your initrd, then this is certainly a bug that I think I can explain; but I'm not sure why you would.

It doesn't seem like it, though maybe I'm looking in the wrong places:

nix-repl> config.boot.initrd.availableKernelModules
[ "virtio_net" "virtio_pci" "virtio_mmio" "virtio_blk" "virtio_scsi" "9p" "9pnet_virtio" "ata_piix" "uhci_hcd" "xhci_pci" "virtio_pci" "sd_mod" "md_mod" "raid0" "raid1" "raid10" "raid456" "autofs" "tpm-tis" "tpm-crb" "ahci" "sata_nv" "sata_via" "sata_sis" "sata_uli" "ata_piix" "pata_marvell" "nvme" "sd_mod" "sr_mod" "mmc_block" "uhci_hcd" "ehci_hcd" "ehci_pci" "ohci_hcd" "ohci_pci" "xhci_hcd" "xhci_pci" "usbhid" "hid_generic" "hid_lenovo" "hid_apple" "hid_roccat" "hid_logitech_hidpp" "hid_logitech_dj" "hid_microsoft" "hid_cherry" ]

nix-repl> config.boot.initrd.kernelModules
[ "virtio_balloon" "virtio_console" "virtio_rng" "zfs" "spl" "af_packet" "dm_mod" "af_packet" ]

nix-repl> config.boot.kernelModules
[ "bridge" "macvlan" "tap" "tun" "zfs" "loop" "atkbd" "tls" ]

lsmod | grep efi doesn't show anything either.

@misuzu
Contributor Author

misuzu commented Mar 8, 2024

We do have a pr (#282022) that ought to fix this though, since it configures root= on the cmdline so that systemd-gpt-auto-generator only does the right thing instead of just assuming it should run when it shouldn't.

I've tested #282022 on top of 9df3e30, and it seems to fix the issue.

@ElvishJerricco
Contributor

@misuzu What kernel are you using? I don't think there's any nixos kernel that does CONFIG_EFIVAR_FS=y, but that would be one explanation.

@surfaceflinger
Member

surfaceflinger commented Mar 10, 2024

Judging from misuzu's profile, I believe we're both using the CachyOS kernel from nyx, which sets CONFIG_EFIVAR_FS=y 🤧
But at the same time, I don't think this kernel supports aarch64?

@misuzu
Contributor Author

misuzu commented Mar 10, 2024

What kernel are you using?

I'm using the default kernel (pkgs.linuxPackages)

I don't think there's any nixos kernel that does CONFIG_EFIVAR_FS=y, but that would be one explanation.

Looks like the default kernel actually does have this:

% zgrep EFIVAR /proc/config.gz
CONFIG_EFIVAR_FS=y

This is probably the reason it gets enabled: https://github.com/torvalds/linux/blob/005f6f34bd47eaa61d939a2727fc648e687b84c1/arch/arm64/configs/defconfig#L1590

@nikstur
Contributor

nikstur commented Mar 10, 2024

I've encountered this issue on a much simpler setup: aarch64 with a tmpfs / mount configured in /etc/fstab via fileSystems. It was fixed by explicitly providing root=fstab on the kernel cmdline.
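For anyone who needs the workaround before #282022 lands, here is a minimal NixOS module sketch of what nikstur describes, assuming a systemd-based initrd whose root file system is declared through fileSystems (as in the configs above). It only appends root=fstab to the kernel command line when systemd stage 1 is enabled, so systemd takes the root mount from the generated fstab instead of waiting on /dev/gpt-auto-root. This is an illustration, not necessarily how #282022 implements the fix.

```nix
# Workaround sketch: pin the initrd root to the fstab entry generated from
# fileSystems."/" so systemd-gpt-auto-generator does not add a dependency on
# /dev/gpt-auto-root (which happens when efivarfs is available in the initrd,
# e.g. with CONFIG_EFIVAR_FS=y on aarch64).
{ config, lib, ... }:
{
  # Only relevant with systemd stage 1; the scripted initrd doesn't use it.
  boot.kernelParams = lib.mkIf config.boot.initrd.systemd.enable [ "root=fstab" ];
}
```

After rebuilding, `cat /proc/cmdline` on the next boot should include root=fstab.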

@ElvishJerricco
Contributor

Ok this is just all the more reason to get #282022 merged :) That should fix this issue then.
