Adding a disk to an instance changed the boot order #5112

Open
citrus-it opened this issue Feb 21, 2024 · 6 comments
Labels: customer (For any bug reports or feature requests tied to customer requests), known issue (To include in customer documentation and training)

@citrus-it
Contributor

I added a disk to an instance that has been running in the colo for a while, and it failed to boot afterwards, dropping to the UEFI shell. I've replicated this with a fresh instance, and the rest of this note is from that replication case.

One notable thing about the VM that I originally saw the problem with is that its two disks were in slots 1 and 2, with nothing present in slot 0. This is likely because it was created before the fix for #5067 was merged.

To replicate the failure, I created a new disk from an image, and then two additional blank ones. By attaching them to a new instance in the right order, then detaching a blank disk again, I was able to end up with an instance in the same configuration, with the boot disk in slot 1 and slot 0 being empty.

                 name                | slot
-------------------------------------+-------
  test-omnios-bloody-20240215-e87155 |    1
  blank2                             |    2

I then booted this instance, which was successful, and mounted the EFI System Partition (ESP) to fish out the NvVars file, which is where the UEFI bootrom stores its persistent variables. Decoding this shows that the bootrom has enumerated all of the possible boot devices, assigned them numbers and configured an initial boot order:

Variable        Value                    Notes
--------        -----                    ------
Boot0000        UIApp
Boot0001        UEFI                    <-- slot 1
Boot0002        UEFI 2                  <-- slot 2
Boot0003        UEFI Non-Block Device   <-- slot 8 (cidata volume)
Boot0004        UEFI PXE v4
Boot0005        EFI Internal Shell
BootOrder       0, 1, 2, 3, 4, 5
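Per the UEFI specification, BootOrder is an ordered array of UINT16 option numbers (little-endian), and each number N refers to a variable named Boot followed by N as four upper-case hex digits. A minimal Python sketch of decoding that value, assuming the raw BootOrder bytes have already been pulled out of NvVars; the NvVars container format itself is not modelled, and the function names are hypothetical:

```python
import struct


def decode_boot_order(raw: bytes) -> list[int]:
    """Decode a BootOrder value: a packed array of little-endian UINT16s."""
    if len(raw) % 2 != 0:
        raise ValueError("BootOrder data must be a whole number of UINT16s")
    count = len(raw) // 2
    return list(struct.unpack(f"<{count}H", raw))


def option_variable_name(number: int) -> str:
    """Each option number names a Boot#### variable (four upper-case hex digits)."""
    return f"Boot{number:04X}"


# The initial BootOrder from the table above, as 12 raw bytes.
raw = bytes.fromhex("000001000200030004000500")
print([option_variable_name(n) for n in decode_boot_order(raw)])
```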

So far so good. I rebooted the instance a couple of times to confirm that it booted normally, and that these variables didn't change.

I then shut down the instance and attached a new blank disk to it. This disk was 128G in size and used a 4096-byte sector size. After this, the database showed that the new disk had been placed in slot 0. This mirrors what happened with the previously failed instance.

                 name                | slot
-------------------------------------+-------
  test-omnios-bloody-20240215-e87155 |    1
  blank4096                          |    0
  blank2                             |    2

On booting the instance back up, it dropped to the EFI shell after failing to boot from Boot0003 and via PXE:

BdsDxe: failed to load Boot0003 "UEFI Non-Block Boot Device" from PciRoot(0x0)/Pci(0x18,0x0): Not Found

>>Start PXE over IPv4.
  PXE-E16: No valid offer received.
BdsDxe: failed to load Boot0004 "UEFI PXEv4 (MAC:A84025FDD042)" from PciRoot(0x0)/Pci(0x9,0x0)/MAC(A84025FDD042,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found
BdsDxe: loading Boot0005 "EFI Internal Shell" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
BdsDxe: starting Boot0005 "EFI Internal Shell" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
Shell>

Using the EFI shell to look at the persistent variables now showed something interesting:

Boot0000        UIApp
Boot0001        UEFI                    <-- slot 1
Boot0002        UEFI 2                  <-- slot 2
Boot0003        UEFI Non-Block Device   <-- slot 8 (cidata volume)
Boot0004        UEFI PXE v4
Boot0005        EFI Internal Shell
Boot0006        UEFI 3                  <-- slot 0 (newly added drive)
BootOrder       0, 3, 4, 5, 1, 2, 6

The new disk has been enumerated and added as Boot0006, which is not a surprise, but the boot order has been changed so that all three NVMe disks are now at the end. This explains why the instance attempted to boot from Boot0003 (the cidata volume) and failed, then tried PXE boot, and finally dropped to the EFI shell.
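The failure sequence follows from how the boot manager consumes BootOrder: it tries each entry in turn and stops at the first one that starts. A minimal Python model of that loop, with hypothetical names; it assumes, as we understand EDK2's boot manager to behave, that options which are inactive or outside the boot category (such as UiApp, attributes 0x0109) are skipped rather than attempted, and it ignores BootNext, platform recovery and everything else the real firmware does:

```python
LOAD_OPTION_ACTIVE = 0x00000001
LOAD_OPTION_CATEGORY_MASK = 0x00001F00
LOAD_OPTION_CATEGORY_BOOT = 0x00000000


def try_boot(boot_order, attributes, can_load):
    """Walk BootOrder and start the first option that loads.

    Returns (started option number or None, option numbers attempted).
    """
    attempted = []
    for number in boot_order:
        attrs = attributes[number]
        if not attrs & LOAD_OPTION_ACTIVE:
            continue  # inactive options are never tried
        if (attrs & LOAD_OPTION_CATEGORY_MASK) != LOAD_OPTION_CATEGORY_BOOT:
            continue  # application-category options such as UiApp are skipped
        attempted.append(number)
        if can_load(number):
            return number, attempted
    return None, attempted


# Option numbers follow the tables above. Boot0003 (cidata) and Boot0004 (PXE)
# fail; the shell (Boot0005) always starts. Which disk option holds the boot
# disk after the change does not matter: the shell is reached first.
attributes = {0: 0x0109, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}
loads = lambda number: number in {1, 5}
before = try_boot([0, 1, 2, 3, 4, 5], attributes, loads)
after = try_boot([0, 3, 4, 5, 1, 2, 6], attributes, loads)
```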

The bootrom's debug output from this boot also shows this same strange boot order:

[Bds]=============Begin Load Options Dumping ...=============
  Driver Options:
  SysPrep Options:
  Boot Options:
    Boot0000: UiApp              0x0109
    Boot0003: UEFI Non-Block Boot Device                 0x0001
    Boot0004: UEFI PXEv4 (MAC:A84025FAF1FF)              0x0001
    Boot0005: EFI Internal Shell                 0x0001
    Boot0001: UEFI               0x0001
    Boot0002: UEFI  2            0x0001
    Boot0006: UEFI  3            0x0001
  PlatformRecovery Options:
    PlatformRecovery0000: Default PlatformRecovery               0x0001
[Bds]=============End Load Options Dumping=============

To replicate this, I faithfully reproduced what happened in the colo. Not all of these steps may be necessary to trigger the problem; more experimentation is needed.

@gjcolombo
Contributor

I was able to repro with a bare Propolis server. I have a hunch as to what's happening:

  • The guest bootrom's platform boot manager initialization code calls EDK2's EfiBootManagerRefreshAllBootOption during VM startup.
  • This calls BmEnumerateBootOptions, which looks for block I/O devices and FAT file systems that could be viable boot options.
  • Each of these automatically generated options is assigned a description; for NVMe devices, this is delegated to the NVMe-specific description function.
    • This is supposed to insert serial number and model number information if that's available, but Propolis returns 0s for both of these fields in its response to an NVMe Identify Controller command. I think the 2 and 3 are being added by some other disambiguation code that runs after this (some evidence for this below).
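To see why all-zero Identify Controller fields produce the bare "UEFI " descriptions, here is a hypothetical Python sketch of an NVMe description handler. It is not EDK2's implementation: it simply joins the model number (MN, 40 bytes) and serial number (SN, 20 bytes) fields, stops at the first NUL and trims NVMe's space padding, then adds the "UEFI " prefix seen in the logs. EDK2's real handler may format these fields differently; the only claim here is that zeroed fields leave nothing after the prefix.

```python
UEFI_PREFIX = "UEFI "  # prefix seen on auto-generated descriptions in the logs


def identify_field(raw: bytes) -> str:
    """Decode an ASCII Identify Controller field, stopping at the first NUL
    and trimming the space padding NVMe uses."""
    return raw.split(b"\x00", 1)[0].decode("ascii").strip()


def nvme_description(model_number: bytes, serial_number: bytes) -> str:
    """Hypothetical handler: model and serial joined by a space, omitting
    any field that decodes to an empty string."""
    parts = [identify_field(model_number), identify_field(serial_number)]
    return " ".join(part for part in parts if part)


def boot_option_description(model_number: bytes, serial_number: bytes) -> str:
    return UEFI_PREFIX + nvme_description(model_number, serial_number)


# Propolis reports all-zero MN (40 bytes) and SN (20 bytes) fields.
print(repr(boot_option_description(bytes(40), bytes(20))))
```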

I suspect that what's happening is that when the new drive is added in slot 0, the descriptions are shifting around: before, slot 1 was labeled UEFI and slot 2 was labeled UEFI 2; after, slot 0 gets the UEFI label and the other two slots shift by one.
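The relabelling hypothesis can be checked with a small Python model. It assumes disks are enumerated in ascending slot order, that every disk gets the same base description "UEFI " (empty model and serial), and that duplicates are disambiguated by appending " 2", " 3" and so on. That suffix rule is inferred from the "UEFI  2" and "UEFI  3" names in the dumps, not taken from EDK2 source, and the helper names are hypothetical.

```python
def make_unique(descriptions):
    """Append " 2", " 3", ... to repeated descriptions, in enumeration order."""
    seen = {}
    unique = []
    for description in descriptions:
        seen[description] = seen.get(description, 0) + 1
        count = seen[description]
        unique.append(description if count == 1 else f"{description} {count}")
    return unique


def label_disks(slots):
    """Map slot -> description, assuming enumeration in ascending slot order."""
    ordered = sorted(slots)
    return dict(zip(ordered, make_unique(["UEFI "] * len(ordered))))


before = label_disks([1, 2])    # boot disk in slot 1, blank disk in slot 2
after = label_disks([0, 1, 2])  # new blank disk attached in slot 0
relabelled = sorted(slot for slot in before if before[slot] != after[slot])
print(before, after, relabelled)
```

Under these assumptions, both pre-existing disks change description, which is enough for their saved boot entries to stop matching.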

The reason this matters is that once EfiBootManagerRefreshAllBootOption has enumerated the boot options, it goes through this code block, which removes "invalid" boot options from the nonvolatile boot order. A nonvolatile option is "invalid" if EfiBootManagerFindLoadOption doesn't find a match for it among the freshly enumerated boot options. One of the reasons this can happen is a mismatch in the searched-for Description or FilePath for the option under consideration. After these entries are pruned, the refresh function calls EfiBootManagerAddLoadOptionVariable to add back all the enumerated options that aren't currently accounted for.
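The matching step is an exact comparison of several fields, and the first differing field explains why an entry is rejected. A minimal Python sketch of that comparison; the field set and the -1 "not found" return follow our reading of EfiBootManagerFindLoadOption (the real function also compares the option type and optional-data size), the fields are checked in the order reported by the debug output in this issue, and device paths are replaced by hypothetical text strings such as nvme@slot1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadOption:
    attributes: int
    description: str
    file_path: str             # hypothetical text stand-in for a device path
    optional_data: bytes = b""


def mismatch_reason(key, candidate):
    """Return why candidate fails to match key, or None if every field matches."""
    if key.attributes != candidate.attributes:
        return f"Attributes {key.attributes} != {candidate.attributes}"
    if key.description != candidate.description:
        return f"Description {key.description!r} != {candidate.description!r}"
    if key.file_path != candidate.file_path:
        return "FilePath mismatch"
    if key.optional_data != candidate.optional_data:
        return "OptionalData mismatch"
    return None


def find_load_option(key, candidates):
    """Index of the first exact match, or -1 if nothing matches."""
    for index, candidate in enumerate(candidates):
        if mismatch_reason(key, candidate) is None:
            return index
    return -1


# The boot disk's saved entry vs. the disks enumerated after a slot-0 attach.
saved = LoadOption(1, "UEFI ", "nvme@slot1")
fresh = [LoadOption(1, "UEFI ", "nvme@slot0"), LoadOption(1, "UEFI  2", "nvme@slot1")]
for candidate in fresh:
    print(mismatch_reason(saved, candidate))
```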

I think this would explain what's being seen above: when the new disk gets added, the existing entries for the disks in slots 1 and 2 end up mismatching either on their file paths or descriptions, so they get trimmed; then all three disks get added back at the end of the boot order, where they happen to be after the UEFI shell entry, which makes the instance boot to the shell.
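The prune-and-re-add sequence can be modelled in a few lines of Python, and doing so reproduces the BootOrder seen in the EFI shell (0, 3, 4, 5, 1, 2, 6). This is a sketch, not EDK2 code. It assumes that only options the boot manager created itself are eligible for pruning, and that a new option gets the lowest unused option number. It also represents device paths with hypothetical text strings.

```python
from typing import NamedTuple


class Option(NamedTuple):
    attributes: int
    description: str
    file_path: str        # hypothetical text stand-in for a device path
    auto_created: bool    # True if the boot manager generated it by enumeration


def refresh(boot_order, nv_options, enumerated):
    """Return (new BootOrder, new option table) after one refresh pass."""
    order = list(boot_order)
    options = dict(nv_options)
    # Step 1: delete auto-created NV options with no exact match among the
    # freshly enumerated ones, removing them from BootOrder as well.
    for number, option in nv_options.items():
        if option.auto_created and option not in enumerated:
            del options[number]
            if number in order:
                order.remove(number)
    # Step 2: add enumerated options with no NV counterpart, appending each to
    # the end of BootOrder under the lowest unused option number (assumption).
    for option in enumerated:
        if option in options.values():
            continue
        number = 0
        while number in options:
            number += 1
        options[number] = option
        order.append(number)
    return order, options


def disk(slot, description):
    return Option(1, description, f"nvme@slot{slot}", True)


UI_APP = Option(0x0109, "UiApp", "fv/UiApp", False)
CIDATA = Option(1, "UEFI Non-Block Boot Device", "pci/cidata", True)
PXE = Option(1, "UEFI PXEv4", "pci/nic", True)
SHELL = Option(1, "EFI Internal Shell", "fv/Shell", False)

saved = {0: UI_APP, 1: disk(1, "UEFI "), 2: disk(2, "UEFI  2"),
         3: CIDATA, 4: PXE, 5: SHELL}
# After a disk is attached in slot 0, enumeration relabels every disk.
found = [disk(0, "UEFI "), disk(1, "UEFI  2"), disk(2, "UEFI  3"), CIDATA, PXE]
new_order, new_options = refresh([0, 1, 2, 3, 4, 5], saved, found)
print(new_order)
```

With the original two disks, every saved entry still matches, so the same function leaves BootOrder untouched; the reordering only appears once the descriptions or paths shift.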


To see this in action, I added some debug prints to the bootrom to show how these comparisons proceed on functional and non-functional boot sequences. Here's what I get when I boot a VM with a working boot disk attached at 0.17.0:

  Assigned description  via handler 4
  Assigned description PXEv4 (MAC:020820138D79) via handler 2
  EfiBootManagerFindLoadOption: UEFI
  EfiBootManagerFindLoadOption: candidate 0 (UEFI )
  EBMFLO: Matched!
  EfiBootManagerFindLoadOption: UEFI PXEv4 (MAC:020820138D79)
  EfiBootManagerFindLoadOption: candidate 0 (UEFI )
  EBMFLO: Key->Description UEFI PXEv4 (MAC:020820138D79) != UEFI
  EfiBootManagerFindLoadOption: candidate 1 (UEFI PXEv4 (MAC:020820138D79))
  EBMFLO: Matched!
  EfiBootManagerFindLoadOption: UEFI
  EfiBootManagerFindLoadOption: candidate 0 (UiApp)
  EBMFLO: Key->Attributes 1 != 265
  EfiBootManagerFindLoadOption: candidate 1 (UEFI )
  EBMFLO: Matched!
  EfiBootManagerFindLoadOption: UEFI PXEv4 (MAC:020820138D79)
  EfiBootManagerFindLoadOption: candidate 0 (UiApp)
  EBMFLO: Key->Attributes 1 != 265
  EfiBootManagerFindLoadOption: candidate 1 (UEFI )
  EBMFLO: Key->Description UEFI PXEv4 (MAC:020820138D79) != UEFI
  EfiBootManagerFindLoadOption: candidate 2 (UEFI PXEv4 (MAC:020820138D79))
  EBMFLO: Matched!
  EfiBootManagerFindLoadOption: EFI Internal Shell
  EfiBootManagerFindLoadOption: candidate 0 (UiApp)
  EBMFLO: Key->Attributes 1 != 265
  EfiBootManagerFindLoadOption: candidate 1 (UEFI )
  EBMFLO: Key->Description EFI Internal Shell != UEFI
  EfiBootManagerFindLoadOption: candidate 2 (UEFI PXEv4 (MAC:020820138D79))
  EBMFLO: Key->Description EFI Internal Shell != UEFI PXEv4 (MAC:020820138D79)
  EfiBootManagerFindLoadOption: candidate 3 (EFI Internal Shell)
  EBMFLO: Matched!
  Select Item: 0x19
  [Bds]OsIndication: 0000000000000000
  [Bds]=============Begin Load Options Dumping ...=============
    Driver Options:
    SysPrep Options:
    Boot Options:
      Boot0000: UiApp          0x0109
      Boot0001: UEFI           0x0001
      Boot0002: UEFI PXEv4 (MAC:020820138D79)          0x0001
      Boot0003: EFI Internal Shell         0x0001
    PlatformRecovery Options:
      PlatformRecovery0000: Default PlatformRecovery       0x0001
  [Bds]=============End Load Options Dumping=============

(Handler 4 is the NVMe device description handler.)
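The "Key->Attributes 1 != 265" lines are the attribute comparison rejecting UiApp as a candidate: 265 is 0x0109, the attribute value printed for Boot0000 in the dumps. Decoding both values with the load-option attribute bits from the UEFI specification shows that UiApp is active, hidden and in the application category, while the auto-generated options carry only LOAD_OPTION_ACTIVE, so these particular mismatches look expected. A small Python decoder (function name hypothetical):

```python
# Load-option attribute bits from the UEFI specification (EFI_LOAD_OPTION).
LOAD_OPTION_ACTIVE = 0x00000001
LOAD_OPTION_FORCE_RECONNECT = 0x00000002
LOAD_OPTION_HIDDEN = 0x00000008
LOAD_OPTION_CATEGORY_MASK = 0x00001F00
LOAD_OPTION_CATEGORY_BOOT = 0x00000000
LOAD_OPTION_CATEGORY_APP = 0x00000100


def decode_attributes(attributes: int) -> list[str]:
    """Name the flag bits and the category set in a load option's Attributes."""
    names = []
    if attributes & LOAD_OPTION_ACTIVE:
        names.append("ACTIVE")
    if attributes & LOAD_OPTION_FORCE_RECONNECT:
        names.append("FORCE_RECONNECT")
    if attributes & LOAD_OPTION_HIDDEN:
        names.append("HIDDEN")
    category = attributes & LOAD_OPTION_CATEGORY_MASK
    if category == LOAD_OPTION_CATEGORY_BOOT:
        names.append("CATEGORY_BOOT")
    elif category == LOAD_OPTION_CATEGORY_APP:
        names.append("CATEGORY_APP")
    else:
        names.append(f"CATEGORY_{category:#06x}")
    return names


print(265 == 0x0109, decode_attributes(265), decode_attributes(1))
```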

If I now attach a blank disk at 0.16.0 I get the following:

  Assigned description  via handler 4
  Assigned description  via handler 4
  Assigned description PXEv4 (MAC:020820138D79) via handler 2
  EfiBootManagerFindLoadOption: UEFI
  EfiBootManagerFindLoadOption: candidate 0 (UEFI )
  EBMFLO: FilePath mismatch
  EfiBootManagerFindLoadOption: candidate 1 (UEFI  2)
  EBMFLO: Key->Description UEFI  != UEFI  2
  EfiBootManagerFindLoadOption: candidate 2 (UEFI PXEv4 (MAC:020820138D79))
  EBMFLO: Key->Description UEFI  != UEFI PXEv4 (MAC:020820138D79)
  EmuVariablesUpdatedCallback
  FSOpen: Open 'NvVars' Success
  Saved NV Variables to NvVars file
  EmuVariablesUpdatedCallback
  FSOpen: Open 'NvVars' Success
  Saved NV Variables to NvVars file
  EfiBootManagerFindLoadOption: UEFI PXEv4 (MAC:020820138D79)
  EfiBootManagerFindLoadOption: candidate 0 (UEFI )
  EBMFLO: Key->Description UEFI PXEv4 (MAC:020820138D79) != UEFI
  EfiBootManagerFindLoadOption: candidate 1 (UEFI  2)
  EBMFLO: Key->Description UEFI PXEv4 (MAC:020820138D79) != UEFI  2
  EfiBootManagerFindLoadOption: candidate 2 (UEFI PXEv4 (MAC:020820138D79))
  EBMFLO: Matched!
  EfiBootManagerFindLoadOption: UEFI
  EfiBootManagerFindLoadOption: candidate 0 (UiApp)
  EBMFLO: Key->Attributes 1 != 265
  EfiBootManagerFindLoadOption: candidate 1 (UEFI )
  EBMFLO: FilePath mismatch
  EfiBootManagerFindLoadOption: candidate 2 (UEFI PXEv4 (MAC:020820138D79))
  EBMFLO: Key->Description UEFI  != UEFI PXEv4 (MAC:020820138D79)
  EfiBootManagerFindLoadOption: candidate 3 (EFI Internal Shell)
  EBMFLO: Key->Description UEFI  != EFI Internal Shell
  EmuVariablesUpdatedCallback
  FSOpen: Open 'NvVars' Success
  Saved NV Variables to NvVars file
  EmuVariablesUpdatedCallback
  FSOpen: Open 'NvVars' Success
  Saved NV Variables to NvVars file
  EfiBootManagerFindLoadOption: UEFI  2
  EfiBootManagerFindLoadOption: candidate 0 (UiApp)
  EBMFLO: Key->Attributes 1 != 265                                                                                                                                                                                                                                                                                           
  EfiBootManagerFindLoadOption: candidate 1 (UEFI )                                                                                                                                                                                                                                                                          
  EBMFLO: Key->Description UEFI  2 != UEFI                                                                                                                                                                                                                                                                                   
  EfiBootManagerFindLoadOption: candidate 2 (UEFI PXEv4 (MAC:020820138D79))                                                                                                                                                                                                                                                  
  EBMFLO: Key->Description UEFI  2 != UEFI PXEv4 (MAC:020820138D79)                                                                                                                                                                                                                                                          
  EfiBootManagerFindLoadOption: candidate 3 (EFI Internal Shell)                                                                                                                                                                                                                                                             
  EBMFLO: Key->Description UEFI  2 != EFI Internal Shell                                                                                                                                                                                                                                                                     
  EmuVariablesUpdatedCallback                                                                                                                                                                                                                                                                                                
  FSOpen: Open 'NvVars' Success                                                                                                                                                                                                                                                                                              
  Saved NV Variables to NvVars file                                                                                                                                                                                                                                                                                          
  EmuVariablesUpdatedCallback                                                                                                                                                                                                                                                                                                
  FSOpen: Open 'NvVars' Success                                                                                                                                                                                                                                                                                              
  Saved NV Variables to NvVars file                                                                                                                                                                                                                                                                                          
  EfiBootManagerFindLoadOption: UEFI PXEv4 (MAC:020820138D79)                                                                                                                                                                                                                                                                
  EfiBootManagerFindLoadOption: candidate 0 (UiApp)                                                                                                                                                                                                                                                                          
  EBMFLO: Key->Attributes 1 != 265                                                                                                                                                                                                                                                                                           
  EfiBootManagerFindLoadOption: candidate 1 (UEFI )                                                                                                                                                                                                                                                                          
  EBMFLO: Key->Description UEFI PXEv4 (MAC:020820138D79) != UEFI                                                                                                                                                                                                                                                             
  EfiBootManagerFindLoadOption: candidate 2 (UEFI PXEv4 (MAC:020820138D79)) 
EBMFLO: Matched!                                                                                                                                                                                                                                                                                                           
  EfiBootManagerFindLoadOption: EFI Internal Shell                                                                                                                                                                                                                                                                           
  EfiBootManagerFindLoadOption: candidate 0 (UiApp)                                                                                                                                                                                                                                                                          
  EBMFLO: Key->Attributes 1 != 265                                                                                                                                                                                                                                                                                           
  EfiBootManagerFindLoadOption: candidate 1 (UEFI PXEv4 (MAC:020820138D79))                                                                                                                                                                                                                                                  
  EBMFLO: Key->Description EFI Internal Shell != UEFI PXEv4 (MAC:020820138D79)                                                                                                                                                                                                                                               
  EfiBootManagerFindLoadOption: candidate 2 (EFI Internal Shell)                                                                                                                                                                                                                                                             
  EBMFLO: Matched!                                                                                                                                                                                                                                                                                                           
  Select Item: 0x19                                                                                                                                                                                                                                                                                                          
  [Bds]OsIndication: 0000000000000000                                                                                                                                                                                                                                                                                        
  [Bds]=============Begin Load Options Dumping ...=============                                                                                                                                                                                                                                                              
    Driver Options:                                                                                                                                                                                                                                                                                                          
    SysPrep Options:                                                                                                                                                                                                                                                                                                         
    Boot Options:                                                                                                                                                                                                                                                                                                            
      Boot0000: UiApp          0x0109                                                                                                                                                                                                                                                                                        
      Boot0002: UEFI PXEv4 (MAC:020820138D79)          0x0001                                                                                                                                                                                                                                                                
      Boot0003: EFI Internal Shell         0x0001                                                                                                                                                                                                                                                                            
      Boot0001: UEFI           0x0001                                                                                                                                                                                                                                                                                        
      Boot0004: UEFI  2        0x0001                                                                                                                                                                                                                                                                                        
    PlatformRecovery Options:                                                                                                                                                                                                                                                                                                
      PlatformRecovery0000: Default PlatformRecovery       0x0001                                                                                                                                                                                                                                                            
  [Bds]=============End Load Options Dumping=============

Notice that the NVMe handler produces the same blank description for both disks (so the disambiguating "2" probably came from elsewhere). Then the "UEFI" entry is discarded because of a FilePath mismatch, and the "UEFI 2" entry doesn't match any of the existing descriptions in the nonvolatile variables, so it is also discarded.

I think this is consistent with the following sequence of events:

  1. System boots for the first time with just the boot disk at 0.17.0; there are no boot options to load from the nonvolatile variables since this is the first boot
  2. EDK2 enumerates a boot entry for 0.17.0 with description "UEFI" and a file path pointing to the boot application on that disk
  3. This entry gets added to the nonvolatile variables on the disk at 0.17.0
  4. System shuts down; configuration changes to include a blank disk at 0.16.0
  5. EDK2 loads the BootOrder nonvolatile variables from 0.17.0
  6. EDK2 enumerates the disk at 0.16.0, assigns it description "UEFI", and decides it has no boot application (which makes sense, since the disk is blank and has no ESP)
  7. EDK2 enumerates the disk at 0.17.0, assigns it description "UEFI 2", and finds its boot application
  8. EDK2 tries to match the "UEFI" entry in NV storage to one of the enumerated entries; this fails because the enumerated entry from step 6 doesn't have a file path
  9. EDK2 prunes the "UEFI" entry from the boot order
  10. EDK2 adds back the newly-enumerated entries for 0.16.0 and 0.17.0, but now they're at the end of the boot order
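The sequence above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not EDK2's code: `LoadOption`, `matches`, and `refresh_boot_order` are hypothetical stand-ins, matching is reduced to description plus file path (the two checks that fail in the log), and the file paths are made-up strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LoadOption:
    # Hypothetical stand-in for an EFI load option (not an EDK2 type).
    description: str
    file_path: Optional[str]   # opaque path; None models "no boot application found"
    auto_created: bool = True  # enumerated by the firmware rather than built in


def matches(a: LoadOption, b: LoadOption) -> bool:
    # Simplified EBMFLO check: description and file path must both agree.
    return a.description == b.description and a.file_path == b.file_path


def refresh_boot_order(nv_order: List[LoadOption],
                       enumerated: List[LoadOption]) -> List[LoadOption]:
    # Steps 8 and 9: prune auto-created NV entries that match no enumerated option.
    kept = [opt for opt in nv_order
            if not opt.auto_created or any(matches(opt, e) for e in enumerated)]
    # Step 10: options not already present are appended at the END of the order.
    for e in enumerated:
        if not any(matches(e, k) for k in kept):
            kept.append(e)
    return kept


# First boot: only the boot disk (PCI 0.17.0) exists, so it is named "UEFI ".
boot = LoadOption("UEFI ", "disk@0.17.0/boot-app")
pxe = LoadOption("UEFI PXEv4 (MAC:020820138D79)", "net@nic")
shell = LoadOption("EFI Internal Shell", "fv/shell", auto_created=False)
nv_order = [boot, pxe, shell]

# Next boot: a blank disk now sits at 0.16.0 and is enumerated first.
enumerated = [LoadOption("UEFI ", None),                      # blank disk, no ESP
              LoadOption("UEFI  2", "disk@0.17.0/boot-app"),  # boot disk, renamed
              pxe]

print([o.description for o in refresh_boot_order(nv_order, enumerated)])
# ['UEFI PXEv4 (MAC:020820138D79)', 'EFI Internal Shell', 'UEFI ', 'UEFI  2']
```

The resulting order mirrors the Boot Options dump in the log: PXE and the shell now come before either disk.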

The main thing I think I'm missing at this point is tracing that conclusively demonstrates that the NVMe boot options are getting added/described in PCI slot order--I think the logs above show a lot of smoke, but I'd really like to see the fire.

@gjcolombo
Contributor

gjcolombo commented Feb 21, 2024

> The main thing I think I'm missing at this point is tracing that conclusively demonstrates that the NVMe boot options are getting added/described in PCI slot order--I think the logs above show a lot of smoke, but I'd really like to see the fire.

The disambiguating integers get added in BmMakeBootOptionDescriptionUnique, which visits the boot options in the order they were enumerated and adds disambiguating numbers to any options whose descriptions were already used elsewhere.
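A minimal Python sketch of that pass, written from the description above rather than copied from EDK2 (the function name is hypothetical). It assumes the suffix is a space followed by the running count, which is consistent with the double-spaced "UEFI  2" in the log: the first option with a given description keeps it, and each later duplicate, in enumeration order, gets " 2", " 3", and so on.

```python
from typing import List


def make_descriptions_unique(descriptions: List[str]) -> List[str]:
    # Hypothetical model of BmMakeBootOptionDescriptionUnique's renaming.
    result = list(descriptions)
    visited = [False] * len(descriptions)
    for base in range(len(descriptions)):
        if visited[base]:
            continue
        visited[base] = True
        match_count = 1
        # Later options with the same description get " <count>" appended.
        for index in range(base + 1, len(descriptions)):
            if not visited[index] and descriptions[index] == descriptions[base]:
                visited[index] = True
                match_count += 1
                result[index] = f"{descriptions[base]} {match_count}"
    return result


# Both NVMe disks get the base description "UEFI " (with a trailing space).
print(make_descriptions_unique(["UEFI ", "UEFI ", "EFI Internal Shell"]))
# ['UEFI ', 'UEFI  2', 'EFI Internal Shell']
```

Because the renaming depends only on position, whichever disk is enumerated first keeps the bare name.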

This fits with the behavior described above provided BmEnumerateBootOptions visits NVMe devices in PCI slot order. That enumeration is handled by the EFI_LOCATE_HANDLE_BUFFER function in the EFI boot services table; assuming I have the right implementation of that function, it will search the global handle list in the order that protocols were registered by calls to the boot services' EFI_INSTALL_PROTOCOL_INTERFACE function. I'll need to do some more reading to figure out when these registrations happen for ESPs on NVMe devices. I'm guessing they're visited in slot order (by the boot device selection code, specifically VisitAllPciInstances and its callees) but haven't walked through the whole callee tree to be sure.

@askfongjojo askfongjojo added the known issue To include in customer documentation and training label Mar 9, 2024
@askfongjojo askfongjojo added this to the 8 milestone Mar 9, 2024
@morlandi7 morlandi7 modified the milestones: 8, 9 Apr 25, 2024
@morlandi7 morlandi7 modified the milestones: 9, 10 Jun 27, 2024
@askfongjojo askfongjojo modified the milestones: 10, 11 Aug 22, 2024
@gjcolombo gjcolombo assigned iximeow and unassigned pfmooney Aug 22, 2024
@gjcolombo
Contributor

Reassigning per the discussion at the 22 Aug 2024 hypervisor huddle.

@iximeow
Member

iximeow commented Oct 8, 2024

with #6585 and oxidecomputer/console#2464 landed, there is at least a way to work around the problem that can occur here, and instance pages in the UI will strongly steer users toward having a boot disk.

one important outstanding question here is: why are boot options so unstable in the first place?

devices as i'm seeing them today in UEFI boot options end up named something like UEFI Misc Device or UEFI Misc Device 2 and so on. the "Misc Device" part of this name comes from BmGetMiscDescription, which is the backstop handler that supplies a device description after the more specific handlers have been tried. the number suffixes track with Greg's mention of BmMakeBootOptionDescriptionUnique, which appends incrementing integers to boot options to give them distinct names. the UEFI prefix (note that its trailing space is included) comes from post-processing of the device description in BmGetBootDescription.

reiterating what we know: when disk names are influenced by BmMakeBootOptionDescriptionUnique and a disk with a low PCI number is detached, every disk after it gets a new description on the next boot. that description won't match the previous one, so existing boot options for those devices become "invalid".
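To make the renaming concrete, here is a small hypothetical model. It assumes every NVMe disk gets the same base description ("UEFI ") and that disks are enumerated in PCI slot order, which is the working hypothesis from earlier in the thread rather than something verified here; `describe_by_slot` is an illustrative name, not an EDK2 function.

```python
from typing import Dict, Iterable


def describe_by_slot(slots: Iterable[int], base: str = "UEFI ") -> Dict[int, str]:
    # Hypothetical: visit disks in slot order; the n-th duplicate gets " n" appended.
    names: Dict[int, str] = {}
    for count, slot in enumerate(sorted(slots), start=1):
        names[slot] = base if count == 1 else f"{base} {count}"
    return names


before = describe_by_slot([0, 1, 2])
after = describe_by_slot([1, 2])  # the disk in slot 0 was detached
renamed = [slot for slot in after if after[slot] != before[slot]]
print(before)   # {0: 'UEFI ', 1: 'UEFI  2', 2: 'UEFI  3'}
print(after)    # {1: 'UEFI ', 2: 'UEFI  2'}
print(renamed)  # [1, 2]
```

Every disk behind the detached one changes name, so any saved boot option keyed on the old description stops matching.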

there is no way to leave BmGetMiscDescription with a name of "", so BmGetNvmeDescription (or another handler) must have been returning a description of "", which then had "UEFI " prepended to produce the names as reported. between then and now, it looks like BmGetNvmeDescription started returning NULL, which falls through to BmGetMiscDescription and yields the description "Misc Device".
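The fallback chain described above can be sketched as a hypothetical Python model, not EDK2's code: handlers are tried in order, the first one that returns something other than None wins (even an empty string), the misc handler always returns a name, and "UEFI " is then prepended. The handler functions and the `device` dict are made up for illustration.

```python
from typing import Callable, List, Optional

Handler = Callable[[dict], Optional[str]]


def nvme_handler(device: dict) -> Optional[str]:
    # Stand-in for BmGetNvmeDescription: None means "not handled".
    return device.get("nvme_description")


def misc_handler(device: dict) -> Optional[str]:
    # Stand-in for BmGetMiscDescription: the backstop always yields a name.
    return "Misc Device"


def boot_description(device: dict, handlers: List[Handler]) -> str:
    for handler in handlers:
        description = handler(device)
        if description is not None:  # "" still counts as an answer
            return "UEFI " + description
    raise ValueError("no handler produced a description")


chain = [nvme_handler, misc_handler]
print(repr(boot_description({"nvme_description": ""}, chain)))  # 'UEFI '
print(repr(boot_description({}, chain)))                        # 'UEFI Misc Device'
```

An empty string from the NVMe handler short-circuits the chain, which is how a bare "UEFI " can appear; only a None falls through to "Misc Device".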

so i think in an ideal world, we'd be seeing NVMe devices named according to their reported model and serial numbers. those names are much more stable than "wherever the device happens to be in PCI order"; with them, adding or removing disks would no longer rename later devices or invalidate their boot options. BmGetNvmeDescription is supposed to be doing this.


we're definitely providing a disk serial number currently, and as i can see in propolis-server, a null model number is expected. from an Ubuntu instance today:

ubuntu@image-builder:~$ nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            image-builder-14e362 (null)                                   0x1         21.47  GB /  21.47  GB    512   B +  0 B   (null)  

i can't see an obvious reason why we might have stopped getting descriptions from BmGetNvmeDescription here, nor why the device's serial number didn't appear in descriptions back when it was providing them. AFAICT in this VM i should see a boot option named something like UEFI image-builder-14e362.

@iximeow
Member

iximeow commented Oct 10, 2024

i saw "UEFI Misc Device" rather than "UEFI " because i was comparing against a VM i'd run locally with propolis-standalone. "Misc Device" is how EDK2 categorizes a virtio block device. if i'd checked efibootmgr in the above-mentioned Ubuntu instance, i would have seen

ubuntu@image-builder:~$ efibootmgr
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,0001,0000,0003
Boot0000* UiApp	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI 	PciRoot(0x0)/Pci(0x10,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00){auto_created_boot_option}
Boot0002* Ubuntu	HD(15,GPT,cd261615-18d7-4d4d-bec6-bd2bfecdf6d7,0x2800,0x35000)/File(\EFI\ubuntu\shimx64.efi)
Boot0003* EFI Internal Shell	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(7c04a583-9e3e-4f1c-ad65-e05268d0b4d1

where Boot0001 is "UEFI " (with a trailing space), same as before.

the name "UEFI ", in turn, comes from a mix of EDK2 misbehavior and our providing nulls for the Mn field. since we provide an array like [0; 40] for the model number field, the loop that copies the model number inserts 40 nulls into the description string. then the serial number, which we do provide, is inserted into the description. for real instances, the serial number is derived from the disk name and looks like the one in the nvme list output above.

at some point after this description is constructed, the string appears to be StrLen'd, and the nulls from the model number cut it short. EDK2 probably should insert spaces rather than nulls when the IDENTIFY strings contain nulls. BmEliminateExtraSpaces already collapses runs of spaces anywhere in the string, so space padding would be tidied up: a string like "UEFI model      diskname" ends up as "UEFI model diskname". pretty reasonable!
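A small Python model of this mechanism, under stated assumptions: the description is the 40-byte model number, a space, then the 20-byte serial number (anything the real handler may append after that is omitted), C string semantics are modeled by cutting at the first NUL, and `eliminate_extra_spaces` re-implements the behavior described for BmEliminateExtraSpaces rather than copying EDK2. `nul_to_space=True` models the suggested fix of inserting spaces instead of nulls.

```python
def c_string(s: str) -> str:
    # Model a NUL-terminated string: nothing after the first NUL is visible.
    return s.split("\0", 1)[0]


def eliminate_extra_spaces(s: str) -> str:
    # Drop leading spaces, collapse runs of spaces to one, drop a trailing space.
    out = []
    for ch in c_string(s):
        if ch != " " or (out and out[-1] != " "):
            out.append(ch)
    if out and out[-1] == " ":
        out.pop()
    return "".join(out)


def nvme_description(mn: bytes, sn: bytes, nul_to_space: bool = False) -> str:
    # Model number (40 bytes) + " " + serial number (20 bytes), copied byte by byte.
    raw = mn.decode("latin-1") + " " + sn.decode("latin-1")
    if nul_to_space:
        raw = raw.replace("\0", " ")  # the suggested fix
    return "UEFI " + eliminate_extra_spaces(raw)


serial = b"image-builder-14e362"  # exactly 20 bytes, so no NUL padding
print(repr(nvme_description(bytes(40), serial)))                     # 'UEFI '
print(repr(nvme_description(bytes(40), serial, nul_to_space=True)))  # 'UEFI image-builder-14e362'
```

The same model reproduces the propolis-standalone experiments below: a fully populated model number shows the serial, a short null-padded one truncates to "UEFI qqqq", and converting nulls to spaces yields "UEFI qqqqqqqq boot".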

with a patched propolis-standalone you can see some of this with the normal OVMF_BUILD.fd. given a block device like

[dev.block0]
driver = "pci-nvme"
block_dev = "boot"
pci-path = "0.4.0"

changing IdentifyController to have a fully-populated mn, like mn: [b'q'; 40], changes the description of a pci-nvme block device from "UEFI " to "UEFI qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq boot". the serial number has been there all along!

with a default OVMF_BUILD.fd, an mn with less data, padded out with nulls, demonstrates the truncation we've inadvertently been getting: names like "UEFI qqqq" abound. finally, having EDK2 insert spaces rather than faithfully copying the NULLs gets us boot option descriptions like "UEFI qqqqqqqq boot". yay!

so, the exciting discovery here is probably that with current OVMF builds, providing model numbers will render some instances unbootable. if we provide a model number, boot option descriptions will change, and in some cases that will probably push a real boot disk behind the EFI shell in the boot order, just as in the original observation.

@iximeow
Member

iximeow commented Oct 10, 2024

now that we can specify boot disks, it's possible to unwedge an instance that gets into this state: specify a boot disk, boot the instance to that disk, and its UEFI variables will keep that disk as the boot option even if the boot disk is unset again later. or just leave the intended guest OS disk as the boot disk in perpetuity!

the above issues and test help make sure we don't unwittingly afflict VMs with this issue when they don't have a boot disk set. if we can get to those issues, they'll also make it much harder to reach this wedged state. i think this is about as good a place as i can get this to right now.

@iximeow iximeow removed their assignment Oct 10, 2024
@iximeow iximeow modified the milestones: 11, Unscheduled Oct 10, 2024