CL sometimes does not reboot into the correct partition #2334

ajeddeloh · 2018-01-29T19:34:19Z

Issue Report

Bug

Container Linux Version

Current master, most likely all versions

Environment

AWS, possibly others

Expected Behavior

When set to reboot into USR-B, it does.

Actual Behavior

Occasionally it reboots into USR-A instead

Reproduction Steps

Run the coreos.update.reboot kola test on AWS until it fails (this may take a while, so running it in a while loop is recommended)
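A minimal sketch of such a loop, assuming the test is launched as an external command that exits non-zero on failure. The helper name run_until_failure and the max_runs cap are hypothetical, and the command line itself is deliberately left to the caller, since the exact kola invocation depends on the local setup:

```python
import subprocess


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly and report when it first fails.

    cmd is an argument list, for example whatever command line is used
    locally to run the coreos.update.reboot test (not specified here).
    Returns the 1-based number of the first failing run, or None if every
    one of the max_runs attempts exited with status 0.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt
    return None
```

Capping the number of runs keeps an intermittent failure from turning into an unbounded loop if the bug never reproduces.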

Other Information

I modified the test locally to run cgpt show /dev/xvda before and after setting it to boot from USR-B. This confirmed that the prio/tries/success bits are set correctly, and that the output does not differ between successful and unsuccessful runs.
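One way to make that comparison mechanical is to diff the captured text from two runs. This is only a sketch: diff_captures is a hypothetical helper, and it assumes the cgpt show output from each run has already been saved as a string.

```python
import difflib


def diff_captures(before, after, before_name="before", after_name="after"):
    """Return unified-diff lines between two captured command outputs.

    An empty list means the two captures are identical line for line.
    """
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=before_name,
            tofile=after_name,
            lineterm="",
        )
    )
```

An empty result for a passing run and a failing run would match the observation above that the partition attributes look the same in both cases.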

euank · 2018-01-29T19:53:58Z

To replicate some discussion:

update_engine, around 45 seconds after starting, will mark the current partition as good.

The reboot test in question, on boot, does some disk I/O, marks usr-b, and reboots.

It's possible that if the disk I/O and setup take about 45 seconds, update_engine could overwrite our 'usr-b' mark with its "setgood usr-a" marking, resulting in this sort of failure.
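The suspected ordering can be illustrated with a toy model. Everything here is an assumption made for illustration: the Partition fields, the starting values, the rule that marking a partition good also raises its priority, and the simplified "highest-priority bootable partition wins" selection. It is not the actual update_engine or GRUB logic.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    priority: int
    tries: int
    successful: int


def simulate(mark_b_at, reboot_at, setgood_at=45.0):
    """Return the partition a toy bootloader picks, given event times in seconds."""
    parts = {
        "USR-A": Partition(priority=2, tries=0, successful=1),  # currently booted
        "USR-B": Partition(priority=1, tries=0, successful=0),
    }

    def top_priority():
        return max(p.priority for p in parts.values())

    def mark_b():
        # The test asks for USR-B on the next boot.
        parts["USR-B"].priority = top_priority() + 1
        parts["USR-B"].tries = 1

    def setgood_a():
        # Assumption: marking the running partition good also re-prioritizes it.
        parts["USR-A"].successful = 1
        parts["USR-A"].tries = 0
        parts["USR-A"].priority = top_priority() + 1

    # Apply only the writes that land before the reboot, in time order.
    events = sorted([(mark_b_at, mark_b), (setgood_at, setgood_a)], key=lambda e: e[0])
    for when, action in events:
        if when < reboot_at:
            action()

    bootable = {n: p for n, p in parts.items() if p.successful or p.tries > 0}
    return max(bootable, key=lambda n: bootable[n].priority)
```

In this model the failure only appears when the 45-second mark falls between the test's usr-b mark and the reboot, for example simulate(40, 50). Marking and rebooting earlier, or marking after the 45-second point, both select USR-B.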

That seems like a plausible enough possibility which we'll investigate now. If it's that, then the bug was in the test, not grub or update_engine, so here's to hoping 😌

ajeddeloh · 2018-01-29T22:04:50Z

Looks like that is in fact the case. Masking update-engine seems to fix the problem. Closing since this is a mantle/test bug, not an OS bug. See mantle fix here: coreos/mantle#801

ajeddeloh added kind/bug component/distro team/os platform/aws labels Jan 29, 2018

ajeddeloh closed this as completed Jan 29, 2018
