This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

CL sometimes does not reboot into the correct partition #2334

Closed
ajeddeloh opened this issue Jan 29, 2018 · 2 comments

Comments

@ajeddeloh

Issue Report

Bug

Container Linux Version

Current master; most likely all versions

Environment

AWS, possibly others

Expected Behavior

When set to reboot into USR-B, it does.

Actual Behavior

Occasionally it reboots into USR-A instead.

Reproduction Steps

Run the coreos.update.reboot kola test on AWS until it fails. It may take a while to fail, so I recommend running it in a while loop.
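A minimal Python sketch of that loop, assuming only that the test is launched by some command that exits non-zero on failure. The helper name `run_until_failure` and the `max_runs` cap are hypothetical, added for illustration.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_until_failure(cmd: Sequence[str], max_runs: int = 1000) -> Optional[int]:
    """Re-run `cmd` until it exits non-zero.

    Returns the 1-based attempt number that failed, or None if every one of
    the `max_runs` attempts succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            print(f"failed on attempt {attempt} (exit {result.returncode})",
                  file=sys.stderr)
            return attempt
    return None
```

Usage would look something like `run_until_failure(["kola", "run", "--platform=aws", "coreos.update.reboot"])`. Treat the exact kola flags as an assumption and check them against your mantle checkout and AWS credentials setup. The cap keeps a healthy build from looping forever.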

Other Information

I modified the test locally to run cgpt show /dev/xvda before and after setting USR-B as the boot target. It does set the priority/tries/successful bits correctly, and the output does not differ between successful and unsuccessful runs.
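To compare those before/after snapshots mechanically, here is a hedged Python sketch. It assumes `cgpt show` prints a `Label: "USR-A"` line for each partition, followed later by a line like `Attr: priority=1 tries=0 successful=1`. That layout is an assumption, so verify it against real output from your image. `parse_gptprio` and `diff_attrs` are hypothetical helper names.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed cgpt show layout: 'Label: "NAME"' then, a few lines later,
# 'Attr: key=value key=value ...' for the same partition.
LABEL_RE = re.compile(r'Label:\s*"([^"]*)"')
ATTR_RE = re.compile(r"Attr:\s*(.*)")

Attrs = Dict[str, Dict[str, int]]


def parse_gptprio(cgpt_output: str) -> Attrs:
    """Map partition label -> numeric attributes parsed from cgpt show text."""
    attrs: Attrs = {}
    label: Optional[str] = None
    for line in cgpt_output.splitlines():
        m = LABEL_RE.search(line)
        if m:
            label = m.group(1)
            continue
        m = ATTR_RE.search(line)
        if m and label is not None:
            pairs = [tok.split("=", 1) for tok in m.group(1).split() if "=" in tok]
            attrs[label] = {k: int(v) for k, v in pairs if v.isdigit()}
            label = None  # each Attr line belongs to the most recent Label
    return attrs


def diff_attrs(before: Attrs, after: Attrs) -> Dict[str, Tuple]:
    """Return {label: (old, new)} for partitions whose attributes changed."""
    return {
        label: (before.get(label), after.get(label))
        for label in set(before) | set(after)
        if before.get(label) != after.get(label)
    }
```

Only the partitions whose bits changed come back from `diff_attrs`. An empty result on a failing run would point away from the marking step itself.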

@euank
Contributor

euank commented Jan 29, 2018

To replicate some discussion:

update_engine, around 45 seconds after starting, will mark the current partition as good.

On boot, the reboot test in question does some disk I/O, marks USR-B, and reboots.

If the disk I/O and setup take about 45 seconds, update_engine could overwrite our USR-B mark with its "setgood USR-A" marking, which would produce exactly this failure.

That seems plausible enough that we'll investigate it now. If that's the cause, the bug is in the test, not in GRUB or update_engine, so here's hoping 😌
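The suspected race can be illustrated with a toy Python model. It is a simplification, not the real GRUB or update_engine logic. It assumes that "setgood" marks the currently booted partition (USR-A here) as successful *and* raises it back to the highest priority, as the comment above describes. It also assumes the bootloader picks the highest-priority partition that has priority > 0 and is either successful or has tries left. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GptPrio:
    priority: int
    tries: int
    successful: int


def mark_next_boot(table: Dict[str, GptPrio], label: str) -> None:
    """Test step (simplified): make `label` the top-priority, one-try candidate."""
    top = max(p.priority for p in table.values())
    table[label] = GptPrio(priority=top + 1, tries=1, successful=0)


def set_good(table: Dict[str, GptPrio], label: str) -> None:
    """update_engine step (assumed behavior): mark `label` successful and
    raise it back to the top priority."""
    top = max(p.priority for p in table.values())
    table[label] = GptPrio(priority=top + 1, tries=0, successful=1)


def pick_boot(table: Dict[str, GptPrio]) -> str:
    """Bootloader (simplified): highest priority among bootable partitions."""
    bootable = {k: v for k, v in table.items()
                if v.priority > 0 and (v.successful or v.tries > 0)}
    return max(bootable, key=lambda k: bootable[k].priority)


def simulate(events: List[str]) -> str:
    """Apply events in order, starting from a system booted on USR-A."""
    table = {"USR-A": GptPrio(1, 0, 1), "USR-B": GptPrio(0, 0, 0)}
    for event in events:
        if event == "test-marks-usr-b":
            mark_next_boot(table, "USR-B")
        elif event == "update-engine-setgood-usr-a":
            set_good(table, "USR-A")
    return pick_boot(table)
```

Under these assumptions, `simulate(["update-engine-setgood-usr-a", "test-marks-usr-b"])` chooses USR-B. The reversed order, where update_engine's 45-second timer fires after the test's mark, chooses USR-A, which matches the failure. Dropping the update_engine event entirely corresponds to masking the service.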

@ajeddeloh
Author

Looks like that is in fact the case: masking update-engine seems to fix the problem. Closing, since this is a mantle/test bug, not an OS bug. See the mantle fix here: coreos/mantle#801
