zincati service fails to start if non-conforming ostree deployment exists #859
Comments
Thanks for the report and the lengthy reproducer. I do agree this is unexpected behavior; Zincati should be able to proceed based on the metadata on the booted deployment. Looking into the inner logic, I think you are actually hitting a bug in rpm-ostree which is not properly handling the combination of
Thus Zincati sees two deployments and tries to deserialize both of them. The "is-booted" filtering would happen after that, but the containerized deployment cannot be parsed as it is missing some of the required fields. Searching through the bug tracker, this has already been reported in the past: coreos/rpm-ostree#2829.
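For illustration only (this is not the actual Zincati code, and the field names are made up), here is a minimal serde sketch of why a single non-conforming deployment breaks parsing before any is-booted filtering can happen:

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Deployment {
    booted: bool,
    version: String, // required field: missing it makes this entry fail to parse
}

#[derive(Debug, Deserialize)]
struct Status {
    deployments: Vec<Deployment>,
}

fn main() {
    // The second entry stands in for the container-based deployment that lacks
    // some of the fields the agent requires.
    let json = r#"{"deployments": [
        {"booted": true,  "version": "37.20221003.1.0"},
        {"booted": false}
    ]}"#;

    // Deserializing the whole status fails, even though the booted deployment
    // on its own would have been fine.
    let parsed: Result<Status, _> = serde_json::from_str(json);
    assert!(parsed.is_err());
}
```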
Thanks for looking into it @lucab - I guess we should try to prioritize that rpm-ostree issue if we want to fix this problem.
Though ideally it seems like the metadata Zincati is looking for should be retained even in the container case.
Indeed, that is another thing that comes out of this that we should fix.
I do agree with all the statements above. Additionally, I think it would be good to improve Zincati's logic anyway, to make it slightly more resilient against this kind of unexpected local state. Right now we expect all of the following fields (zincati/src/rpm_ostree/cli_status.rs, lines 46 to 68 at b8cf2ed):
But we could push these requirements a bit further down in the flow.
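As a rough sketch of what pushing the requirements further down could look like (field names here are illustrative, not the real cli_status.rs definitions): deserialize every deployment leniently, pick the booted one, and only then insist on the fields the agent actually needs.

```rust
use serde::Deserialize;

// Lenient representation: nothing beyond `booted` is required at parse time.
#[derive(Debug, Deserialize)]
struct LenientDeployment {
    booted: bool,
    version: Option<String>,
}

// Validate only the booted deployment; a non-conforming rollback deployment
// no longer aborts the whole status parsing.
fn booted_version(deployments: Vec<LenientDeployment>) -> Result<String, String> {
    let booted = deployments
        .into_iter()
        .find(|d| d.booted)
        .ok_or_else(|| "no booted deployment found".to_string())?;
    booted
        .version
        .ok_or_else(|| "booted deployment is missing 'version'".to_string())
}
```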
I opened #876 for this. (Also, coreos/rpm-ostree#2829 is now fixed.)
Is there a mitigation/workaround for this problem while we wait for a release with fixes? |
I think my workaround at the time was to clean up the non-booted deployment. I can't remember if it was
Ah. In my case I intend to stay on an OCI image, so there are no deployments to remove. Zincati is just broken on these systems. |
Yes, and it won't work at all until coreos/fedora-coreos-tracker#1263 is implemented, so you can just disable
That's what I figured, thanks for the confirmation! |
The previous work resulted in an error, but let's just exit with status zero because otherwise we end up tripping up things like checks for failing units. Anyone who has rebased into a container has very clearly taken explicit control over the wheel and there's no point in us erroring out. Closes: coreos#859
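For illustration only, not the actual patch: a minimal sketch of the behavior that commit describes, with an assumed container_image_reference field. When the booted deployment reports a container image reference, log a notice and exit with status 0 so the unit is not marked as failed.

```rust
// If the booted deployment was produced by rebasing onto a container image,
// treat updates as user-managed: print a notice and exit with status 0 so the
// systemd unit is not reported as failed.
fn bail_if_container_booted(container_image_reference: Option<&str>) {
    if let Some(image) = container_image_reference {
        eprintln!(
            "Automatic updates disabled; booted into container image {}",
            image
        );
        std::process::exit(0);
    }
}
```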
This one has been fixed for a while, I believe all the way back by #876.
I appreciate that this issue is closed, but I'm experiencing an issue that looks similar, and was hoping to get confirmation on whether this functionality is expected to work for custom container images. In the previous comments, I believe @dustymabe says the functionality is broken while @cgwalters suggests it might be working. Config:
Error:
Mar 25 13:44:09 mb01 systemd[1]: Starting zincati.service - Zincati Update Agent...
Mar 25 13:44:09 mb01 zincati[11851]: [INFO zincati::cli::agent] starting update agent (zincati 0.0.27)
Mar 25 13:44:09 mb01 zincati[11851]: [ERROR zincati] error: failed to assemble configuration settings
Mar 25 13:44:09 mb01 zincati[11851]: [ERROR zincati] -> failed to validate agent identity configuration
Mar 25 13:44:09 mb01 zincati[11851]: [ERROR zincati] -> failed to build default identity
Mar 25 13:44:09 mb01 zincati[11851]: [ERROR zincati] -> failed to introspect booted OS image
Mar 25 13:44:09 mb01 zincati[11851]: [ERROR zincati] -> Automatic updates disabled; booted into container image ostree-unverified-regi>
Mar 25 13:44:09 mediabarn systemd[1]: Started zincati.service - Zincati Update Agent.
Mar 25 13:44:09 mediabarn systemd[1]: zincati.service: Deactivated successfully
Right now if you rebase to a custom container image then you own the updates, i.e. you need to push a newly built container to the registry/repo and either manually do the update ( In this case it would be best to just
Got it, thank you! A timer will work for my immediate requirement, but I have another use case for fleet-lock, so I'll keep an eye on coreos/fedora-coreos-tracker#1263.
Bug Report
I recently was experimenting with the quay.io/fedora/fedora-coreos:next-devel container on one of my systems. I rebased it using the following command and left it like that for a few releases (manually running rpm-ostree upgrade each cycle). I then decided to go back to having automatic updates working (zincati + OSTree repo), so I rebased back to what I had to begin with:

and ended up with:

However I noticed zincati fails to start now:

This is because there is apparently some data missing about the container deployment that causes zincati to barf and not continue best effort. I would assume that since the booted deployment is good, zincati should be able to continue.

After cleaning up the rollback deployment with sudo rpm-ostree rollback -r, zincati was able to start.

Environment
QEMU - FCOS at 37.20221003.1.0
Expected Behavior
Able to start zincati.

Actual Behavior
zincati fails to start up.

Reproduction Steps
1. sudo rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/fedora/fedora-coreos:next and reboot
2. sudo rpm-ostree rebase fedora:fedora/x86_64/coreos/next and reboot
3. zincati.service should fail to start now because the rollback deployment doesn't conform.