sysext: port AWS OEM to systemd sysext image #1083
Build action triggered: https://github.com/flatcar/scripts/actions/runs/6297834375
cat > "${rootfs}/usr/lib/systemd/system/setup-oem.service" <<-'EOF'
[Unit]
Description=Setup OEM
Should this run before amazon-ssm-agent.service?
Also, would symlinks work, too?
I'm curious: what would be the benefit of using a symlink here? If the user wants to edit /etc/amazon/ssm/amazon-ssm-agent.json, for example, they won't be able to, as /usr/... is read-only.
With cp, the user's edits also get lost. With a symlink, we could check whether a custom target is set and, if so, leave it untouched, then document how that works as a way to opt out of auto-updates for that file.
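The opt-out idea in this comment could look roughly like the sketch below. The function name `link_default_config` is hypothetical, and the policy (only manage the path when it is absent or still points at the shipped default) is an assumption about what the reviewer had in mind, not something the PR implements.

```shell
#!/bin/bash
# Hypothetical helper: (re)create a symlink to the shipped default unless the
# user has pointed the link elsewhere or replaced it with a regular file.
link_default_config() {
    local default_target="$1"
    local link_path="$2"

    if [ -L "${link_path}" ]; then
        # Existing symlink: keep managing it only if it still points at our default.
        if [ "$(readlink "${link_path}")" != "${default_target}" ]; then
            echo "custom target set for ${link_path}, leaving it alone"
            return 0
        fi
    elif [ -e "${link_path}" ]; then
        # Regular file: treated as user-owned, so it is not touched.
        echo "${link_path} is a regular file, leaving it alone"
        return 0
    fi

    mkdir -p "$(dirname "${link_path}")"
    ln -sfn "${default_target}" "${link_path}"
}
```

With this approach, a user opts out of updates for a file simply by re-pointing the link or replacing it with their own file.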
EOF

mkdir -p "${rootfs}/usr/lib/systemd/system/multi-user.target.d"
{ echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service coreos-metadata-sshkeys@.service setup-oem.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ami.conf"
Enabling coreos-metadata-sshkeys@.service is something we should do in the base image.
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/usr/bin/cp /usr/share/amazon/ssm/amazon-ssm-agent.json /etc/amazon/ssm/amazon-ssm-agent.json.template
If that file is only used by the service unit, we could also use BindPaths= to provide it under /etc. @krnowak that could also be an option for the waagent, right?
/etc/eks/bootstrap.sh
)

rm -rf "${to_delete[@]/#/${rootfs}}"
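For readers unfamiliar with the expansion on the `rm` line: `${array[@]/#/prefix}` is Bash pattern substitution applied to every array element, where `#` anchors an empty pattern at the start of the string, so each path gets the prefix prepended. A small sketch; the `rootfs` value and the first array entry are made up for illustration and are not taken from the PR's actual list:

```shell
#!/bin/bash
# Illustrative values only; the real list lives in the PR's script.
rootfs="/tmp/example-rootfs"
to_delete=(
    /etc/amazon/ssm/amazon-ssm-agent.json
    /etc/eks/bootstrap.sh
)

# "${to_delete[@]/#/${rootfs}}" replaces the empty match at the start of each
# element with ${rootfs}, i.e. it prefixes every path with the rootfs.
prefixed=( "${to_delete[@]/#/${rootfs}}" )
printf '%s\n' "${prefixed[@]}"
```

Quoting the expansion keeps each prefixed path as a single word, which matters when it is handed to `rm -rf`.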
I don't understand this line, what creates the files under /etc?
Sorry, I was confused about the manglefs script: I thought it was running on the host, so I wanted to clean up the old OEM files from there.
EDIT: Ok, it's done there: flatcar/update_engine#24 (comment)
The list of old OEM files should now go to the misc-files package: #1016
We should boot an instance and check the old contents of /oem/ (the list of files for /etc looks good and can be easily seen in the base.ign).
So, the above can be deleted here, or?
mkdir -p "${rootfs}/usr/lib/systemd/system/amazon-ssm-agent.service.d"
cat > "${rootfs}/usr/lib/systemd/system/amazon-ssm-agent.service.d/10-bindpaths.conf" <<-'EOF'
[Service]
BindPaths=/usr/share/amazon/ssm/:/etc/amazon/ssm/ /usr/share/amazon/eks/bootstrap.sh:/etc/eks/bootstrap.sh
Are users expected to be able to run the CLI themselves, and does it also read from /etc? (In that case we would have to have the symlinks from /etc anyway, right?)
Actually from my understanding, the bootstrap script is executed by user-data (https://kinvolk.io/blog/2021/02/deploying-an-eks-cluster-with-flatcar-workers/) - so it does not even need to be shared
My question was more about ssm-cli and whether this is used by users and needs access to the files in /etc.
It seems that ssm-cli consumes the amazon-ssm-agent directly, but I think it might be wiser to copy the files directly into /etc rather than bind-mount them.
EOF

mkdir -p "${rootfs}/usr/lib/systemd/system/multi-user.target.d"
{ echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service coreos-metadata-sshkeys@core.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf"
Can we do the starting of coreos-metadata-sshkeys@core.service in the base image? We also do the coreos-cloudinit start in the base image and could do it similarly (have a unit that has a condition for the OEM kernel cmdline argument and then uses Upholds= to start it).
Something like this
[Unit]
ConditionKernelCommandLine=|ignition.platform.id=packet
ConditionKernelCommandLine=|flatcar.oem.id=packet
ConditionKernelCommandLine=|coreos.oem.id=packet
ConditionKernelCommandLine=|ignition.platform.id=ec2
ConditionKernelCommandLine=|flatcar.oem.id=ec2
ConditionKernelCommandLine=|coreos.oem.id=ec2
ConditionKernelCommandLine=|ignition.platform.id=digitalocean
ConditionKernelCommandLine=|flatcar.oem.id=digitalocean
ConditionKernelCommandLine=|coreos.oem.id=digitalocean
ConditionKernelCommandLine=|ignition.platform.id=gce
ConditionKernelCommandLine=|flatcar.oem.id=gce
ConditionKernelCommandLine=|coreos.oem.id=gce
Upholds=coreos-metadata-sshkeys@core.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
[Install]
WantedBy=multi-user.target
-{ echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service coreos-metadata-sshkeys@core.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf"
+{ echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf"
Interesting - the Upholds units are executed unconditionally:
$ systemctl status sshkeys@core.service
○ sshkeys@core.service
Loaded: loaded (/usr/lib/systemd/system/sshkeys@.service; static)
Active: inactive (dead)
Condition: start condition failed at Fri 2023-09-08 12:14:19 UTC; 17min ago
Sep 08 12:14:19 localhost systemd[1]: sshkeys@core.service was skipped because no trigger condition checks were met.
$ systemctl status coreos-metadata-sshkeys@core.service
● coreos-metadata-sshkeys@core.service - Flatcar Metadata Agent (SSH Keys)
Loaded: loaded (/usr/lib/systemd/system/coreos-metadata-sshkeys@.service; disabled; preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Fri 2023-09-08 12:34:50 UTC; 7s ago
Process: 1799 ExecStart=/usr/bin/coreos-metadata ${COREOS_METADATA_OPT_PROVIDER} --ssh-keys=core (code=exited, status=1/FAILURE)
Main PID: 1799 (code=exited, status=1/FAILURE)
CPU: 11ms
-> whole qemu test suite is failing. I guess we can go back to ExecStart=systemctl start coreos-metadata-sshkeys@core.service
What are the contents of /usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf? Is this really the most recent state?
Ah, understood, so it seems the condition is only for the [Service] section? And Upholds= still gets used all the time. Then yes, ExecStart=systemctl start coreos-metadata-sshkeys@core.service instead of ExecStart=/bin/true sounds good!
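Based on the conclusion in this thread (the condition gates the unit's own `ExecStart=`, while `Upholds=` was observed to take effect regardless), the revised unit could be generated along these lines. This is a sketch only: the unit file name `oem-sshkeys-example.service` and the temporary `rootfs` are hypothetical, the condition list is shortened to the ec2 entries, the absolute `/usr/bin/systemctl` path is an assumption, and the actual fix in flatcar/init#105 may differ.

```shell
#!/bin/bash
# Hypothetical output location; a real build would use its own rootfs.
rootfs="$(mktemp -d)"
unit="${rootfs}/usr/lib/systemd/system/oem-sshkeys-example.service"

mkdir -p "$(dirname "${unit}")"
cat > "${unit}" <<'EOF'
[Unit]
ConditionKernelCommandLine=|ignition.platform.id=ec2
ConditionKernelCommandLine=|flatcar.oem.id=ec2
ConditionKernelCommandLine=|coreos.oem.id=ec2

[Service]
Type=oneshot
RemainAfterExit=yes
# Start the key agent from ExecStart= (instead of Upholds=) so that it is
# skipped together with this unit when none of the conditions match.
ExecStart=/usr/bin/systemctl start coreos-metadata-sshkeys@core.service

[Install]
WantedBy=multi-user.target
EOF

echo "wrote ${unit}"
```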
> What are the contents of /usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf? Is this really the most recent state?
It's for qemu, so the sysext image is not even present.
> so it seems the condition is only for the [Service] section?
That's my conclusion too.
Ok, so more follow-up for the init PR… Sorry for the misleading suggestion
No worries - I fixed this right after: flatcar/init#105 now the CI is 🟢
9016250 to 8841684
ExecStartPre=/usr/bin/ln --symbolic /usr/share/amazon/ssm/amazon-ssm-agent.json.template /etc/amazon/ssm/amazon-ssm-agent.json
ExecStartPre=/usr/bin/ln --symbolic /usr/share/amazon/ssm/seelog.xml.template /etc/amazon/ssm/seelog.xml
ExecStart=/usr/bin/ln --symbolic /usr/share/amazon/eks/bootstrap.sh /etc/eks/bootstrap.sh
When the link already exists this will fail, do you want it to be skipped then? This would be possible with ExecStartPre=-.
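The failure mode mentioned here is easy to reproduce outside systemd: a second `ln -s` onto an existing path exits non-zero, which is what would fail the unit unless the command is prefixed with `-`. A small sketch using throwaway paths (all file names are placeholders):

```shell
#!/bin/bash
workdir="$(mktemp -d)"
touch "${workdir}/seelog.xml.template"

# First run: the link does not exist yet, so this succeeds.
first_status=0
ln -s "${workdir}/seelog.xml.template" "${workdir}/seelog.xml" || first_status=$?

# Second run: the link is already there, so ln reports an error.
second_status=0
ln -s "${workdir}/seelog.xml.template" "${workdir}/seelog.xml" 2>/dev/null || second_status=$?

echo "first=${first_status} second=${second_status}"
```

With `ExecStartPre=-`, systemd would record the non-zero status of the second case but not treat it as a unit failure.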
I think we should talk about this: how we manage the update of /etc. While redoing this section, I was thinking about using cp --backup to a) update the /etc/ files and b) keep any previous configuration.
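For reference, GNU `cp --backup` keeps the previous destination file under a backup name before overwriting it. A minimal sketch with placeholder paths and contents, assuming GNU coreutils (`numbered` is one of several documented backup control values):

```shell
#!/bin/bash
workdir="$(mktemp -d)"
printf 'shipped defaults\n' > "${workdir}/amazon-ssm-agent.json.template"
printf 'user edits\n' > "${workdir}/amazon-ssm-agent.json"

# Overwrite the destination, but keep the old content as a numbered backup
# (amazon-ssm-agent.json.~1~ on the first run).
cp --backup=numbered "${workdir}/amazon-ssm-agent.json.template" "${workdir}/amazon-ssm-agent.json"

ls -A "${workdir}"
```

Each subsequent run adds another numbered backup, so a real unit would probably want to skip the copy when the content is unchanged.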
Since these files are likely touched by the user, I would exclude them from the migration step. This means they will stay as regular files and not be updated, unless we can identify them as untouched with a checksum and create the symlink. For new instances the symlink is the default, so we don't need any update logic; this is covered by the sysext content. The user could still overwrite the symlink or replace it with a file if we use ExecStartPre=-.
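One way the "identify them as untouched with a checksum" idea could be expressed is sketched below. Everything here is an assumption for illustration: the function name `migrate_if_untouched`, passing in a known SHA-256 of the previously shipped file, and the use of `sha256sum`. The PR does not implement this.

```shell
#!/bin/bash
# Hypothetical helper: replace a regular file under /etc with a symlink to
# the sysext-provided copy, but only if its content still matches the
# checksum of the file that was originally shipped.
migrate_if_untouched() {
    local target="$1"       # file shipped in the sysext image
    local etc_file="$2"     # existing path under /etc
    local shipped_sum="$3"  # SHA-256 of the old, unmodified file
    local current_sum

    # Nothing to migrate if it is already a symlink or is not a regular file.
    if [ -L "${etc_file}" ] || [ ! -f "${etc_file}" ]; then
        return 0
    fi

    current_sum="$(sha256sum "${etc_file}" | cut -d ' ' -f 1)"
    if [ "${current_sum}" = "${shipped_sum}" ]; then
        ln -sfn "${target}" "${etc_file}"
    else
        echo "${etc_file} was modified, keeping it as a regular file"
    fi
}
```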
8841684 to 957cd43
The oem release ID would only be used for the update payload name and the migration file. No translation to
For this one yes, no translation but we still need one for the kernel command line parameter (see: 9df7e19). Otherwise
- AWS OEM images now use a systemd-sysext image for layering additional platform-specific software on top of `/usr`. The OEM software is still not updated but this will be added soon.
- The AWS OEM ID kernel command line parameter changed to `ami`
-- AWS OEM images now use a systemd-sysext image for layering additional platform-specific software on top of `/usr`. The OEM software is still not updated but this will be added soon.
-- The AWS OEM ID kernel command line parameter changed to `ami`
+- AWS OEM images now use a systemd-sysext image for layering additional platform-specific software on top of `/usr`
@@ -0,0 +1,2 @@
[Unit]
Upholds=amazon-ssm-agent.service
-Upholds=amazon-ssm-agent.service
+Upholds=amazon-ssm-agent.service setup-oem.service
src_install() {
systemd_dounit "${FILESDIR}/setup-oem.service"
systemd_install_serviced "${FILESDIR}/10-oem-ami.conf" multi-user.target
Is this line correct? I don't see the service running
Even after starting manually I think it has some problems:
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 INFO [ssm-agent-worker] Checking if agent identity type CustomIdentity can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 ERROR [ssm-agent-worker] Agent failed to assume any identity
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 ERROR [ssm-agent-worker] failed to find identity, retrying: failed to find agent identity
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 INFO [ssm-agent-worker] Checking if agent identity type OnPrem can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 INFO [ssm-agent-worker] Checking if agent identity type EC2 can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 INFO [ssm-agent-worker] Checking if agent identity type CustomIdentity can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
If the termination also happens on the latest Alpha it can be ignored
> Is this line correct? I don't see the service running
The file is actually installed in /etc/systemd/system/multi-user.target.d/ so it's not packaged.
ec04512 to 57718ba
57718ba to d703e00
Some nitpicks.
Just remembered about revision bumping in overlay.
- drop the OEM mention
- install things under /usr/share/amazon/ssm
- add systemd unit from the upstream
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
While this ebuild will be dropped in the near future, we still need to maintain the openstack ebuild. `flatcar-eks` was a runtime dependency of openstack/brightbox too. I think it was a mistake?
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Found by booting stable on AWS: `find /usr/share/oem` + checking the content of files created by base Ignition.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
d703e00 to 678c8fc
For this vendor, the OEM ID from the oem-release file is different from the oem.id kernel commandline parameter.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
678c8fc to bfaea38
Cool, looks good from my side.
Thanks!
In this PR, we port the current AWS OEM to a systemd system extension (sysext) image. It allows us to not rely on the base-ec2.ign configuration file and to remove specific OEM bits from the two related ebuilds: flatcar-eks and amazon-ssm-agent.

Testing done

changelog/ directory (user-facing change, bug fix, security fix, update)
/boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.

related to: flatcar/Flatcar#1145