This repository has been archived by the owner on Jun 29, 2022. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
baremetal: integrate automated (re-)provisioning logic
The bare-metal platform currently has no way of knowing whether an instance is actually running the configuration that lokoctl put into matchbox: when the configuration is updated, the user is not notified that the instance has to be PXE booted again. It is also unclear to the user when to boot from PXE, because the PXE boot must happen after lokoctl has populated matchbox with the (new) configuration but before any other steps time out.

The goal of this patch is to bring the bare-metal platform to an actually usable level and to support automated provisioning and reprovisioning in a configurable way, regardless of whether IPMI is used or VMs are created. In addition, we don't want to require a complicated PXE boot for each configuration update, because it is slow, fragile, and needs a special DHCP infrastructure which may be lacking in a production environment.

Add user-defined commands in the "pxe_commands" variable to perform automated PXE provisioning at the right time, i.e. initially at the first run or when recreating a node. To address the problem that PXE booting is a long and maybe even manual process, or even impossible in a production environment, we can rely on Ignition to simulate reprovisioning: creating the "first_boot" flag file via SSH and issuing a reboot makes Ignition fetch the configuration from matchbox, and if we make sure to clean the root filesystem by formatting it, the result is the same as if the node had been reprovisioned with a PXE boot.

The logic is implemented by a "null resource" in Terraform that executes a helper script which either does a PXE boot or uses SSH to trigger a reprovisioning with Ignition. It also handles the case of ignoring userdata changes for controller nodes to prevent losing etcd state.
Since there is no notion of a bare-metal node on the Terraform level (reminder: this whole exercise is only needed because we don't have a Terraform provider doing this for us), a local flag file is created under the asset directory on the machine which runs lokoctl. If it exists, the node was provisioned with PXE and SSH will be used for reprovisioning. If it does not exist, the node will be provisioned with PXE, either during initial setup or for the next reprovisioning because the user forced recreating the node by deleting the flag file. Another flag file on the node is used to check whether a node was successfully reprovisioned. When SSH is used to reprovision, the kernel parameters for GRUB are updated directly because they are not part of the Ignition configuration.

The "copy-controller-secrets" step is run after recreating a controller node. Again, since there is no notion of a node object, this is solved by depending on the variables themselves which define the node state.

Also add user-defined commands in the "install_pre_reboot_cmds" variable to run after the PXE OS installation and before booting into the final OS. They are needed to set up persistent booting from disk after PXE booting was configured in "pxe_commands".

The whole patch is used by Racker (https://github.com/kinvolk/racker), and can be tested either with the "bootstrap/prepare.sh" script to create VMs with lokoctl or by running Racker in the QEMU IPMI simulator environment through the "racker-sim/ipmi-env.sh" script and a Racker Docker image built with "installer/conf.yaml" pointing to this Lokomotive branch.
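The flag-file rules described in the commit message can be condensed into one decision. The following is a minimal sketch for illustration only: `choose_action` is a hypothetical function that does not exist in the patch, and the three result words ("pxe", "ssh", "keep") are labels chosen here, not names used by lokoctl.

```shell
# Hypothetical helper (not part of the patch): decide how a node is handled.
# Arguments: asset directory, MAC address, node domain, ignore_changes (true/false).
# Prints "pxe"  when the local flag file is missing or names another domain,
#        "keep" when the node exists and configuration changes are ignored,
#        "ssh"  when the node exists and should be reprovisioned via Ignition.
choose_action() {
  asset_dir=$1
  mac=$2
  domain=$3
  ignore_changes=$4
  flag_file="$asset_dir/$mac"
  if [ -f "$flag_file" ] && [ "$(cat "$flag_file")" = "$domain" ]; then
    if [ "$ignore_changes" = true ]; then
      echo keep
    else
      echo ssh
    fi
  else
    echo pxe
  fi
}
```

Deleting the flag file for a MAC address moves the node back to the "pxe" case on the next run, which is how the commit message describes forcing a node to be recreated.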
Showing 17 changed files with 264 additions and 21 deletions.
87 changes: 87 additions & 0 deletions
assets/terraform-modules/matchbox-flatcar/pxe-helper.sh.tmpl
    @@ -0,0 +1,87 @@
    # (executed in-line, #!/... would be ignored)
    # Terraform template variable substitution:
    name=${name}
    domain=${domain}
    mac=${mac}
    asset_dir=${asset_dir}
    ignore_changes=${ignore_changes}
    kernel_args="${kernel_args}"
    kernel_console="${kernel_console}"
    ignition_endpoint="${ignition_endpoint}"
    # From now on use $var for dynamic shell substitution

    if test -f "$asset_dir/$mac" && [ "$(cat "$asset_dir/$mac")" = "$domain" ]; then
      echo "found $asset_dir/$mac containing $domain, skipping PXE install"
      node_exists=yes
    else
      echo "$asset_dir/$mac does not contain $domain, forcing PXE install"
      node_exists=no
    fi

    if [ $node_exists = yes ]; then
      if $ignore_changes ; then
        echo "Keeping old config because 'ignore_changes' is set."
        exit 0
      else
        # run single commands that can be retried without a side effect in case the connection got disrupted
        count=30
        while [ $count -gt 0 ] && ! ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NumberOfPasswordPrompts=0 core@$domain sudo touch /boot/flatcar/first_boot; do
          sleep 1
          count=$((count - 1))
        done
        if [ $count -eq 0 ]; then
          echo "error reaching $domain via SSH, please remove the $asset_dir/$mac file to force a PXE install"
          exit 1
        fi
        echo "created the first_boot flag file to reprovision $domain"
        count=5
        while [ $count -gt 0 ] && ! ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NumberOfPasswordPrompts=0 core@$domain "printf 'set linux_append=\"$kernel_args ignition.config.url=$ignition_endpoint?mac=$mac&os=installed\"\\nset linux_console=\"$kernel_console\"\\n' | sudo tee /usr/share/oem/grub.cfg"; do
          sleep 1
          count=$((count - 1))
        done
        if [ $count -eq 0 ]; then
          echo "error reaching $domain via SSH, please retry"
          exit 1
        fi
        count=5
        while [ $count -gt 0 ] && ! ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NumberOfPasswordPrompts=0 core@$domain sudo systemctl reboot; do
          sleep 1
          count=$((count - 1))
        done
        if [ $count -eq 0 ]; then
          echo "error reaching $domain via SSH, please reboot manually"
          exit 1
        fi
        echo "rebooted the $domain"
      fi
    else
      # the user may provide ipmitool commands or any other logic for forcing a PXE boot
      ${pxe_commands}
    fi

    echo "checking that $domain comes up"
    count=600
    # check that we can reach the node and that it has the flag file which we remove here, indicating a reboot happened which prevents a race when issuing the reboot takes longer (both the systemctl reboot and PXE case)
    # Just in case the connection breaks and SSH may report an error code but still execute successfully, we will first check file existence and then delete with "rm -f" to be able to rerun both commands.
    # This sequence gives us the same error reporting as just running "rm" once.
    while [ $count -gt 0 ] && ! ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NumberOfPasswordPrompts=0 core@$domain test -f /ignition_ran; do
      sleep 1
      count=$((count - 1))
    done
    if [ $count -eq 0 ]; then
      echo "error: failed verifying with SSH if $domain came up by checking the /ignition_ran flag file"
      exit 1
    fi
    count=5
    while [ $count -gt 0 ] && ! ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NumberOfPasswordPrompts=0 core@$domain sudo rm -f /ignition_ran; do
      sleep 1
      count=$((count - 1))
    done
    if [ $count -eq 0 ]; then
      echo "error: failed to remove the /ignition_ran flag file on $domain"
      exit 1
    else
      echo "$domain came up again"
    fi
    # only write the state file once the system is up, this allows to rerun lokoctl if the first PXE boot did not work and it will try again
    echo $domain > "$asset_dir/$mac"
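The helper script repeats the same bounded retry loop for every SSH command. As a reading aid, that pattern can be sketched as a small function. This is an illustration under assumptions, not code from the patch: `retry` and `node_ssh` are hypothetical names, and `RETRY_DELAY` is a variable introduced here so the pause can be shortened. Unlike the loops in the script, the sketch does not sleep after the final failed attempt.

```shell
# Hypothetical helper (not part of the patch): run a command up to $1 times.
# Returns 0 as soon as the command succeeds, 1 when all attempts failed.
retry() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    attempts=$((attempts - 1))
    if [ "$attempts" -gt 0 ]; then
      # RETRY_DELAY is an assumption of this sketch; the script always sleeps 1 second
      sleep "${RETRY_DELAY:-1}"
    fi
  done
  return 1
}

# Hypothetical wrapper around the SSH options that the script passes each time.
# It expects $domain to be set, as it is in the helper script.
node_ssh() {
  ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    -o NumberOfPasswordPrompts=0 "core@$domain" "$@"
}

# Example use, mirroring the first loop of the script:
#   retry 30 node_ssh sudo touch /boot/flatcar/first_boot || exit 1
```

As the script's own comments note, retrying is only safe because each remote command can be rerun without side effects: `touch`, `test -f`, and `rm -f` all give the same result on a second run.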
    @@ -0,0 +1,14 @@
    resource "null_resource" "reprovision-node-when-ignition-changes" {
      # Triggered when the Ignition Config changes
      triggers = {
        ignition_config = matchbox_profile.node.raw_ignition
        kernel_args     = join(" ", var.kernel_args)
        kernel_console  = join(" ", var.kernel_console)
      }
      # Wait for the new Ignition config object to be ready before rebooting
      depends_on = [matchbox_group.node]
      # Trigger running Ignition on the next reboot (first_boot flag file) and reboot the instance, or, if the instance needs to be (re)provisioned, run external commands for PXE booting (also runs on the first provisioning)
      provisioner "local-exec" {
        command = templatefile("${path.module}/pxe-helper.sh.tmpl", { domain = var.node_domain, name = var.node_name, mac = var.node_mac, pxe_commands = var.pxe_commands, asset_dir = var.asset_dir, kernel_args = join(" ", var.kernel_args), kernel_console = join(" ", var.kernel_console), ignition_endpoint = format("%s/ignition", var.http_endpoint), ignore_changes = var.ignore_changes })
      }
    }
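The `pxe_commands` value is pasted into the helper script as shell code, so it can hold any logic that forces a network boot. The following is a rough sketch of what such commands might look like for a machine managed over IPMI. Everything specific in it is an assumption: the BMC address, the credentials, the `ipmi` and `force_pxe_boot` function names, and the `DRY_RUN` and `POWER_DELAY` switches are placeholders invented for this example, and the power-off and power-on sequence may need adjusting for a given BMC.

```shell
# Hypothetical content for "pxe_commands" (not taken from the patch).
# Placeholder BMC settings; a real setup would supply its own values.
bmc_host="${BMC_HOST:-bmc.example.com}"
bmc_user="${BMC_USER:-admin}"
bmc_password="${BMC_PASSWORD:-changeme}"

# Run ipmitool against the BMC, or only print the command when DRY_RUN=yes
# (the default here, so the sketch can be tried without hardware).
ipmi() {
  if [ "${DRY_RUN:-yes}" = yes ]; then
    echo "ipmitool -I lanplus -H $bmc_host -U $bmc_user -P *** $*"
  else
    ipmitool -I lanplus -H "$bmc_host" -U "$bmc_user" -P "$bmc_password" "$@"
  fi
}

# Select PXE as the boot device for the next boot, then power the machine
# off and on again so that it starts from the network.
force_pxe_boot() {
  ipmi chassis bootdev pxe &&
    ipmi chassis power off &&
    sleep "${POWER_DELAY:-5}" &&
    ipmi chassis power on
}
```

Because the script assigns `name`, `domain`, and `mac` before the substituted commands run in-line, such commands can presumably refer to `$name`, `$domain`, and `$mac`, for example to look up the matching BMC address.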