talosctl health extra flags #7967

Closed
mircea-pavel-anton opened this issue Nov 20, 2023 · 4 comments

@mircea-pavel-anton

Feature Request

Ability to specify whether or not to wait for nodes to be ready.

Description

When deploying Talos, I saw that a lot of people disable the CNI and opt to install one manually later on, mainly when doing GitOps.

Currently, talosctl health checks the health of the cluster end to end, i.e. both Talos and Kubernetes. I think there should be a flag, something like talosctl health --kubernetes=false, that validates health up to and including the kubelet without checking whether the nodes are in a Ready state, since without a CNI they will never reach that state.

This makes it a bit harder to automate installs such as bootstrap -> wait -> apply CNI.

@mircea-pavel-anton
Author

For some context, I am currently using a bash script to wait until the kubelet becomes healthy on my nodes:

# Poll the kernel log until the kubelet health check is reported as successful.
while true; do
    output=$(talosctl dmesg -n "$NODE_IP" 2>&1)

    if echo "$output" | grep -Fq "service[kubelet](Running): Health check successful"; then
        echo ""
        echo "Kubelet is Healthy on node $NODE_IP!"
        break
    else
        printf "."
        sleep 1
    fi
done

But I feel like there should be a more elegant way to handle this, since disabling the CNI is not an uncommon scenario.
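One weakness of the loop above is that it never gives up if the kubelet does not become healthy. A minimal sketch of a bounded variant follows, assuming a POSIX shell and a `date` that supports `+%s`. The helper name `wait_for_output` and its argument order are made up for illustration and are not part of talosctl.

```shell
# wait_for_output TIMEOUT_SECONDS PATTERN COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND once per second until its combined
# stdout/stderr contains PATTERN as a literal substring (returns 0), or
# until TIMEOUT_SECONDS have elapsed (returns 1).
wait_for_output() {
    wfo_timeout=$1
    wfo_pattern=$2
    shift 2
    wfo_deadline=$(( $(date +%s) + wfo_timeout ))

    while :; do
        # Capture the output; a failing command just means "not ready yet".
        wfo_output=$("$@" 2>&1) || true

        # The quoted pattern is matched literally, so "[kubelet]" is not a glob.
        case "$wfo_output" in
            *"$wfo_pattern"*) return 0 ;;
        esac

        if [ "$(date +%s)" -ge "$wfo_deadline" ]; then
            return 1
        fi
        sleep 1
    done
}

# Example usage (needs talosctl and a reachable node, so it is left commented out):
# wait_for_output 600 "service[kubelet](Running): Health check successful" \
#     talosctl dmesg -n "$NODE_IP" || echo "Timed out waiting for kubelet on $NODE_IP"
```

This only bounds the same dmesg polling shown above. It is a workaround and does not replace the requested flag.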

@mrclrchtr
Contributor

That would be amazing. I have exactly the same problem with CNI. Especially in Terraform, talos_cluster_health runs indefinitely when no CNI is installed. This also makes it impossible to re-apply after an abort, because it has to be read first before anything can be applied.

mrclrchtr added a commit to hcloud-talos/terraform-hcloud-talos that referenced this issue Mar 18, 2024

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 15, 2024

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Sep 21, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 20, 2024