By default, Kubernetes restarts a container whenever it crashes, for any reason. Liveness and Readiness probes go further: they can be configured to identify healthy containers, send traffic only to those that are ready, and restart containers when needed, keeping the application robust.
- A Liveness probe tells Kubernetes whether a Pod is alive or dead. A Pod can end up dead for many reasons; when the Liveness probe fails, the kubelet kills the container and restarts it.
- A Readiness probe tells Kubernetes when a Pod is ready to receive traffic. A Pod receives traffic from a Service only while its Readiness probe passes; if the probe fails, no traffic is sent to the Pod.
Goals of this section
- We will see how to define Liveness and Readiness probes, and test them against different Pod states.
10.1 Configuring a Liveness Probe
- A Liveness probe tells the kubelet how to check a container in order to decide whether it is healthy.
- The kubelet uses the periodSeconds field to decide how often to run the probe.
- The initialDelaySeconds field tells the kubelet how many seconds to wait before running the first probe.
- To perform the probe, in this case, the kubelet sends an HTTP GET request to the server running in the Pod. If the handler for the server's /health path returns a success code, the container is considered healthy; if it returns a failure code, the kubelet kills the container and restarts it. (A sketch of the remaining probe timing fields follows below.)
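Beyond initialDelaySeconds and periodSeconds, a probe accepts a few more timing fields. A minimal sketch (not used in this walkthrough) with the Kubernetes defaults noted in comments:
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10     # how often to probe; defaults to 10
  timeoutSeconds: 1     # seconds before a probe attempt counts as failed; defaults to 1
  failureThreshold: 3   # consecutive failures before the container is restarted; defaults to 3
  successThreshold: 1   # consecutive successes to be healthy again; must be 1 for liveness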
# Deploy the Liveness Probe
mkdir -p healthchecks
cat <<EoF > healthchecks/liveness-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-app
spec:
  containers:
  - name: liveness
    image: brentley/ecsdemo-nodejs
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
EoF
kubectl apply -f healthchecks/liveness-app.yaml
# Check status
kubectl get pod liveness-app
NAME READY STATUS RESTARTS AGE
liveness-app 1/1 Running 0 6s
kubectl describe pod liveness-app
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 41s default-scheduler Successfully assigned default/liveness-app to ip-192-168-14-19.cn-northwest-1.compute.internal
Normal Pulling 40s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Pulling image "brentley/ecsdemo-nodejs"
Normal Pulled 37s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Successfully pulled image "brentley/ecsdemo-nodejs"
Normal Created 37s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Created container liveness
Normal Started 36s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Started container liveness
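# Optional: in a second terminal, watch the pod so the restart in the next step is visible live
kubectl get pod liveness-app -w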
# Simulate failure
kubectl exec -it liveness-app -- /bin/kill -s SIGUSR1 1
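# In this demo image, SIGUSR1 makes the nodejs process stop answering /health,
# so subsequent probes time out (visible in the events below)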
# Check how the liveness probe worked
kubectl describe pod liveness-app
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m6s default-scheduler Successfully assigned default/liveness-app to ip-192-168-14-19.cn-northwest-1.compute.internal
Warning Unhealthy 57s (x3 over 67s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Liveness probe failed: Get http://192.168.29.229:3000/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Normal Killing 57s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Container liveness failed liveness probe, will be restarted
Normal Pulling 27s (x2 over 6m5s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Pulling image "brentley/ecsdemo-nodejs"
Normal Pulled 24s (x2 over 6m2s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Successfully pulled image "brentley/ecsdemo-nodejs"
Normal Created 24s (x2 over 6m2s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Created container liveness
Normal Started 24s (x2 over 6m1s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Started container liveness
kubectl get pod liveness-app
NAME READY STATUS RESTARTS AGE
liveness-app 1/1 Running 1 6m26s
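# RESTARTS is now 1: after three consecutive probe failures (the default failureThreshold),
# the kubelet killed the container and started a new one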
# Check the logs
kubectl logs liveness-app
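# --previous shows the logs of the earlier container instance that was killed by the failed probe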
kubectl logs liveness-app --previous
10.2 Configuring a Readiness Probe
# Deploy the Readiness Probe
cat <<EoF > healthchecks/readiness-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readiness-deployment
  template:
    metadata:
      labels:
        app: readiness-deployment
    spec:
      containers:
      - name: readiness-deployment
        image: alpine
        command: ["sh", "-c", "touch /tmp/healthy && sleep 86400"]
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 3
EoF
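# The exec probe passes only while `cat /tmp/healthy` exits 0, i.e. while the file exists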
kubectl apply -f healthchecks/readiness-deployment.yaml
# Check the status of the pods and replicas
kubectl get pods -l app=readiness-deployment
NAME READY STATUS RESTARTS AGE
readiness-deployment-7d8df88986-sgtvz 1/1 Running 0 5m31s
readiness-deployment-7d8df88986-xxvt4 1/1 Running 0 5m31s
readiness-deployment-7d8df88986-zqczd 1/1 Running 0 5m31s
kubectl describe deployment readiness-deployment | grep Replicas:
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
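# Delete /tmp/healthy from one pod so its readiness probe starts failing
# (substitute a pod name from the output above)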
kubectl exec -it <YOUR-FIRST-READINESS-POD-NAME> -- rm /tmp/healthy
kubectl get pods -l app=readiness-deployment
NAME READY STATUS RESTARTS AGE
readiness-deployment-7d8df88986-sgtvz 0/1 Running 0 6m37s
readiness-deployment-7d8df88986-xxvt4 1/1 Running 0 6m37s
readiness-deployment-7d8df88986-zqczd 1/1 Running 0 6m37s
# Traffic will not be routed to the first pod. The READY column confirms that its readiness probe did not pass, so it was marked not ready.
# Check the replicas available to serve traffic when a service points to this deployment
kubectl describe deployment readiness-deployment | grep Replicas:
Replicas: 3 desired | 3 updated | 3 total | 2 available | 1 unavailable
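No Service is created in this walkthrough, but if one selected these pods, the not-ready pod would also disappear from the Service's endpoints. A quick check, assuming a hypothetical Service named readiness-deployment:
# readiness-deployment here is a hypothetical Service; none is created above
kubectl get endpoints readiness-deployment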
# Recreate the /tmp/healthy file. Once the pod passes the probe again, it is marked ready and begins to receive traffic.
kubectl exec -it <YOUR-FIRST-READINESS-POD-NAME> -- touch /tmp/healthy
kubectl get pods -l app=readiness-deployment
NAME READY STATUS RESTARTS AGE
readiness-deployment-7d8df88986-sgtvz 1/1 Running 0 7m30s
readiness-deployment-7d8df88986-xxvt4 1/1 Running 0 7m30s
readiness-deployment-7d8df88986-zqczd 1/1 Running 0 7m30s
kubectl describe deployment readiness-deployment | grep Replicas:
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
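# Clean up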
kubectl delete -f healthchecks/liveness-app.yaml
kubectl delete -f healthchecks/readiness-deployment.yaml