By default, Kubernetes restarts a container whenever it crashes, for any reason. Liveness and Readiness probes go further: they can be configured to identify healthy containers, send traffic only to those that are ready, and restart containers when needed, keeping the application robust.
- A Liveness probe tells Kubernetes whether a Pod is alive or dead. A Pod can end up dead for many reasons; when the Liveness probe fails, the kubelet kills the container and restarts it.
- A Readiness probe tells Kubernetes when a Pod is ready to receive traffic. A Pod receives traffic from a Service only while its Readiness probe passes; if the probe fails, no traffic is sent to the Pod.
Goals of this section
- We will see how to define Liveness and Readiness probes, and test them against different Pod states.
10.1 Configuring a Liveness Probe
- A Liveness probe tells the kubelet how to check a container in order to decide whether it is healthy.
- The kubelet uses the periodSeconds field to decide how often to run the probe.
- The initialDelaySeconds field tells the kubelet how many seconds to wait before running the first probe.
- To perform the probe, in this case, the kubelet sends an HTTP GET request to the server running in the Pod. If the handler for the server's /health path returns a success code, the container is considered healthy; if it returns a failure code, the kubelet kills the container and restarts it. (A sketch of the remaining probe timing fields follows below.)
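Beyond initialDelaySeconds and periodSeconds, a probe accepts a few more timing fields. A minimal sketch (not used in this walkthrough) with the Kubernetes defaults noted in comments:
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10     # how often to probe; defaults to 10
  timeoutSeconds: 1     # seconds before a probe attempt counts as failed; defaults to 1
  failureThreshold: 3   # consecutive failures before the container is restarted; defaults to 3
  successThreshold: 1   # consecutive successes to be healthy again; must be 1 for liveness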
# Deploy the Liveness Probe
mkdir -p healthchecks
cat <<EoF > healthchecks/liveness-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-app
spec:
  containers:
  - name: liveness
    image: brentley/ecsdemo-nodejs
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
EoF
kubectl apply -f healthchecks/liveness-app.yaml
# Check status
kubectl get pod liveness-app
NAME READY STATUS RESTARTS AGE
liveness-app 1/1 Running 0 6s
kubectl describe pod liveness-app
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 41s default-scheduler Successfully assigned default/liveness-app to ip-192-168-14-19.cn-northwest-1.compute.internal
Normal Pulling 40s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Pulling image "brentley/ecsdemo-nodejs"
Normal Pulled 37s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Successfully pulled image "brentley/ecsdemo-nodejs"
Normal Created 37s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Created container liveness
Normal Started 36s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Started container liveness
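# Optional: in a second terminal, watch the pod so the restart in the next step is visible live
kubectl get pod liveness-app -w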
# Simulate failure
kubectl exec -it liveness-app -- /bin/kill -s SIGUSR1 1
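# In this demo image, SIGUSR1 makes the nodejs process stop answering /health,
# so subsequent probes time out (visible in the events below)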
# Check how the liveness probe worked
kubectl describe pod liveness-app
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m6s default-scheduler Successfully assigned default/liveness-app to ip-192-168-14-19.cn-northwest-1.compute.internal
Warning Unhealthy 57s (x3 over 67s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Liveness probe failed: Get http://192.168.29.229:3000/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Normal Killing 57s kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Container liveness failed liveness probe, will be restarted
Normal Pulling 27s (x2 over 6m5s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Pulling image "brentley/ecsdemo-nodejs"
Normal Pulled 24s (x2 over 6m2s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Successfully pulled image "brentley/ecsdemo-nodejs"
Normal Created 24s (x2 over 6m2s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Created container liveness
Normal Started 24s (x2 over 6m1s) kubelet, ip-192-168-14-19.cn-northwest-1.compute.internal Started container liveness
kubectl get pod liveness-app
NAME READY STATUS RESTARTS AGE
liveness-app 1/1 Running 1 6m26s
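# RESTARTS is now 1: after three consecutive probe failures (the default failureThreshold),
# the kubelet killed the container and started a new one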
# Check the logs
kubectl logs liveness-app
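# --previous shows the logs of the earlier container instance that was killed by the failed probe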
kubectl logs liveness-app --previous
10.2 Configuring a Readiness Probe
# Deploy the Readiness Probe
cat <<EoF > healthchecks/readiness-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readiness-deployment
  template:
    metadata:
      labels:
        app: readiness-deployment
    spec:
      containers:
      - name: readiness-deployment
        image: alpine
        command: ["sh", "-c", "touch /tmp/healthy && sleep 86400"]
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 3
EoF
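# The exec probe passes only while `cat /tmp/healthy` exits 0, i.e. while the file exists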
kubectl apply -f healthchecks/readiness-deployment.yaml
# Check the status of the pods and replicas
kubectl get pods -l app=readiness-deployment
NAME READY STATUS RESTARTS AGE
readiness-deployment-7d8df88986-sgtvz 1/1 Running 0 5m31s
readiness-deployment-7d8df88986-xxvt4 1/1 Running 0 5m31s
readiness-deployment-7d8df88986-zqczd 1/1 Running 0 5m31s
kubectl describe deployment readiness-deployment | grep Replicas:
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
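# Delete /tmp/healthy from one pod so its readiness probe starts failing
# (substitute a pod name from the output above)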
kubectl exec -it <YOUR-FIRST-READINESS-POD-NAME> -- rm /tmp/healthy
kubectl get pods -l app=readiness-deployment
NAME READY STATUS RESTARTS AGE
readiness-deployment-7d8df88986-sgtvz 0/1 Running 0 6m37s
readiness-deployment-7d8df88986-xxvt4 1/1 Running 0 6m37s
readiness-deployment-7d8df88986-zqczd 1/1 Running 0 6m37s
# Traffic will not be routed to the first pod. The READY column confirms that its readiness probe did not pass, so it was marked not ready.
# Check the replicas available to serve traffic when a service points to this deployment
kubectl describe deployment readiness-deployment | grep Replicas:
Replicas: 3 desired | 3 updated | 3 total | 2 available | 1 unavailable
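No Service is created in this walkthrough, but if one selected these pods, the not-ready pod would also disappear from the Service's endpoints. A quick check, assuming a hypothetical Service named readiness-deployment:
# readiness-deployment here is a hypothetical Service; none is created above
kubectl get endpoints readiness-deployment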
# Recreate the /tmp/healthy file. Once the pod passes the probe again, it is marked ready and begins to receive traffic.
kubectl exec -it <YOUR-FIRST-READINESS-POD-NAME> -- touch /tmp/healthy
kubectl get pods -l app=readiness-deployment
NAME READY STATUS RESTARTS AGE
readiness-deployment-7d8df88986-sgtvz 1/1 Running 0 7m30s
readiness-deployment-7d8df88986-xxvt4 1/1 Running 0 7m30s
readiness-deployment-7d8df88986-zqczd 1/1 Running 0 7m30s
kubectl describe deployment readiness-deployment | grep Replicas:
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
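# Clean up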
kubectl delete -f healthchecks/liveness-app.yaml
kubectl delete -f healthchecks/readiness-deployment.yaml