Starting to Learn Kubernetes a Step Behind - 13. Health Checks and Container Lifecycle -
Story
- Starting to Learn Kubernetes a Step Behind - 01. Environment Selection -
- Starting to Learn Kubernetes a Step Behind - 02. Docker For Mac -
- Starting to Learn Kubernetes a Step Behind - 03. Raspberry Pi -
- Starting to Learn Kubernetes a Step Behind - 04. kubectl -
- Starting to Learn Kubernetes a Step Behind - 05. workloads Part 1 -
- Starting to Learn Kubernetes a Step Behind - 06. workloads Part 2 -
- Starting to Learn Kubernetes a Step Behind - 07. workloads Part 3 -
- Starting to Learn Kubernetes a Step Behind - 08. discovery&LB Part 1 -
- Starting to Learn Kubernetes a Step Behind - 09. discovery&LB Part 2 -
- Starting to Learn Kubernetes a Step Behind - 10. config&storage Part 1 -
- Starting to Learn Kubernetes a Step Behind - 11. config&storage Part 2 -
- Starting to Learn Kubernetes a Step Behind - 12. Resource Limits -
- Starting to Learn Kubernetes a Step Behind - 13. Health Checks and Container Lifecycle -
- Starting to Learn Kubernetes a Step Behind - 14. Scheduling -
- Starting to Learn Kubernetes a Step Behind - 15. Security -
- Starting to Learn Kubernetes a Step Behind - 16. Components -
Last time
In Starting to Learn Kubernetes a Step Behind - 12. Resource Limits -, we learned about resource constraints such as requests and limits. This time, we will learn about health checks and the container lifecycle.
Health Check
Kubernetes provides two types of health checks for determining whether a Pod is healthy.
- Liveness Probe
  - Checks whether the container in the Pod is healthy. If it is not, the container is restarted.
- Readiness Probe
  - Checks whether the Pod is ready to serve requests. If it is not, traffic is not sent to it.
For example, a Liveness Probe is effective when a memory leak makes a Pod unresponsive. Separately from these probes, the LoadBalancer Service has a simple ICMP-based health check by default.
Both Liveness and Readiness probes can use any of three methods.
- exec
  - Runs a command in the container; the check fails if the exit code is not 0.
- httpGet
  - Sends an HTTP GET request; the check fails if the status code is outside the 200-399 range.
- tcpSocket
  - The check fails if a TCP connection cannot be established.
Let's try it out.
# sample-healthcheck.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-healthcheck
  labels:
    app: sample-app
spec:
  containers:
    - name: nginx-container
      image: nginx:1.12
      ports:
        - containerPort: 80
      livenessProbe:
        httpGet:
          path: /index.html
          port: 80
          scheme: HTTP
        timeoutSeconds: 1
        successThreshold: 1
        failureThreshold: 2
        initialDelaySeconds: 5
        periodSeconds: 3
      readinessProbe:
        exec:
          command: ["ls", "/usr/share/nginx/html/50x.html"]
        timeoutSeconds: 1
        successThreshold: 2
        failureThreshold: 1
        initialDelaySeconds: 5
        periodSeconds: 3
- timeoutSeconds
  - The number of seconds before a single check times out.
- successThreshold
  - The number of consecutive successful checks required before the probe is considered successful.
- failureThreshold
  - The number of consecutive failed checks required before the probe is considered failed.
- initialDelaySeconds
  - The number of seconds to wait after the container starts before the first check runs.
- periodSeconds
  - The interval between checks, in seconds.
pi@raspi001:~/tmp $ k apply -f sample-healthcheck.yaml
pi@raspi001:~/tmp $ k describe pod sample-healthcheck | egrep "Liveness|Readiness"
Liveness: http-get http://:80/index.html delay=5s timeout=1s period=3s #success=1 #failure=2
Readiness: exec [ls /usr/share/nginx/html/50x.html] delay=5s timeout=1s period=3s #success=2 #failure=1
The probes are configured as expected. Now, let's make them fail.
To fail the Liveness Probe, you just need to delete index.html.
pi@raspi001:~/tmp $ k exec -it sample-healthcheck -- rm /usr/share/nginx/html/index.html
pi@raspi001:~/tmp $ k get pods --watch
NAME READY STATUS RESTARTS AGE
sample-healthcheck 1/1 Running 1 9m54s
sample-healthcheck 0/1 Running 2 10m
sample-healthcheck 1/1 Running 2 10m
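The container was stopped once and then restarted, as the RESTARTS count shows. Because the restarted container is created fresh from the image, index.html is back and the probe passes again. Next, let's fail the Readiness Probe. Here, you just need to delete 50x.html.
pi@raspi001:~/tmp $ k exec -it sample-healthcheck -- rm /usr/share/nginx/html/50x.html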
pi@raspi001:~/tmp $ k get pods --watch
NAME READY STATUS RESTARTS AGE
sample-healthcheck 1/1 Running 2 16m
sample-healthcheck 0/1 Running 2 16m
pi@raspi001:~/tmp $ k exec -it sample-healthcheck -- touch /usr/share/nginx/html/50x.html
pi@raspi001:~/tmp $ k get pods --watch
NAME READY STATUS RESTARTS AGE
sample-healthcheck 0/1 Running 2 17m
sample-healthcheck 1/1 Running 2 17m
As expected, when you delete 50x.html, the Pod is no longer READY (0/1) but is not restarted, and when you put the file back, it returns to READY (1/1).
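The sample above used httpGet and exec but not the third method, tcpSocket. For reference, here is a minimal sketch of what a tcpSocket probe could look like. The file name and Pod name are hypothetical and I have not run this one on the Raspberry Pi cluster; the parameter values are simply borrowed from the sample above.

```yaml
# sample-tcp-healthcheck.yaml (hypothetical file and Pod name, for illustration only)
apiVersion: v1
kind: Pod
metadata:
  name: sample-tcp-healthcheck
spec:
  containers:
    - name: nginx-container
      image: nginx:1.12
      ports:
        - containerPort: 80
      # Liveness: the container is restarted if a TCP connection to port 80
      # cannot be established failureThreshold times in a row.
      livenessProbe:
        tcpSocket:
          port: 80
        timeoutSeconds: 1
        failureThreshold: 2
        initialDelaySeconds: 5
        periodSeconds: 3
      # Readiness: the Pod is treated as not ready while the same check fails.
      readinessProbe:
        tcpSocket:
          port: 80
        timeoutSeconds: 1
        initialDelaySeconds: 5
        periodSeconds: 3
```

A tcpSocket check only confirms that the port accepts connections, so it suits servers that do not speak HTTP; for nginx, httpGet tells you more.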
Restarting Containers
Whether a container is restarted when its process stops or a health check fails is determined by spec.restartPolicy. There are three values.
- Always
  - Always restart the container, regardless of its exit code.
- OnFailure
  - Restart the container only when it stops unexpectedly with an exit code other than 0.
- Never
  - Never restart the container.
Let's try it out.
# sample-restart-always.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-restart-always
spec:
  restartPolicy: Always
  containers:
    - name: nginx-container
      image: nginx:1.12
      command: ["sh", "-c", "exit 0"] # success case
      # command: ["sh", "-c", "exit 1"] # failure case
pi@raspi001:~/tmp $ k apply -f sample-restart-always.yaml
# Success case
pi@raspi001:~/tmp $ k get pods sample-restart-always --watch
NAME READY STATUS RESTARTS AGE
sample-restart-always 0/1 ContainerCreating 0 13s
sample-restart-always 0/1 Completed 0 19s
sample-restart-always 0/1 Completed 1 27s
sample-restart-always 0/1 CrashLoopBackOff 1 28s
sample-restart-always 0/1 Completed 2 37s
# Failure case
pi@raspi001:~/tmp $ k get pods sample-restart-always --watch
NAME READY STATUS RESTARTS AGE
sample-restart-always 0/1 ContainerCreating 0 7s
sample-restart-always 0/1 Error 0 12s
sample-restart-always 0/1 Error 1 17s
sample-restart-always 0/1 CrashLoopBackOff 1 18s
sample-restart-always 0/1 Error 2 37s
You can see that the container is restarted in both the success and the failure case.
# sample-restart-onfailure.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-restart-onfailure
spec:
  restartPolicy: OnFailure
  containers:
    - name: nginx-container
      image: nginx:1.12
      command: ["sh", "-c", "exit 0"] # success case
      # command: ["sh", "-c", "exit 1"] # failure case
pi@raspi001:~/tmp $ k apply -f sample-restart-onfailure.yaml
# Success case
pi@raspi001:~/tmp $ k get pods sample-restart-onfailure --watch
NAME READY STATUS RESTARTS AGE
sample-restart-onfailure 0/1 ContainerCreating 0 3s
sample-restart-onfailure 0/1 Completed 0 15s
# Failure case
pi@raspi001:~/tmp $ k get pods sample-restart-onfailure --watch
NAME READY STATUS RESTARTS AGE
sample-restart-onfailure 0/1 ContainerCreating 0 4s
sample-restart-onfailure 0/1 Error 0 22s
sample-restart-onfailure 0/1 Error 1 28s
sample-restart-onfailure 0/1 CrashLoopBackOff 1 29s
sample-restart-onfailure 0/1 Error 2 50s
In the success case, the Pod ends with Completed and does not go into CrashLoopBackOff. In the failure case, it becomes Error, is restarted, and goes into CrashLoopBackOff. It's as expected.
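The remaining value, Never, was not tried above. A minimal sketch in the same shape as the two manifests above might look like this. The file name and Pod name are hypothetical, and I have not run it on this cluster.

```yaml
# sample-restart-never.yaml (hypothetical, mirrors the two manifests above)
apiVersion: v1
kind: Pod
metadata:
  name: sample-restart-never
spec:
  restartPolicy: Never
  containers:
    - name: nginx-container
      image: nginx:1.12
      command: ["sh", "-c", "exit 1"] # failure case
      # command: ["sh", "-c", "exit 0"] # success case
```

With Never, the expectation is that the container is not restarted in either case, so RESTARTS should stay at 0 and CrashLoopBackOff should not appear.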
initContainers
This feature runs other containers before the main containers of a Pod start. The containers in spec.containers are started in parallel, so they are not suitable when ordering matters. initContainers are set with spec.initContainers, and more than one can be specified. If there are several, they run one at a time in order from the top, and each must finish before the next one starts.
Let's try it out.
# sample-initcontainer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-initcontainer
spec:
  initContainers:
    - name: output-1
      image: nginx:1.12
      command:
        ["sh", "-c", "sleep 20; echo 1st > /usr/share/nginx/html/index.html"]
      volumeMounts:
        - name: html-volume
          mountPath: /usr/share/nginx/html/
    - name: output-2
      image: nginx:1.12
      command:
        ["sh", "-c", "sleep 10; echo 2nd >> /usr/share/nginx/html/index.html"]
      volumeMounts:
        - name: html-volume
          mountPath: /usr/share/nginx/html/
  containers:
    - name: nginx-container
      image: nginx:1.12
      volumeMounts:
        - name: html-volume
          mountPath: /usr/share/nginx/html/
  volumes:
    - name: html-volume
      emptyDir: {}
pi@raspi001:~/tmp $ k get pod sample-initcontainer --watch
NAME READY STATUS RESTARTS AGE
sample-initcontainer 0/1 Init:0/2 0 3s
sample-initcontainer 0/1 Init:0/2 0 9s
sample-initcontainer 0/1 Init:1/2 0 30s
sample-initcontainer 0/1 Init:1/2 0 38s
sample-initcontainer 0/1 PodInitializing 0 51s
sample-initcontainer 1/1 Running 0 59s
pi@raspi001:~/tmp $ k exec -it sample-initcontainer -- cat /usr/share/nginx/html/index.html
1st
2nd
Indeed, the initContainers run in order: 1st is written before 2nd. I see.
Command execution at startup and termination (postStart, preStop)
You can execute a command after container startup with postStart, and a command before container termination with preStop.
# sample-lifecycle.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-lifecycle
spec:
  containers:
    - name: nginx-container
      image: nginx:1.12
      command: ["/bin/sh", "-c", "touch /tmp/started; sleep 3600"]
      lifecycle:
        postStart:
          exec:
            command: ["/bin/sh", "-c", "sleep 20; touch /tmp/poststart"]
        preStop:
          exec:
            command: ["/bin/sh", "-c", "touch /tmp/prestop; sleep 20"]
pi@raspi001:~/tmp $ k apply -f sample-lifecycle.yaml
pi@raspi001:~/tmp $ k exec -it sample-lifecycle -- ls /tmp
started
# A few seconds later
pi@raspi001:~/tmp $ k exec -it sample-lifecycle -- ls /tmp
poststart started
pi@raspi001:~/tmp $ k delete -f sample-lifecycle.yaml
# Immediately!
pi@raspi001:~/tmp $ k exec -it sample-lifecycle -- ls /tmp
poststart prestop started
Indeed, postStart and preStop are working. One thing to be careful about is that postStart runs asynchronously, at almost the same time as spec.containers[].command, so there is no guarantee that it runs before or after the container's main process starts.
Safe termination of Pods and timing
The number of seconds specified in terminationGracePeriodSeconds is the grace period counted from the start of Pod deletion. By default, it is set to 30 seconds. If preStop plus SIGTERM handling does not finish within those 30 seconds, the container is forcibly stopped with SIGKILL. However, if preStop is still running when the 30 seconds have passed, the container gets only an extra 2 seconds for SIGTERM handling. The value of terminationGracePeriodSeconds should therefore be long enough for preStop to always finish.
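As a rough sketch of that advice, the manifest below gives a Pod whose preStop takes about 20 seconds a grace period of 60 seconds. The file name, the Pod name, and the value 60 are my own assumptions for illustration, not something measured on this cluster.

```yaml
# sample-graceperiod.yaml (hypothetical file and Pod name)
apiVersion: v1
kind: Pod
metadata:
  name: sample-graceperiod
spec:
  # Total budget for preStop plus SIGTERM handling before SIGKILL.
  # 60 is an assumed value: about 20s for preStop plus headroom for shutdown.
  terminationGracePeriodSeconds: 60
  containers:
    - name: nginx-container
      image: nginx:1.12
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 20"]
```

The field sits at the Pod level (spec), not under each container, so the budget is shared by every container in the Pod.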
Excluding Nodes from scheduling
There is a command called cordon for excluding Nodes from Kubernetes scheduling. A Node is in one of two states, SchedulingEnabled or SchedulingDisabled. In the latter state it is excluded from scheduling, so new Pods, for example those created when a ReplicaSet is updated, are not placed on it.
When you use the cordon command, the specified Node becomes SchedulingDisabled (uncordon does the opposite). However, Pods already running on the Node keep running there; only newly created Pods are kept off it. If you also want to move the currently running Pods off the Node, use the drain command. Let's actually try it.
pi@raspi001:~/tmp $ k get nodes
NAME STATUS ROLES AGE VERSION
raspi001 Ready master 33d v1.14.1
raspi002 Ready worker 33d v1.14.1
raspi003 Ready worker 32d v1.14.1
pi@raspi001:~/tmp $ k cordon raspi002
pi@raspi001:~/tmp $ k get nodes
NAME STATUS ROLES AGE VERSION
raspi001 Ready master 33d v1.14.1
raspi002 Ready,SchedulingDisabled worker 33d v1.14.1
raspi003 Ready worker 32d v1.14.1
pi@raspi001:~/tmp $ k uncordon raspi002
pi@raspi001:~/tmp $ k get nodes
NAME STATUS ROLES AGE VERSION
raspi001 Ready master 33d v1.14.1
raspi002 Ready worker 33d v1.14.1
raspi003 Ready worker 32d v1.14.1
pi@raspi001:~/tmp $ k drain raspi002
node/raspi002 cordoned
error: unable to drain node "raspi002", aborting command...
There are pending nodes to be drained:
raspi002
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/kube-flannel-ds-arm-7nnbj, kube-system/kube-proxy-wgjdq, metallb-system/speaker-tsxdk
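When you drain a Node, Pods managed by something like a ReplicaSet are fine, because they are recreated on another Node, but unmanaged ones such as standalone Pods are simply deleted and not recreated (drain asks for --force before doing that). The error above says that drain cannot delete Pods managed by a DaemonSet and asks whether it may skip them; passing --ignore-daemonsets, as the message suggests, lets the drain continue. As the first line of the output shows, the Node was already cordoned before the error, so it stays SchedulingDisabled until you uncordon it. A drain can stop on several errors like this, and you need to respond appropriately to each message.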
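As a side note, my understanding is that cordon works by setting a single field on the Node object, and that this field is what the SchedulingDisabled status reflects. The fragment below is a sketch of the relevant part of a cordoned Node, such as you would expect to find in the output of k get node raspi002 -o yaml. All other fields are omitted, and I have not captured this from the actual cluster.

```yaml
# Fragment of a cordoned Node object (other fields omitted)
apiVersion: v1
kind: Node
metadata:
  name: raspi002
spec:
  unschedulable: true # set by cordon (and by drain); cleared by uncordon
```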
Cleaning up
pi@raspi001:~/tmp $ k delete -f sample-healthcheck.yaml -f sample-restart-always.yaml -f sample-restart-onfailure.yaml -f sample-initcontainer.yaml -f sample-lifecycle.yaml
Finally
This time, we learned how health checks work and what happens when containers are stopped. I was surprised that Kubernetes itself provides health checks, so we do not have to build a health check mechanism into the application. Next time: 14. Scheduling.