
Starting to Learn Kubernetes a Step Behind - 07. Workloads Part 3 -

Story

  1. Starting to Learn Kubernetes a Step Behind - 01. Environment Selection -
  2. Starting to Learn Kubernetes a Step Behind - 02. Docker For Mac -
  3. Starting to Learn Kubernetes a Step Behind - 03. Raspberry Pi -
  4. Starting to Learn Kubernetes a Step Behind - 04. kubectl -
  5. Starting to Learn Kubernetes a Step Behind - 05. Workloads Part 1 -
  6. Starting to Learn Kubernetes a Step Behind - 06. Workloads Part 2 -
  7. Starting to Learn Kubernetes a Step Behind - 07. Workloads Part 3 -
  8. Starting to Learn Kubernetes a Step Behind - 08. Discovery & LB Part 1 -
  9. Starting to Learn Kubernetes a Step Behind - 09. Discovery & LB Part 2 -
  10. Starting to Learn Kubernetes a Step Behind - 10. Config & Storage Part 1 -
  11. Starting to Learn Kubernetes a Step Behind - 11. Config & Storage Part 2 -
  12. Starting to Learn Kubernetes a Step Behind - 12. Resource Limitations -
  13. Starting to Learn Kubernetes a Step Behind - 13. Health Checks and Container Lifecycle -
  14. Starting to Learn Kubernetes a Step Behind - 14. Scheduling -
  15. Starting to Learn Kubernetes a Step Behind - 15. Security -
  16. Starting to Learn Kubernetes a Step Behind - 16. Components -

Last time

In Starting to Learn Kubernetes a Step Behind - 06. Workloads Part 2 -, we learned about DaemonSet and (part of) StatefulSet. This time, we continue with StatefulSet and then move on to Job and CronJob.

StatefulSet

Picking up where we left off, here is the sample-statefulset.yaml we are working with:

# sample-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sample-statefulset
spec:
  serviceName: sample-statefulset
  replicas: 3
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
        - name: nginx-container
          image: nginx:1.12
          ports:
            - containerPort: 80
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes:
          - ReadWriteMany
        storageClassName: managed-nfs-storage
        resources:
          requests:
            storage: 1Gi

Let's check whether data written to the mounted volume really persists.

pi@raspi001:~/tmp $ k apply -f sample-statefulset.yaml
pi@raspi001:~/tmp $ k exec -it sample-statefulset-0 -- df -h
Filesystem  Size  Used Avail Use% Mounted on
...
192.168.3.35:/home/data/default-www-sample-statefulset-0-pvc-*   15G  1.1G   13G   8% /usr/share/nginx/html
...
pi@raspi001:~/tmp $ k exec -it sample-statefulset-0 touch /usr/share/nginx/html/sample.html

We created a file named sample.html. We will check if this disappears.

pi@raspi001:~/tmp $ k delete pod sample-statefulset-0
pi@raspi001:~/tmp $ k exec -it sample-statefulset-0 ls /usr/share/nginx/html/sample.html
/usr/share/nginx/html/sample.html

After the Pod is deleted and recreated by self-healing, sample.html is still there.

pi@raspi001:~/tmp $ k delete -f sample-statefulset.yaml
pi@raspi001:~/tmp $ k apply -f sample-statefulset.yaml
pi@raspi001:~/tmp $ k exec -it sample-statefulset-0 ls /usr/share/nginx/html/sample.html
/usr/share/nginx/html/sample.html

This also remains. OK.
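As an aside (my understanding, not something the article verifies step by step): the data survives even delete -f / apply -f because the PersistentVolumeClaims created from volumeClaimTemplates are not deleted together with the StatefulSet. They follow the naming pattern claim-statefulsetname-ordinal, so they can be listed like this:

pi@raspi001:~/tmp $ k get pvc | grep www-sample-statefulset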

Scaling

With a StatefulSet, scaling out adds Pods in ascending index order, while scaling in deletes them starting from the highest index. Scaling also happens one Pod at a time, so the first Pod created is the last one to be deleted. Let's try it.

pi@raspi001:~ $ k get pod | grep sample-statefulset
sample-statefulset-0                      1/1     Running   1          10h
sample-statefulset-1                      1/1     Running   1          10h
sample-statefulset-2                      1/1     Running   1          10h
pi@raspi001:~/tmp $ vim sample-statefulset.yaml # replicas: 3→4
pi@raspi001:~/tmp $ k apply -f sample-statefulset.yaml
pi@raspi001:~/tmp $ k get pod | grep sample-statefulset
sample-statefulset-0                      1/1     Running             1          10h
sample-statefulset-1                      1/1     Running             1          10h
sample-statefulset-2                      1/1     Running             1          10h
sample-statefulset-3                      0/1     ContainerCreating   0          6s
pi@raspi001:~/tmp $ vim sample-statefulset.yaml # replicas: 4→2
pi@raspi001:~/tmp $ k apply -f sample-statefulset.yaml
pi@raspi001:~/tmp $ k get pod | grep sample-statefulset
sample-statefulset-0                      1/1     Running       1          10h
sample-statefulset-1                      1/1     Running       1          10h
sample-statefulset-2                      1/1     Running       1          10h
sample-statefulset-3                      0/1     Terminating   0          2m4s
pi@raspi001:~/tmp $ k get pod | grep sample-statefulset
sample-statefulset-0                      1/1     Running       1          10h
sample-statefulset-1                      1/1     Running       1          10h
sample-statefulset-2                      0/1     Terminating   0          10h

As expected. If you want Pods to be created and deleted in parallel rather than one at a time, set spec.podManagementPolicy to Parallel.
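A minimal sketch of that setting (not applied in this article):

# sample-statefulset.yaml (excerpt)
spec:
  podManagementPolicy: Parallel  # default is OrderedReady, i.e. one Pod at a time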

Update strategy

There are two strategies, OnDelete and RollingUpdate. With OnDelete, Pods are only updated when they are deleted (a manifest update alone does nothing); with RollingUpdate, Pods are updated immediately, one at a time. Unlike Deployment, a StatefulSet rolling update cannot run extra or missing Pods during the update (maxUnavailable and maxSurge cannot be specified). There is also a partition field that controls from which index the update is applied, which is a feature unique to StatefulSet. We didn't try update strategies with Deployment, so let's try one here.
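For reference, this is roughly what specifying partition would look like (a sketch, not used in this article); only Pods whose ordinal is greater than or equal to partition are rolled out:

# sample-statefulset.yaml (excerpt)
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2  # only sample-statefulset-2 and higher are updated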

The default strategy is RollingUpdate. Since we have already seen rolling updates work many times, let's try OnDelete instead (leaving partition unset).

# sample-statefulset.yaml (excerpt of the changed fields)
spec:
  updateStrategy:
    type: OnDelete
  template:
    spec:
      containers:
        - name: nginx-container
          image: nginx:1.13

We set the update strategy to OnDelete and updated the nginx image from 1.12 to 1.13.

pi@raspi001:~/tmp $ k delete -f sample-statefulset.yaml
pi@raspi001:~/tmp $ k apply -f sample-statefulset.yaml
pi@raspi001:~/tmp $ k describe pod sample-statefulset-0 | grep "Image:"
    Image:          nginx:1.12
pi@raspi001:~/tmp $ k delete pod sample-statefulset-0
pi@raspi001:~/tmp $ k get pod | grep sample-statefulset
sample-statefulset-0                      0/1     ContainerCreating   0          5s
sample-statefulset-1                      1/1     Running             0          2m59s
pi@raspi001:~/tmp $ k describe pod sample-statefulset-0 | grep "Image:"
    Image:          nginx:1.13

As expected: only after explicitly deleting the Pod was nginx updated to 1.13.

Job

A Job is a resource for running a one-off process to completion. Like a ReplicaSet, it can run multiple Pods, which makes it suitable for batch processing.

Let's run a job that only sleeps for 10 seconds.

# sample-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 10
  template:
    spec:
      containers:
        - name: sleep-container
          image: nginx:1.12
          command: ["sleep"]
          args: ["10"]
      restartPolicy: Never

pi@raspi001:~/tmp $ k apply -f sample-job.yaml
pi@raspi001:~/tmp $ k get pod
NAME                                      READY   STATUS      RESTARTS   AGE
sample-job-d7465                          0/1     Completed   0          3m17s
pi@raspi001:~/tmp $ k get job
NAME         COMPLETIONS   DURATION   AGE
sample-job   1/1           27s        4m8s

When the Job finishes, the Pod stays in Completed status, and since the Job's COMPLETIONS is 1/1, it finished normally. Conversely, if it does not finish normally, it is retried according to restartPolicy, which can be Never or OnFailure. With Never, a new Pod is created for the retry; with OnFailure, the same Pod is reused and its container is restarted, so note that any data not stored on a persistent volume is lost.
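As a quick sketch of the OnFailure side (my own variant, not from the article), a Job that always fails makes the retry behavior visible: the container is restarted inside the same Pod until backoffLimit is exceeded.

# sample-job-onfailure.yaml (hypothetical variant of sample-job.yaml)
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job-onfailure
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
        - name: fail-container
          image: nginx:1.12
          command: ["sh", "-c"]
          args: ["sleep 5; exit 1"]  # always exits with an error
      restartPolicy: OnFailure       # retry by restarting the container in the same Pod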

completions is the required number of successful completions, parallelism is the number of Pods run in parallel, and backoffLimit is the number of failures tolerated; set them to match your use case. If completions is not specified, the Job keeps running until you stop it, and if backoffLimit is not specified, it defaults to 6.
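For example, a Job that needs 5 successful completions while running 2 Pods at a time would look like this (a sketch, not run in this article):

# sample-parallel-job.yaml (hypothetical example)
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-parallel-job
spec:
  completions: 5   # finish after 5 Pods have succeeded
  parallelism: 2   # run at most 2 Pods at the same time
  backoffLimit: 10
  template:
    spec:
      containers:
        - name: sleep-container
          image: nginx:1.12
          command: ["sleep"]
          args: ["10"]
      restartPolicy: Never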

Hmm, nothing particularly interesting, and that's it. Lol

CronJob

A resource that creates Jobs at scheduled times. The relationship is similar to that between Deployment and ReplicaSet: the CronJob manages Jobs, which in turn manage Pods.

Let's prepare a Job that runs every minute and succeeds with roughly a 50% probability, and try it out.

# sample-cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: sample-cronjob
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Allow
  startingDeadlineSeconds: 30
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 3
  suspend: false
  jobTemplate:
    spec:
      completions: 1
      parallelism: 1
      backoffLimit: 1
      template:
        spec:
          containers:
            - name: sleep-container
              image: nginx:1.12
              command:
                - "sh"
                - "-c"
              args:
                # a command that succeeds with a ~50% probability (checks whether a digit of the nanosecond clock is odd)
                - "sleep 40; date +'%N' | cut -c 9 | egrep '[1|3|5|7|9]'"
          restartPolicy: Never

pi@raspi001:~/tmp $ k apply -f sample-cronjob.yaml
pi@raspi001:~/tmp $ k get all
NAME                           SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/sample-cronjob   */1 * * * *   False     0        <none>          9s

It seems that Jobs and Pods are not created until the scheduled time arrives. I waited a few minutes.
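(Instead of re-running the command by hand, watching for changes should also work, though I didn't do that here:)

pi@raspi001:~/tmp $ k get job --watch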

pi@raspi001:~/tmp $ k get all
NAME                                          READY   STATUS      RESTARTS   AGE
pod/sample-cronjob-1557115320-dsdvg           0/1     Error       0          2m18s
pod/sample-cronjob-1557115320-qkgtp           0/1     Completed   0          87s
pod/sample-cronjob-1557115380-r57sw           0/1     Completed   0          78s
pod/sample-cronjob-1557115440-2phzb           1/1     Running     0          17s

NAME                                  COMPLETIONS   DURATION   AGE
job.batch/sample-cronjob-1557115320   1/1           105s       2m18s
job.batch/sample-cronjob-1557115380   1/1           52s        78s
job.batch/sample-cronjob-1557115440   0/1           17s        17s

NAME                           SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/sample-cronjob   */1 * * * *   False     1        20s             3m12s

The generated names follow a naming convention (the CronJob name plus a timestamp-based suffix), so it is easy to see how they are related. The finished Pods are kept because of the successfulJobsHistoryLimit and failedJobsHistoryLimit values; they are left so you can check their logs, though it is generally better to aggregate logs into a log collection system instead.
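For example, the output of one of the kept Jobs can be checked like this (the Job name is taken from the listing above):

pi@raspi001:~/tmp $ k logs job/sample-cronjob-1557115320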

If you want to suspend the schedule partway through, set spec.suspend to true. concurrencyPolicy limits simultaneous execution and can be Allow, Forbid, or Replace: Allow places no restriction, Forbid does not start a new Job while the previous one is still running, and Replace deletes the previous Job and starts the new one.

startingDeadlineSeconds specifies how much of a delay in starting a scheduled Job can be tolerated.
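As a sketch (not run in this article), suspending the CronJob without editing the manifest should work with a patch like this:

pi@raspi001:~/tmp $ k patch cronjob sample-cronjob -p '{"spec":{"suspend":true}}'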

This also ended without any particular incident. Lol

Cleaning up

pi@raspi001:~/tmp $ k delete -f sample-statefulset.yaml -f sample-job.yaml -f sample-cronjob.yaml
pi@raspi001:~/tmp $ k delete pvc www-sample-statefulset-{0,1,2,3}

In conclusion

With this, the Workloads resources are finally done. I feel like I rushed through the second half. Next time is here.

If it was helpful, support me with a ☕!
