Starting to Learn Kubernetes a Step Behind - 12. Resource Limits -

Story

Last time

In Starting to Learn Kubernetes a Step Behind - 11. config&storage Part 2 -, we learned about storage. This time, we will learn about resource limits.

※ According to the Kubernetes Complete Guide, instead of explaining "Metadata" directly from the type of resources, it was explained based on the content, so I will follow that.

Resource Limits

In Kubernetes, you can impose resource limits on the containers you manage. You can mainly limit resources such as CPU and memory, but you can also limit GPU by using Device Plugins.

※ The way to specify CPU is in units of 1vCPU = 1000millicores(m).

Requests and Limits

A request is the minimum value of the resources to be used. Limits are the maximum value of the resources to be used.

If there are no resources specified on the free node, the request will not be scheduled, but the limits will be scheduled regardless.

Anyway, let's try it out.

First, let's check the current situation.

pi@raspi001:~/tmp $ k get node
NAME       STATUS   ROLES    AGE   VERSION
raspi001   Ready    master   31d   v1.14.1
raspi002   Ready    worker   31d   v1.14.1
raspi003   Ready    worker   30d   v1.14.1
pi@raspi001:~/tmp $ k get nodes -o jsonpath='{.items[?(@.metadata.name!="raspi001")].status.allocatable.memory}'
847048Ki 847048Ki
pi@raspi001:~/tmp $ k get nodes -o jsonpath='{.items[?(@.metadata.name!="raspi001")].status.allocatable.cpu}'
4 4
pi@raspi001:~/tmp $ k get nodes -o jsonpath='{.items[?(@.metadata.name!="raspi001")].status.capacity.memory}'
949448Ki 949448Ki
pi@raspi001:~/tmp $ k get nodes -o jsonpath='{.items[?(@.metadata.name!="raspi001")].status.capacity.cpu}'
4 4

The usage of jsonpath is here.

Allocatable is the amount of resources that can be allocated to a Pod, and capacity is the amount of resources that can be allocated on the entire Node. This alone does not tell us how much resources are currently being used, so we will check individually.

pi@raspi001:~/tmp $ k describe node raspi002
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                200m (5%)    200m (5%)
  memory             150Mi (18%)  150Mi (18%)
  ephemeral-storage  0 (0%)       0 (0%)
...
pi@raspi001:~/tmp $ k describe node raspi003
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                400m (10%)   300m (7%)
  memory             320Mi (38%)  420Mi (50%)
  ephemeral-storage  0 (0%)       0 (0%)
...

The current resource situation is as follows when put into a table.

node	allocatable (memory/cpu)	capacity (memory/cpu)	used (memory/cpu)	remain (memory/cpu)
raspi002	847,048Ki/4000m	949,448Ki/4000m	150,000Ki/200m	697,048Ki/3800m
raspi003	847,048Ki/4000m	949,448Ki/4000m	320,000Ki/400m	527,048Ki/3600m

Now, let's try resource limits.

# sample-resource.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-resource
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
        - name: nginx-container
          image: nginx:1.12
          resources:
            requests:
              memory: "128Mi"
              cpu: "300m"
            limits:
              memory: "256Mi"
              cpu: "600m"

The total minimum memory requested by the pod to be applied is 384Mi(128Mi×3), and the cpu is 900m(300m×3). With this, the pod should run.

pi@raspi001:~/tmp $ k apply -f sample-resource.yaml
pi@raspi001:~/tmp $ k get pods
NAME                                      READY   STATUS    RESTARTS   AGE
sample-resource-785cd54844-7n89t          1/1     Running   0          108s
sample-resource-785cd54844-9b5f9          1/1     Running   0          108s
sample-resource-785cd54844-whj7x          1/1     Running   0          108s

As expected. Now let's try a situation where resource limits are reached.

The total minimum memory of all WorkerNodes is 1,224Mi(697,048Ki+527,048Ki). We will update the previous manifest to exceed this. We set the number of replicas to 3, but it should be 10 (1,280Mi)

As expected, 9 (128Mi*9=1,152Mi) should be Running and 1 (128Mi) should be Pending.

After changing the replica of sample-resource.yaml to 10 ↓

pi@raspi001:~/tmp $ k apply -f sample-resource.yaml
pi@raspi001:~/tmp $ k get pods
NAME                                      READY   STATUS    RESTARTS   AGE
sample-resource-785cd54844-7n89t          1/1     Running   0          6m19s
sample-resource-785cd54844-9b5f9          1/1     Running   0          6m19s
sample-resource-785cd54844-dffsd          1/1     Running   0          61s
sample-resource-785cd54844-jmkv6          1/1     Running   0          61s
sample-resource-785cd54844-k9vcb          1/1     Running   0          61s
sample-resource-785cd54844-l4smf          0/1     Pending   0          60s
sample-resource-785cd54844-n4hl7          1/1     Running   0          60s
sample-resource-785cd54844-th4bp          0/1     Pending   0          60s
sample-resource-785cd54844-whj7x          1/1     Running   0          6m19s
sample-resource-785cd54844-xclsk          1/1     Running   0          60s

Oh, two are Pending. Maybe it's because there are no spare resources on the Node. Let's check.

pi@raspi001:~/tmp $ k describe node raspi002
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1700m (42%)  3200m (80%)
  memory             790Mi (95%)  1430Mi (172%)
  ephemeral-storage  0 (0%)       0 (0%)
...
pi@raspi001:~/tmp $ k describe node raspi003
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1300m (32%)  2100m (52%)
  memory             704Mi (85%)  1188Mi (143%)
  ephemeral-storage  0 (0%)       0 (0%)
...

raspi002 is using 790Mi out of 847Mi. There are no resources to add one Pod (128Mi). raspi003 is using 704Mi out of 847Mi. It seems to be free here, but why?

Here, if you focus on the 704Mi (85%) of memory, it would be 828Mi if it was 100%. Indeed, 704Mi+128Mi=832Mi would be over.

So, what's the difference between the 847Mi that was displayed as allocatable? Allocatable refers to the amount of resources that can be allocated, including all pods in all namespaces. Not only default, but also pods in other namespaces such as kube-system are of course consuming resources. Isn't 828Mi the amount of resources that can be allocated for use in default? (The current namespace is default)

By the way, the Limits are exceeding 100%... Oh my...

Cluster Autoscaler

This is a feature that automatically adds Kubernetes Nodes as needed. It operates when a Pod becomes Pending. In other words, it scales according to the lower limit of requests.

Therefore, if the requests are too high, it may scale even if the actual load average is low, or if the requests are too low, it may not scale even under high load. Let's optimize the requests while performance testing.

LimitRange

As in the previous example, you can set requests and limits for each, but there is something more convenient. That's LimitRange. This allows you to set the minimum and maximum values of CPU and memory resources for a Namespace. The following are the limit items that can be set.

default
- Default Limits
defaultRequest
- Default Requests
max
- Maximum resources
min
- Minimum resources
maxLimitRequestRatio
- Ratio of Limits/Requests

Also, the targets to be limited are Container, Pod, PersistentVolumeClaim. When actually operating, let's define it properly. (It seems that some providers have it set by default)

ResourceQuota

By using ResourceQuota, you can limit the "number of resources that can be created" and "resource usage" for each Namespace. I'm going to try limiting the "number of resources that can be created".

# sample-resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: sample-resourcequota
  namespace: default
spec:
  hard:
    # 作成可能なリソースの数
    count/pods: 5

pi@raspi001:~/tmp $ k delete -f sample-resource.yaml
pi@raspi001:~/tmp $ k apply -f sample-resourcequota.yaml
pi@raspi001:~/tmp $ k apply -f sample-resource.yaml
pi@raspi001:~/tmp $ k get pods
NAME                                      READY   STATUS    RESTARTS   AGE
sample-resource-785cd54844-dll8t          1/1     Running   0          38s
sample-resource-785cd54844-ljr7q          1/1     Running   0          39s
sample-resource-785cd54844-r6txh          1/1     Running   0          38s
sample-resource-785cd54844-sb6sq          1/1     Running   0          38s
sample-resource-785cd54844-3ffeg          1/1     Running   0          38s

By doing this, only up to 5 pods can be created, so even if you apply sample-resource.yaml, only up to 5 will be created. In the case of replicas, it simply wasn't created without any warning. If you limit the configmap to 5 and apply one configmap at a time, a warning will appear.

HorizontalPodAutoscaler(HPA)

HPA is a resource that automatically scales according to the CPU load of Pods managed by Deployment, ReplicaSet. It checks whether to scale every 30 seconds.

The required number of replicas is represented by the following formula.

Required number of replicas = ceil(sum(current CPU usage of Pod)/targetAverageUtilization)

In About Auto Scaling of Pods and Nodes in Kubernetes,

auto scaling adjusts the number of pods to approach the target value.

This sentence was easy to understand. In other words, if the targetAverageUtilization is 50, the overall CPU usage is adjusted to be 50%. This time, I wanted to try it, but I couldn't verify it because I didn't install the metrics-server. I'll try to install it and try it again next time.

VerticalPodAutoscaler(VPA)

VPA is a resource that scales the resource allocation of CPU and memory assigned to containers. This is not a scale-out of HPA, but a scale-up of Pod.

Cleanup

pi@raspi001:~/tmp $ k delete -f sample-resource.yaml -f sample-resourcequota.yaml

Finally

This time, we tried to limit resources by operating Requests and Limits. We learned how to check how much resources each is consuming, and also learned how to use jsonpath. The next one is here.

If it was helpful, support me with a ☕!

Related tags

Kubernetes