Cluster resource management is always a complex topic. How do you explain the need to configure resources to every user who deploys applications to the cluster? Maybe it is easier to automate it?
Description of the problem
Resource management is an important task in the context of Kubernetes cluster administration. But why is it important if Kubernetes does all the hard work for you? Because it does not. Kubernetes gives you convenient tools to solve many problems ... if you use those tools. For each pod in your cluster, you can specify the resources its containers need, and Kubernetes will use this information to distribute the instances of your application across the cluster nodes.
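For illustration, a minimal pod manifest with resource requests and limits could look like this (the name, image, and values are placeholders chosen for the example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app          # hypothetical name, for illustration only
spec:
  containers:
    - name: app
      image: nginx           # placeholder image
      resources:
        requests:            # what the scheduler reserves for the container
          cpu: "250m"
          memory: "256Mi"
        limits:              # hard cap enforced at runtime
          cpu: "500m"
          memory: "512Mi"
```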
Few people take resource management in Kubernetes seriously. That is fine for a lightly loaded cluster running a couple of static applications. But what if you have a very dynamic cluster, where applications come and go and namespaces are created and deleted all the time? A cluster with a large number of users who can create their own namespaces and deploy applications? Well, in that case, instead of stable and predictable orchestration you get a bunch of random crashes in applications, and sometimes even in the components of Kubernetes itself!
Here is an example of such a cluster:
You can see 3 pods stuck in the “Terminating” state. This is not the usual pod deletion: they are stuck because the containerd daemon on their node was hit by something very resource-hungry.
Such problems can be mitigated by handling resource shortages properly, but that is not the topic of this article (there is a good article on it), and it is not a silver bullet for all resource issues anyway.
The main cause of such problems is incorrect resource management in the cluster, or the lack of it. And while this kind of problem is not a disaster for Deployments, which will simply create a new working pod, for entities like DaemonSet, and even more so StatefulSet, such freezes are fatal and require manual intervention.
You can have a huge cluster with plenty of CPU and memory. When you run many applications on it without proper resource settings, there is a chance that all the resource-intensive pods end up on the same node. They will fight for resources even while the remaining nodes of the cluster stay practically idle.
You can also often see less critical cases where some applications are affected by their neighbors. Even if the resources of these “innocent” applications are configured correctly, a stray pod can come along and kill them. An example of such a scenario (illustrated with manifests after the list):
- Your application requests 4 GB of memory, but initially uses only 1 GB.
- A stray pod with no resource configuration is scheduled onto the same node.
- The stray pod consumes all available memory.
- Your application tries to allocate more memory and crashes because there is none left.
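To make the scenario concrete, here is a rough sketch of the two pods involved; the names, image names, and numbers are purely illustrative:

```yaml
# The "innocent" pod: requests 4Gi but initially uses only ~1Gi of it.
apiVersion: v1
kind: Pod
metadata:
  name: innocent-app            # hypothetical name
spec:
  containers:
    - name: app
      image: my-app:latest      # placeholder image
      resources:
        requests:
          memory: "4Gi"
---
# The "stray" pod: no resources at all, free to consume the whole node.
apiVersion: v1
kind: Pod
metadata:
  name: stray-app               # hypothetical name
spec:
  containers:
    - name: app
      image: greedy-app:latest  # placeholder image
      resources: {}
```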
Another fairly common case is overestimation. Some developers put huge requests in their manifests “just in case” and never use those resources. The result is wasted money.
The theory behind the solution
Sounds scary, right?
Fortunately, Kubernetes offers a way to impose some restrictions on pods by specifying default resource configurations as well as minimum and maximum values. This is implemented with the LimitRange object. LimitRange is a very convenient tool when you have a limited number of namespaces or full control over how they are created. Even without proper resource configuration, your applications will be limited in what they can use. “Innocent”, properly configured pods will be safe and protected from harmful neighbors. If someone deploys a greedy application without resource configuration, that application will receive the default values and will probably crash. And that is it! The application will no longer drag anyone else down with it.
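For reference, a plain LimitRange (the kind the operator will later generate for us) could look roughly like this; the namespace name and the values are arbitrary examples:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limit
  namespace: my-namespace   # LimitRange is namespaced, so one is needed per namespace
spec:
  limits:
    - type: Container
      default:              # applied as limits to containers that specify none
        cpu: "700m"
        memory: "900Mi"
      defaultRequest:       # applied as requests to containers that specify none
        cpu: "110m"
        memory: "111Mi"
      max:
        cpu: "800m"
        memory: "1Gi"
      min:
        cpu: "100m"
        memory: "99Mi"
```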
So we have a tool to control and enforce resource configuration for pods, and now it seems we are safe. Right? Not quite. As described earlier, our namespaces can be created by users, so a LimitRange may simply not exist in such a namespace, since it has to be created in each namespace separately. We therefore need something that works not only at the namespace level but at the cluster level, and Kubernetes has no such feature yet.
That is why I decided to write my own solution to this problem. Let me introduce Limit Operator. It is an operator built with the Operator SDK framework that uses the ClusterLimit custom resource and helps keep all the “innocent” applications in the cluster safe. With this operator you can control the default values and resource limits for all namespaces with a minimal amount of configuration. It also lets you choose exactly where to apply the configuration using namespaceSelector.
apiVersion: limit.myafq.com/v1alpha1
kind: ClusterLimit
metadata:
  name: default-limit
spec:
  namespaceSelector:
    matchLabels:
      limit: "limited"
  limitRange:
    limits:
      - type: Container
        max:
          cpu: "800m"
          memory: "1Gi"
        min:
          cpu: "100m"
          memory: "99Mi"
        default:
          cpu: "700m"
          memory: "900Mi"
        defaultRequest:
          cpu: "110m"
          memory: "111Mi"
      - type: Pod
        max:
          cpu: "2"
          memory: "2Gi"
With this configuration, the operator will create a LimitRange only in namespaces with the label limit: "limited". This is useful for imposing stricter restrictions on a specific group of namespaces. If namespaceSelector is not specified, the operator applies the LimitRange to all namespaces. If you want to configure a LimitRange manually for a specific namespace, you can use the annotation "limit.myafq.com/unlimited": true; it tells the operator to skip that namespace and not create a LimitRange there automatically.
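Assuming the annotation works as described above, opting a namespace out of automatic LimitRange creation would look something like this (the namespace name is hypothetical; note that annotation values in a manifest must be strings):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: heavy-apps                         # hypothetical namespace
  annotations:
    "limit.myafq.com/unlimited": "true"    # tells the operator to skip this namespace
```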
An example scenario for using the operator:
- Create a default ClusterLimit with liberal restrictions and no namespaceSelector; it will be applied everywhere.
- For a set of namespaces with lightweight applications, create an additional, stricter ClusterLimit with a namespaceSelector, and label those namespaces accordingly.
- On a namespace with very resource-intensive applications, put the annotation "limit.myafq.com/unlimited": true and configure a LimitRange manually, with much wider limits than the default ClusterLimit specifies (a sketch follows this list).
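A manually created LimitRange for such a namespace (step 3 above) might look roughly like this; the name, namespace, and "much wider" values are only illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: wide-limit          # hypothetical name
  namespace: heavy-apps     # the namespace annotated in step 3
spec:
  limits:
    - type: Container
      max:
        cpu: "8"
        memory: "32Gi"
      default:
        cpu: "2"
        memory: "8Gi"
      defaultRequest:
        cpu: "1"
        memory: "4Gi"
```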
An important thing to know about having several LimitRanges in one namespace:
When a pod is created in a namespace with several LimitRanges, the largest defaults are used to configure its resources, but the maximum and minimum values are checked against the strictest of the LimitRanges.
Practical example
The operator watches all namespaces, ClusterLimits, and the child LimitRanges, and triggers a reconciliation of the cluster state on any change in the watched objects. Let's see how it works in practice.
To start, create a pod without any restrictions:
❯ kubectl run --generator=run-pod/v1 --image=bash bash
pod/bash created
❯ kubectl get pod bash -o yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: bash
  name: bash
  namespace: default
spec:
  containers:
  - image: bash
    name: bash
    resources: {}
Note: part of the command output has been omitted to simplify the example.
As you can see, the resources field is empty, which means this pod can be scheduled anywhere.
Now we will create the default ClusterLimit for the entire cluster with fairly liberal values:
apiVersion: limit.myafq.com/v1alpha1
kind: ClusterLimit
metadata:
  name: default-limit
spec:
  limitRange:
    limits:
      - type: Container
        max:
          cpu: "4"
          memory: "5Gi"
        default:
          cpu: "700m"
          memory: "900Mi"
        defaultRequest:
          cpu: "500m"
          memory: "512Mi"
And a stricter one for a subset of namespaces:
apiVersion: limit.myafq.com/v1alpha1
kind: ClusterLimit
metadata:
  name: restrictive-limit
spec:
  namespaceSelector:
    matchLabels:
      limit: "restrictive"
  limitRange:
    limits:
      - type: Container
        max:
          cpu: "800m"
          memory: "1Gi"
        default:
          cpu: "100m"
          memory: "128Mi"
        defaultRequest:
          cpu: "50m"
          memory: "64Mi"
      - type: Pod
        max:
          cpu: "2"
          memory: "2Gi"
Then create the namespaces and pods in them to see how it works.
A regular namespace with the default restrictions:
apiVersion: v1
kind: Namespace
metadata:
  name: regular
And a slightly more limited namespace which, in our scenario, is meant for lightweight applications:
apiVersion: v1
kind: Namespace
metadata:
  labels:
    limit: "restrictive"
  name: lightweight
If you look at the operator's logs right after creating the namespaces, you will find something like this:
{...,"msg":"Reconciling ClusterLimit","Triggered by":"/regular"} {...,"msg":"Creating new namespace LimitRange.","Namespace":"regular","LimitRange":"default-limit"} {...,"msg":"Updating namespace LimitRange.","Namespace":"regular","Name":"default-limit"} {...,"msg":"Reconciling ClusterLimit","Triggered by":"/lightweight"} {...,"msg":"Creating new namespace LimitRange.","Namespace":"lightweight","LimitRange":"default-limit"} {...,"msg":"Updating namespace LimitRange.","Namespace":"lightweight","Name":"default-limit"} {...,"msg":"Creating new namespace LimitRange.","Namespace":"lightweight","LimitRange":"restrictive-limit"} {...,"msg":"Updating namespace LimitRange.","Namespace":"lightweight","Name":"restrictive-limit"}
The omitted part of each log entry contains 3 more fields that are not relevant here.
As you can see, creating each namespace triggered the creation of new LimitRanges. The more restricted namespace got two of them: the default one and the stricter one.
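You can verify this directly. Assuming the operator names the generated LimitRanges after their parent ClusterLimits (as the log above suggests), listing them in the lightweight namespace should show both; the exact output columns depend on your kubectl version:

```
❯ kubectl get limitrange -n lightweight
NAME                CREATED AT
default-limit       ...
restrictive-limit   ...
```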
Now let's try to create a couple of pods in these namespaces.
❯ kubectl run --generator=run-pod/v1 --image=bash bash -n regular
pod/bash created
❯ kubectl get pod bash -o yaml -n regular
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu, memory request for container bash; cpu, memory limit for container bash'
  labels:
    run: bash
  name: bash
  namespace: regular
spec:
  containers:
  - image: bash
    name: bash
    resources:
      limits:
        cpu: 700m
        memory: 900Mi
      requests:
        cpu: 500m
        memory: 512Mi
As you can see, even though we have not changed the way the pod is created, the resources field is now filled in. You might also notice the annotation automatically added by the LimitRanger plugin.
Now create a pod in the lightweight namespace:
❯ kubectl run --generator=run-pod/v1 --image=bash bash -n lightweight
pod/bash created
❯ kubectl get pods -n lightweight bash -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu, memory request for container bash; cpu, memory limit for container bash'
  labels:
    run: bash
  name: bash
  namespace: lightweight
spec:
  containers:
  - image: bash
    name: bash
    resources:
      limits:
        cpu: 700m
        memory: 900Mi
      requests:
        cpu: 500m
        memory: 512Mi
Note that the resources in this pod are the same as in the previous example. That is because when there are several LimitRanges, the less strict default values are used when creating pods. So why do we need the more restrictive LimitRange at all? It is used to validate the maximum and minimum resource values. To demonstrate, let's make our restrictive ClusterLimit even more restrictive:
apiVersion: limit.myafq.com/v1alpha1
kind: ClusterLimit
metadata:
  name: restrictive-limit
spec:
  namespaceSelector:
    matchLabels:
      limit: "restrictive"
  limitRange:
    limits:
      - type: Container
        max:
          cpu: "200m"
          memory: "250Mi"
        default:
          cpu: "100m"
          memory: "128Mi"
        defaultRequest:
          cpu: "50m"
          memory: "64Mi"
      - type: Pod
        max:
          cpu: "2"
          memory: "2Gi"
Pay attention to the section:
- type: Container
  max:
    cpu: "200m"
    memory: "250Mi"
We have now set 200m of CPU and 250Mi of memory as the maximum for a container in a pod. Now try to create the pod again:
❯ kubectl run --generator=run-pod/v1 --image=bash bash -n lightweight
Error from server (Forbidden): pods "bash" is forbidden: [maximum cpu usage per Container is 200m, but limit is 700m., maximum memory usage per Container is 250Mi, but limit is 900Mi.]
Our pod got the larger values set by the default LimitRange and could not start, because it failed the check against the maximum allowed resources.
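One way to get the pod running again is to stay inside the new maximums explicitly. Assuming the earlier bash pod has been deleted, a manifest like the following (values chosen to fit the 200m CPU / 250Mi memory cap; a sketch, not the only option) should be admitted:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bash
  namespace: lightweight
spec:
  containers:
    - name: bash
      image: bash
      resources:
        requests:
          cpu: "50m"
          memory: "64Mi"
        limits:            # within the 200m CPU / 250Mi memory container maximum
          cpu: "200m"
          memory: "250Mi"
```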
This was an example of using the Limit Operator. Try it yourself and play with ClusterLimit in your local Kubernetes instance.
In the Limit Operator GitHub repository you can find the manifests for deploying the operator as well as its source code. If you want to extend the operator's functionality, pull requests and feature requests are welcome!
Conclusion
Resource management in Kubernetes is critical to the stability and reliability of your applications. Configure your pods' resources whenever possible, use LimitRange to insure against the cases where it is not, and automate the creation of LimitRanges with the Limit Operator.
Follow these tips, and your cluster will always be safe from the chaos caused by resource-less stray pods.