Local files when porting an application to Kubernetes

When building a CI/CD process on Kubernetes, the requirements of the new infrastructure sometimes conflict with the application being moved to it. In particular, the build stage should produce a single image that is used in all of the project's environments and clusters. This principle underlies what Google considers correct container management (our technical director has spoken about this more than once).

However, it surprises no one when the site's code uses an off-the-shelf framework that imposes restrictions on how the site is operated later. And while this is easy to handle in a "normal" environment, in Kubernetes such behavior can become a problem, especially when you run into it for the first time. An inventive mind can come up with infrastructure solutions that seem obvious and even quite good at first glance, but it is important to remember that most of these situations can and should be solved architecturally.

Let's look at popular workarounds for storing files that can lead to unpleasant consequences when operating the cluster, and point to a more correct path.

Static Storage

To illustrate, consider a web application that uses a static asset generator to produce a set of images, styles, and so on. For example, the Yii PHP framework has a built-in asset manager that generates unique directory names. Its output is therefore a set of paths for the site's static files that are guaranteed not to overlap (this was done for several reasons, for example to avoid duplicates when several components use the same resource). So, out of the box, on the first request to a module that uses web resources, the static files are generated and laid out (in fact, they are often symlinks, but more on that later) under a common root directory that is unique to this deployment:

webroot/assets/2072c2df/css/…
webroot/assets/2072c2df/images/…
webroot/assets/2072c2df/js/…

What problems does this cause in a cluster?

Simplest example

Let's take a fairly common case in which nginx sits in front of PHP to serve static files and handle simple requests. The easiest way is a Deployment with two containers:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: site
    spec:
      selector:
        matchLabels:
          component: backend
      template:
        metadata:
          labels:
            component: backend
        spec:
          volumes:
          - name: nginx-config
            configMap:
              name: nginx-configmap
          containers:
          - name: php
            image: own-image-with-php-backend:v1.0
            command: ["/usr/local/sbin/php-fpm","-F"]
            workingDir: /var/www
          - name: nginx
            image: nginx:1.16.0
            command: ["/usr/sbin/nginx", "-g", "daemon off;"]
            volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d/default.conf
              subPath: nginx.conf

In a simplified form, the nginx config boils down to the following:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: "nginx-configmap"
    data:
      nginx.conf: |
        server {
            listen 80;
            server_name _;
            charset utf-8;
            root /var/www;
            access_log /dev/stdout;
            error_log /dev/stderr;
            location / {
                index index.php;
                try_files $uri $uri/ /index.php?$args;
            }
            location ~ \.php$ {
                fastcgi_pass 127.0.0.1:9000;
                fastcgi_index index.php;
                include fastcgi_params;
            }
        }

The first time the site is accessed, the assets appear in the PHP container. But with two containers in one pod, nginx knows nothing about these static files, even though (according to the configuration) it is the one that should serve them. As a result, the client gets a 404 error for every request to CSS and JS files. The simplest solution is to set up a directory shared by both containers. A primitive option is a shared emptyDir:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: site
    spec:
      selector:
        matchLabels:
          component: backend
      template:
        metadata:
          labels:
            component: backend
        spec:
          volumes:
          - name: assets
            emptyDir: {}
          - name: nginx-config
            configMap:
              name: nginx-configmap
          containers:
          - name: php
            image: own-image-with-php-backend:v1.0
            command: ["/usr/local/sbin/php-fpm","-F"]
            workingDir: /var/www
            volumeMounts:
            - name: assets
              mountPath: /var/www/assets
          - name: nginx
            image: nginx:1.16.0
            command: ["/usr/sbin/nginx", "-g", "daemon off;"]
            volumeMounts:
            - name: assets
              mountPath: /var/www/assets
            - name: nginx-config
              mountPath: /etc/nginx/conf.d/default.conf
              subPath: nginx.conf

Now nginx correctly serves the static files generated in the PHP container. But remember that this is a primitive solution: it is far from ideal and has its own nuances and shortcomings, which are discussed below.

More advanced storage

Now imagine that a user visits the site and loads a page with the styles available in the container, and while they are reading the page, we redeploy the container. The assets directory is now empty, and a request to PHP is needed to trigger generation of new ones. Even after that, however, links to the old static files are stale, which leads to errors when displaying them.

In addition, the project is most likely under at least moderate load, so a single copy of the application will not be enough:

Scale the Deployment to two replicas.
On the first request to the site, the assets are created in one replica.
At some point the ingress decides (for load balancing) to send a request to the second replica, where these assets do not exist yet. Or perhaps they no longer exist, because we use RollingUpdate and a deploy is in progress.

In general, the result is errors again.

To avoid losing the old assets, you can replace emptyDir with hostPath, placing the static files physically on the cluster node. The drawback of this approach is that the application effectively becomes tied to a specific cluster node: if its pods move to other nodes, the directory there will not contain the necessary files. Otherwise, some kind of background synchronization of the directory between nodes is required.
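The switch from emptyDir to hostPath touches only the volume definition in the pod spec. A minimal sketch of that fragment follows; the node directory /var/lib/site-assets is a hypothetical path chosen for illustration, not one from the original setup.

```yaml
# Fragment of the Deployment's pod spec (spec.template.spec);
# only the "assets" volume changes, the volumeMounts stay the same.
volumes:
- name: assets
  hostPath:
    path: /var/lib/site-assets   # hypothetical directory on the node
    type: DirectoryOrCreate      # create the directory if it does not exist yet
- name: nginx-config
  configMap:
    name: nginx-configmap
```

The files now survive a pod restart, but only on the node where they were generated, which is exactly the binding described above.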

What are the solutions?

If hardware and resources allow, you can use CephFS to provide a directory that is equally accessible from all nodes for static files. The official documentation recommends SSDs, at least three-way replication, and a stable, high-bandwidth connection between cluster nodes.
A less demanding option is to set up an NFS server. In that case, however, you need to account for a possible increase in the web server's response time, and fault tolerance will leave much to be desired. The consequences of a failure are catastrophic: losing the mount dooms the cluster to death under a load average that shoots sky-high.
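Whichever backend is chosen, the pods would typically consume it through a PersistentVolumeClaim with the ReadWriteMany access mode, so that all replicas on all nodes mount the same directory. A sketch under assumptions: the claim name site-assets, the storage class shared-fs, and the requested size are all hypothetical and depend on how CephFS or NFS is provisioned in a given cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: site-assets              # hypothetical claim name
spec:
  accessModes:
  - ReadWriteMany                # mountable read-write by pods on many nodes
  storageClassName: shared-fs    # hypothetical class backed by CephFS or NFS
  resources:
    requests:
      storage: 5Gi               # illustrative size
```

In the Deployment, the assets volume would then refer to the claim instead of emptyDir:

```yaml
volumes:
- name: assets
  persistentVolumeClaim:
    claimName: site-assets
```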

In addition, every option for persistent storage requires background cleanup of outdated file sets that accumulate over time. In front of the PHP containers you can place a DaemonSet of caching nginx instances that keep copies of the assets for a limited time. This behavior is easily configured with proxy_cache, with the cache depth limited in days or in gigabytes of disk space.
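A sketch of what the configuration for such a caching nginx might look like, in the same ConfigMap form used earlier. The ConfigMap name, the cache zone name assets, the limits (7 days, 5 GB), and the upstream Service name site are assumptions made for illustration; the DaemonSet that mounts this file is not shown.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-cache-configmap    # hypothetical name
data:
  nginx.conf: |
    # Entries unused for 7 days are evicted; total size is capped at 5 GB.
    proxy_cache_path /var/cache/nginx/assets levels=1:2
                     keys_zone=assets:10m max_size=5g inactive=7d;
    server {
        listen 80;
        server_name _;
        # Static assets are served from the cache when possible.
        location /assets/ {
            proxy_pass http://site;        # hypothetical Service in front of the pods
            proxy_cache assets;
            proxy_cache_valid 200 7d;      # keep successful responses for 7 days
        }
        # Everything else goes straight to the application.
        location / {
            proxy_pass http://site;
        }
    }
```

Here inactive bounds the cache depth in days and max_size bounds it in disk space, which corresponds to the two limits mentioned above.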

Combining this method with the distributed file systems mentioned above leaves a huge field for imagination, limited only by the budget and the technical capabilities of those who will implement and support it. In our experience, the simpler the system, the more stably it works. Adding such layers makes the infrastructure much harder to maintain, and the time spent on diagnostics and recovery after any failure grows with it.

Recommendation

If implementing the proposed storage options also seems unjustified to you (complicated, expensive, and so on), look at the situation from the other side: dig into the project's architecture and eradicate the problem in the code, by binding to a static data structure in the image, by defining the contents unambiguously, or by "warming up" and/or precompiling the assets at the image build stage. This gives absolutely predictable behavior and the same set of files for all environments and replicas of the running application.

Returning to the specific example of the Yii framework, and without delving into its internals (which is not the purpose of this article), it is enough to point out two popular approaches:

Modify the image build process so that the assets are placed in a predictable location. This is what extensions such as yii2-static-assets propose and implement.
Define specific hashes for the asset directories, as described, for example, in this presentation (starting with slide 35). Incidentally, the author of the talk ultimately (and not without reason!) advises uploading the assets, once they are built on the build server, to a central storage (such as S3) with a CDN in front of it.
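Once the assets are baked into the application image at a predictable path, nginx in the neighboring container still has to see them. One possible sketch, which is not taken from the extension or the presentation mentioned above: an init container copies the prebuilt assets from the image into the shared emptyDir before the main containers start. It assumes that the image already contains the assets under /var/www/assets and ships sh and cp; the init container name copy-assets and the /shared mount point are illustrative.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: site
spec:
  selector:
    matchLabels:
      component: backend
  template:
    metadata:
      labels:
        component: backend
    spec:
      volumes:
      - name: assets
        emptyDir: {}
      - name: nginx-config
        configMap:
          name: nginx-configmap
      initContainers:
      # Assumption: the image was built with assets already in /var/www/assets.
      - name: copy-assets
        image: own-image-with-php-backend:v1.0
        command: ["sh", "-c", "cp -a /var/www/assets/. /shared/"]
        volumeMounts:
        - name: assets
          mountPath: /shared
      containers:
      # PHP uses the copy of the assets baked into its own image.
      - name: php
        image: own-image-with-php-backend:v1.0
        command: ["/usr/local/sbin/php-fpm","-F"]
        workingDir: /var/www
      # nginx only reads the copied files.
      - name: nginx
        image: nginx:1.16.0
        command: ["/usr/sbin/nginx", "-g", "daemon off;"]
        volumeMounts:
        - name: assets
          mountPath: /var/www/assets
          readOnly: true
        - name: nginx-config
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: nginx.conf
```

Because every replica is started from the same image, each pod ends up with an identical set of files, and nothing is generated at request time.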

Uploaded Files

Another case that is sure to come up when moving an application to a Kubernetes cluster is storing user files in the file system. For example, we again have a PHP application that accepts files through an upload form, processes them in some way, and returns them.

In Kubernetes, the place where these files are stored must be shared by all replicas of the application. Depending on the complexity of the application and on how persistent the files need to be, the shared storage options mentioned above can serve as such a place, but, as we have seen, they have their drawbacks.

Recommendation

One solution is to use S3-compatible storage (even a self-hosted one such as minio). Switching to S3 requires changes at the code level, and we have already written about how the content is then served on the frontend.

User sessions

The storage of user sessions deserves a separate mention. Often these are also files on disk, which in Kubernetes means the user is constantly asked to log in again whenever a request lands in a different container.

The problem is partly solved by enabling stickySessions on the ingress (the feature is supported in all popular ingress controllers; see our review for details) to bind the user to a specific pod with the application:

    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: nginx-test
      annotations:
        nginx.ingress.kubernetes.io/affinity: "cookie"
        nginx.ingress.kubernetes.io/session-cookie-name: "route"
        nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
        nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
    spec:
      rules:
      - host: stickyingress.example.com
        http:
          paths:
          - backend:
              serviceName: http-svc
              servicePort: 80
            path: /

But this will not save you from redeployments: once a pod is replaced, the session files stored inside it are gone.

Recommendation

A more correct approach is to move the application to storing sessions in memcached, Redis, or similar solutions; in other words, to abandon file-based options entirely.
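For PHP, such a switch can often be made through configuration alone. A minimal sketch, assuming that the application image includes the phpredis extension and that a Service named redis is reachable in the same namespace; both are assumptions, as are the ConfigMap name and the mount path, which follows the layout of the official PHP Docker image.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: php-session-config       # hypothetical name
data:
  session.ini: |
    ; Requires the phpredis extension in the image (assumption).
    session.save_handler = redis
    ; "redis" is a hypothetical Service name; 6379 is the default Redis port.
    session.save_path = "tcp://redis:6379"
```

The file would then be mounted into the php container so that PHP picks it up as an additional ini file:

```yaml
# Fragment of the pod spec: a volume plus the mount in the php container.
volumes:
- name: php-session-config
  configMap:
    name: php-session-config
containers:
- name: php
  image: own-image-with-php-backend:v1.0
  volumeMounts:
  - name: php-session-config
    mountPath: /usr/local/etc/php/conf.d/session.ini
    subPath: session.ini
```

With sessions kept outside the pods, any replica can handle any request, and sticky sessions are no longer needed for authorization to work.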

Conclusion

The infrastructure solutions discussed in this article are worth using only as temporary crutches (or, to put it more nicely, workarounds). They may be relevant in the early stages of migrating an application to Kubernetes, but they should not take root.

The generally recommended path is to get rid of them in favor of architectural refinement of the application in line with the well-known 12-Factor App. However, bringing the application to a stateless form inevitably requires changes in the code, and here it is important to find a balance between the capabilities and requirements of the business and the prospects for implementing and maintaining the chosen path.
