What is the appeal of splitting the container runtime into separate tools? In particular, these tools can then be combined so that they protect each other.
Many people are drawn to the idea of building OCI container images inside Kubernetes or a similar system. Suppose we have a CI/CD system that builds images continuously; then something like Red Hat OpenShift/Kubernetes would be very useful for load balancing those builds. Until recently, most people simply gave containers access to the Docker socket and let them run the docker build command.
Several years ago we showed that this is very unsafe; in fact, it is even worse than handing out passwordless root or sudo.
That is why people keep trying to run Buildah in a container. In short, we put together an example of what we consider the best way to run Buildah inside a container and published the corresponding images at quay.io/buildah. Let's get started.
Setup
These images are built from Dockerfiles, which can be found in the Buildah repository in the buildahimage folder. Here we look at the stable version of the Dockerfile.
    # stable/Dockerfile
    #
    # Build a Buildah container image from the latest
    # stable version of Buildah on the Fedoras Updates System.
    # https://bodhi.fedoraproject.org/updates/?search=buildah
    # This image can be used to create a secured container
    # that runs safely with privileges within the container.
    #
    FROM fedora:latest

    # Don't include container-selinux and remove
    # directories used by dnf that are just taking
    # up space.
    RUN yum -y install buildah fuse-overlayfs --exclude container-selinux; rm -rf /var/cache /var/log/dnf* /var/log/yum.*

    # Adjust storage.conf to enable Fuse storage.
    RUN sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' /etc/containers/storage.conf
Instead of OverlayFS, which is implemented in the host's Linux kernel, we use the fuse-overlayfs program inside the container, because at the moment OverlayFS can only be mounted by a process that holds the SYS_ADMIN Linux capability, and we want to run our Buildah containers without any root privileges. fuse-overlayfs is reasonably fast and performs better than the VFS storage driver. Note that when you start a Buildah container that uses FUSE, you must pass in the /dev/fuse device.
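As a small illustration of the /dev/fuse requirement, the shell sketch below checks for the device before a container is started. The helper name fuse_device_flag is hypothetical and is not part of Podman or Buildah; it only prints the --device flag when the given path is a character device.

```shell
# Hypothetical helper (not part of Podman or Buildah): print the
# --device flag for "podman run" if the FUSE device node is present.
fuse_device_flag() (
    dev="${1:-/dev/fuse}"
    if [ -c "$dev" ]; then
        # The character device exists, so it can be passed to the container.
        printf '%s\n' "--device $dev"
    else
        echo "missing $dev: fuse-overlayfs cannot be used in the container" >&2
        return 1
    fi
)
```

A caller could then splice the result into the command line, for example podman run $(fuse_device_flag) followed by the image name, and stop early when the helper fails.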
    podman run --device /dev/fuse quay.io/buildahctr ...

    RUN mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock
Next, we create a directory for additional stores.
containers/storage supports the concept of attaching additional read-only image stores. For example, you can set up an overlay storage area on one machine, mount it on another machine over NFS, and use the images in it without downloading them via pull. We need this store so that an image store from the host can be attached as a volume and used inside the container.
    # Set up environment variables to note that this is
    # not starting with user namespace and default to
    # isolate the filesystem with chroot.
    ENV _BUILDAH_STARTED_IN_USERNS="" BUILDAH_ISOLATION=chroot
Finally, the BUILDAH_ISOLATION environment variable tells Buildah to use chroot isolation by default. No additional isolation is needed here, since we are already running inside a container. For Buildah to create its own namespace-separated containers it would need the SYS_ADMIN privilege, which in turn would mean relaxing the SELinux and seccomp rules for the container, and that defeats our goal of building from a locked-down container.
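The image only sets a default, so the variable can still be overridden when the container is started, for instance with podman run -e. The sketch below guards such an override; set_buildah_isolation is a hypothetical helper name, and the accepted values (chroot, oci, rootless) are the ones Buildah's --isolation option takes.

```shell
# Hypothetical helper: validate an isolation mode before exporting it.
# chroot, oci and rootless are the modes accepted by buildah --isolation.
set_buildah_isolation() {
    case "$1" in
        chroot|oci|rootless)
            export BUILDAH_ISOLATION="$1"
            ;;
        *)
            echo "unknown isolation mode: $1" >&2
            return 1
            ;;
    esac
}
```

Rejecting unknown values up front keeps a typo from silently reaching the build, and the previous setting is left untouched when validation fails.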
Running Buildah inside a container
The Buildah container image described above gives you a lot of flexibility in how you run such containers.
Speed vs. Security
Computer security is always a trade-off between how fast a process runs and how much protection is wrapped around it. This also holds for building containers, so below we look at the options for such a trade-off.
The container image discussed above keeps its storage in /var/lib/containers. We therefore need to mount content at this path, and how we do that greatly affects the speed of building container images.
Let's consider three options.
Option 1. If maximum security is required, you can create a separate container/image storage folder for each container and attach it to the container as a volume mount. In addition, place the context directory inside the container itself, in the /build folder:
    # mkdir /var/lib/containers1
    # podman run -v ./build:/build:z -v /var/lib/containers1:/var/lib/containers:Z quay.io/buildah/stable \
        buildah bud -t image1 /build
    # podman run -v /var/lib/containers1:/var/lib/containers:Z quay.io/buildah/stable \
        buildah push image1 registry.company.com/myuser
    # rm -rf /var/lib/containers1
Security. Buildah running in such a container has maximum security: it is not granted any root privileges through capabilities, and all seccomp and SELinux restrictions apply to it. Such a container can even be run with user namespace isolation by adding an option like --uidmap 0:100000:10000.
Performance. Performance here is the lowest, since every image is copied from the container registries to the host each time and caching does not work at all. When it finishes, the Buildah container has to push the image to the registry and destroy the content on the host. The next time the container image is built, everything has to be downloaded from the registry again, because by then nothing is left on the host.
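The create, build, push, delete cycle of Option 1 can be wrapped in a single function so that the throwaway store is always removed. This is only a sketch under stated assumptions: the function name and the PODMAN and STORE_PARENT variables are hypothetical conveniences (setting PODMAN=echo gives a dry run), and --device /dev/fuse is added because the image relies on fuse-overlayfs.

```shell
# Sketch of Option 1: one throwaway store per build.
# PODMAN and STORE_PARENT are assumptions of this sketch, not Podman settings.
PODMAN="${PODMAN:-podman}"

build_with_private_store() (
    context="$1"; image="$2"; dest="$3"
    # Fresh, empty containers/storage directory used by this build only.
    store="$(mktemp -d "${STORE_PARENT:-/var/lib}/containers.XXXXXX")" || exit 1
    # Remove the store when this subshell ends, whether or not the build worked.
    trap 'rm -rf "$store"' EXIT

    "$PODMAN" run --device /dev/fuse \
        -v "$context":/build:z \
        -v "$store":/var/lib/containers:Z \
        quay.io/buildah/stable buildah bud -t "$image" /build &&
    "$PODMAN" run --device /dev/fuse \
        -v "$store":/var/lib/containers:Z \
        quay.io/buildah/stable buildah push "$image" "$dest"
)
```

Running the body in a subshell keeps the trap and the variables local to one build, so several builds can be started side by side without sharing any state.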
Option 2. If you need Docker-level performance, you can mount the host's containers/storage directly into the container.
    # podman run -v ./build:/build:z -v /var/lib/containers:/var/lib/containers --security-opt label=disable \
        quay.io/buildah/stable buildah bud -t image2 /build
    # podman run -v /var/lib/containers:/var/lib/containers --security-opt label=disable \
        quay.io/buildah/stable buildah push image2 registry.company.com/myuser
Security. This is the least secure way to build containers, because the container is allowed to modify the storage on the host and could potentially slip a malicious image to Podman or CRI-O. In addition, you have to disable SELinux separation so that processes in the Buildah container can interact with the storage on the host. Note that this option is still better than the Docker socket, since the container is confined by the remaining security features and cannot simply launch an arbitrary container on the host.
Performance. Performance here is the highest, since caching is used to the full. If Podman or CRI-O has already pulled the required image to the host, the Buildah process inside the container does not have to download it again, and subsequent builds based on that image can also take what they need from the cache.
Option 3. The idea here is to group several images into one project that shares a single folder for container images.
    # mkdir /var/lib/project3
    # podman run --security-opt label=level:s0:c100,c200 -v ./build:/build:z \
        -v /var/lib/project3:/var/lib/containers:Z quay.io/buildah/stable buildah bud -t image3 /build
    # podman run --security-opt label=level:s0:c100,c200 \
        -v /var/lib/project3:/var/lib/containers quay.io/buildah/stable buildah push image3 \
        registry.company.com/myuser
In this example, we do not delete the project folder (/var/lib/project3) between runs, so all subsequent builds within the project benefit from caching.
Security. Somewhere between options 1 and 2. On the one hand, the containers have no access to content on the host and therefore cannot slip anything bad into the Podman/CRI-O image storage. On the other hand, within its project a container can interfere with the builds of other containers.
Performance. Performance here is worse than with a shared host-level cache, since images already pulled by Podman/CRI-O cannot be reused. However, once Buildah has downloaded an image, that image can be used in any subsequent build within the project.
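All builds in a project must run with the same SELinux level, otherwise they cannot share the project folder. A minimal sketch for generating that option consistently follows; mcs_label_opt is a hypothetical helper, and the "lowest category first" check is a convention of this sketch, not a documented Podman requirement. MCS categories run from c0 to c1023.

```shell
# Hypothetical helper: build the --security-opt value for a fixed MCS level.
# Takes two category numbers in the range 0..1023, lowest first.
mcs_label_opt() (
    a="$1"; b="$2"
    for c in "$a" "$b"; do
        case "$c" in
            ''|*[!0-9]*) echo "not a category number: $c" >&2; exit 1 ;;
        esac
        [ "$c" -le 1023 ] || { echo "category out of range: $c" >&2; exit 1; }
    done
    [ "$a" -lt "$b" ] || { echo "use two distinct categories, lowest first" >&2; exit 1; }
    printf 'label=level:s0:c%s,c%s\n' "$a" "$b"
)
```

With this in place, every build of the project can pass --security-opt "$(mcs_label_opt 100 200)" instead of repeating the literal string.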
Additional storage
containers/storage has a handy feature called additional stores, which lets container engines use external image stores as read-only overlay layers when running and building containers. In practice, you add one or more read-only stores to the storage.conf file, and when a container starts, the container engine looks for the requested image in them. It pulls the image from the registry only if it does not find it in any of these stores. The container engine can write only to its writable store.
If you scroll back up to the Dockerfile we use to build the quay.io/buildah/stable image, you will find these lines:
    # Adjust storage.conf to enable Fuse storage.
    RUN sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' /etc/containers/storage.conf
    RUN mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock
The first line modifies /etc/containers/storage.conf inside the container image, telling the storage driver to use /var/lib/shared as an entry in additionalimagestores. The next line creates the shared folder and adds a couple of lock files so that containers/storage does not complain. In effect, we just create an empty container image store.
If you mount a containers/storage directory over this folder, Buildah will be able to use the images in it.
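To see what these two Dockerfile steps do without touching a real system, they can be replayed on a scratch copy. In the sketch below the sample storage.conf is a reduced stand-in written for illustration, not the full file shipped with containers/storage, and the one-line "a" command assumes GNU sed.

```shell
# Work on a scratch copy, not on the real /etc/containers/storage.conf.
work="$(mktemp -d)"
cat > "$work/storage.conf" <<'EOF'
[storage.options]
additionalimagestores = [
]
#mount_program = "/usr/bin/fuse-overlayfs"
EOF

# Same two edits as in the Dockerfile: uncomment mount_program and
# append "/var/lib/shared", after the additionalimagestores line.
sed -i -e 's|^#mount_program|mount_program|g' \
    -e '/additionalimage.*/a "/var/lib/shared",' "$work/storage.conf"

# Create the empty store layout, here under the scratch directory.
mkdir -p "$work/shared/overlay-images" "$work/shared/overlay-layers"
touch "$work/shared/overlay-images/images.lock" \
      "$work/shared/overlay-layers/layers.lock"

# Show the edited file.
cat "$work/storage.conf"
```

Inspecting the printed file makes it easy to confirm that the store path landed inside the additionalimagestores list before the same edits are baked into an image.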
Now let's return to Option 2 above, where the Buildah container can read and write the host's containers/storage and therefore gets maximum performance from image caching at the Podman/CRI-O level, but minimum security, since it can write directly to that storage. If we add an additional store to this setup, we get the best of both worlds.
    # mkdir /var/lib/containers4
    # podman run -v ./build:/build:z -v /var/lib/containers/storage:/var/lib/shared:ro \
        -v /var/lib/containers4:/var/lib/containers:Z quay.io/buildah/stable \
        buildah bud -t image4 /build
    # podman run -v /var/lib/containers/storage:/var/lib/shared:ro \
        -v /var/lib/containers4:/var/lib/containers:Z quay.io/buildah/stable buildah push image4 \
        registry.company.com/myuser
    # rm -rf /var/lib/containers4
Note that the host's /var/lib/containers/storage is mounted at /var/lib/shared inside the container in read-only mode. As a result, Buildah inside the container can use any image previously pulled by Podman/CRI-O (hello, speed) but can write only to its own store (hello, security). Also note that this works without disabling SELinux separation for the container.
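Because the image ships with an empty store at /var/lib/shared, the read-only mount can be treated as optional: without it, Buildah simply pulls from the registry. The sketch below adds the mount only when the host store exists; shared_store_flag is a hypothetical helper, and it assumes the path contains no whitespace.

```shell
# Hypothetical helper: print the read-only volume flag for an additional
# store, but only when the host directory actually exists.
shared_store_flag() (
    host_store="${1:-/var/lib/containers/storage}"
    if [ -d "$host_store" ]; then
        printf '%s\n' "-v $host_store:/var/lib/shared:ro"
    fi
)
```

On a host where Podman or CRI-O has never run, the helper prints nothing and the same podman run command line still works, just without the cache.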
Important nuance
Under no circumstances should you remove images from the underlying store. Otherwise, the Buildah container may crash.
And the benefits do not end there
Additional stores are not limited to the scenario above. For example, you can place all container images on shared network storage and give every Buildah container access to it. Suppose we have hundreds of images that our CI/CD system regularly uses to build container images. We gather all of these images on one storage host and then, using whichever network storage tool we prefer (NFS, Gluster, Ceph, iSCSI, S3 ...), share this storage with all Buildah or Kubernetes nodes.
Now it is enough to mount this network storage into the Buildah container at /var/lib/shared, and Buildah containers no longer have to download images via pull. We thus skip the pre-population phase and are immediately ready to roll out containers.
Of course, this can also be used within an existing Kubernetes system or container infrastructure to launch and run containers anywhere without downloading images via pull. Moreover, when the container registry receives a push request for an updated image, it can automatically send this image to the shared network storage, where it instantly becomes available to all nodes.
Container images can sometimes reach many gigabytes in size. Additional stores let you avoid copying such images to every node and make launching containers almost instant.
In addition, we are currently working on a new feature, overlay volume mounts, that will make container builds even faster.
Conclusion
Running Buildah inside a container in Kubernetes/CRI-O, Podman, or even Docker is feasible, and it is simpler and much safer than using the Docker socket. We have greatly improved the flexibility of working with images, and you can now run them in various ways to find the best balance between security and performance.
Additional stores let you speed up, or even completely eliminate, the downloading of images to nodes.