[A reference to the American children's story "The Little Engine That Could". Translator's note.]
How to automagically create tiny Docker images for your needs
For the past couple of months I have been obsessed with one question: how far can I shrink a Docker image while the application still works?
I understand the idea is strange.
Before delving into the technical details, I would like to explain why this problem has hooked me so much and how it concerns you.
By reducing the contents of a Docker image, we shorten its list of vulnerabilities. In addition, the images become cleaner, because they contain only what is needed to run the application.
There is another small advantage: the images download a little faster, although for me this is not so important.
Please note: if size is all you care about, Alpine images are small in themselves and will probably suit you.
The Distroless project offers a selection of "distroless" base images; they do not contain the package managers, shells, or other utilities that you are used to seeing on the command line. As a result, using package managers like pip and apt will fail:
FROM gcr.io/distroless/python3
RUN pip3 install numpy
Dockerfile using a Python 3 distroless image
Sending build context to Docker daemon 2.048kB
Step 1/2 : FROM gcr.io/distroless/python3
 ---> 556d570d5c53
Step 2/2 : RUN pip3 install numpy
 ---> Running in dbfe5623f125
/bin/sh: 1: pip3: not found
No pip in the image
Typically, this problem is solved by a multi-stage build:
FROM python:3 as builder
RUN pip3 install numpy
FROM gcr.io/distroless/python3
COPY --from=builder /usr/local/lib/python3.7/site-packages /usr/local/lib/python3.5/
Multi-stage build
The result is a 130MB image. Not so bad! For comparison: the default Python image weighs 929MB, the "slimmer" one (3.7-slim) 179MB, and the Alpine one (3.7-alpine) 98.6MB, while the distroless base image used in the example is 50.9MB.
One can rightly object that the previous example copies the whole directory /usr/local/lib/python3.7/site-packages, which may contain dependencies we do not need. Even so, it is clear that the existing Python base images differ considerably in size.
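Before copying site-packages wholesale, it can help to see which entries actually take up the space. The following is a minimal sketch in Python, not part of any tool mentioned here; the function names dir_size and package_sizes are hypothetical, and the demo runs on a throwaway directory rather than a real installation.

```python
import os
import tempfile


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # do not follow or count symlinks
                total += os.path.getsize(full)
    return total


def package_sizes(site_packages):
    """Return (entry, size_in_bytes) pairs for each top-level entry, largest first."""
    sizes = {}
    for entry in os.listdir(site_packages):
        full = os.path.join(site_packages, entry)
        if os.path.isdir(full):
            sizes[entry] = dir_size(full)
        elif os.path.isfile(full):
            sizes[entry] = os.path.getsize(full)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Demo on a temporary directory; pass a real site-packages path instead.
    with tempfile.TemporaryDirectory() as demo:
        os.makedirs(os.path.join(demo, "numpy"))
        with open(os.path.join(demo, "numpy", "core.so"), "wb") as fh:
            fh.write(b"\0" * 4096)
        with open(os.path.join(demo, "six.py"), "wb") as fh:
            fh.write(b"\0" * 100)
        for name, size in package_sizes(demo):
            print(f"{size:>8} {name}")
```

Run against the builder stage's site-packages directory, a listing like this shows which packages are worth copying individually instead of taking the whole directory.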
At the time of writing, Google's distroless project supports only a few images: Java and Python are still at the experimental stage, and Python exists only for 2.7 and 3.5.
Let's get back to my obsession with creating small images.
Actually, I wanted to see how distroless images work. The distroless project uses Google's build tool, Bazel. However, installing Bazel and writing my own images took some sweat (and, to be completely honest, reinventing the wheel is fun and informative). I wanted to simplify the creation of reduced images: the act of creating an image should be extremely simple, even banal, with no configuration files, just one line in the console: <>.
So, if you want to create your own images, know this: there is a special Docker image called scratch. Scratch is an "empty" image; it contains no files, although by default it weighs (wow!) 77 bytes.
FROM scratch
Scratch image
The idea of the scratch image is that you can copy any dependencies from the host machine into it and use them either inside the Dockerfile (for example, by copying in apt and installing everything from scratch) or later, when a container is started from the image. This gives you full control over the contents of the Docker container and, therefore, over the size of the image.
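To make the "copy everything in yourself" idea concrete, here is a small hedged sketch in Python that generates the text of such a Dockerfile. It is an illustration only, not how any particular tool does it; the function name render_scratch_dockerfile and the rootfs directory layout are assumptions.

```python
import json


def render_scratch_dockerfile(rootfs_dir, entrypoint):
    """Return Dockerfile text that copies a prepared root filesystem into scratch.

    rootfs_dir: directory inside the build context laid out like "/"
                (hypothetical; for example, filled from unpacked packages).
    entrypoint: the command to run, as a list of strings.
    """
    if not entrypoint:
        raise ValueError("entrypoint must contain at least the executable path")
    # Use the exec (JSON array) form: scratch has no /bin/sh for the shell form.
    exec_form = json.dumps(list(entrypoint))
    lines = [
        "FROM scratch",
        f"COPY {rootfs_dir}/ /",
        f"ENTRYPOINT {exec_form}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_scratch_dockerfile("rootfs", ["/usr/bin/python3.5"]), end="")
```

Everything the entrypoint needs at run time (the binary, its shared libraries, the standard library) has to be present under the copied directory, because the scratch image itself provides nothing.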
Now we need to collect these dependencies somehow. Existing tools like apt can download packages, but they are tied to the current machine and, in the end, do not support Windows or macOS.
So I set out to build my own tool, one that would automatically assemble a base image of the smallest possible size that could still launch any application. I used Ubuntu/Debian packages, fetched them straight from the repositories, and recursively resolved their dependencies. The tool should automatically download the latest stable version of each package, minimizing security risks.
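The dependency resolution described above can be sketched roughly as follows. This is an assumption-laden illustration in Python, not fetchy's actual code: it parses a Debian-style Packages index, keeps only the first alternative of each Depends clause, ignores version constraints and Pre-Depends, and walks the graph with an iterative worklist instead of literal recursion. The sample index is hand-written and simplified, and the names parse_depends, parse_index and resolve are hypothetical.

```python
import re
from collections import deque

# Hand-written, simplified sample; not copied from a real repository.
SAMPLE_INDEX = """\
Package: python3.5
Depends: python3.5-minimal (= 3.5.2-2), libpython3.5-stdlib (= 3.5.2-2), mime-support

Package: python3.5-minimal
Depends: libpython3.5-minimal (= 3.5.2-2), libexpat1 (>= 2.1~beta3), zlib1g (>= 1:1.2.0)

Package: libpython3.5-stdlib
Depends: libpython3.5-minimal (= 3.5.2-2), libc6 (>= 2.15)

Package: libpython3.5-minimal
Depends: libc6 (>= 2.14)

Package: libexpat1
Depends: libc6 (>= 2.14)

Package: zlib1g
Depends: libc6 (>= 2.14)

Package: libc6
"""


def parse_depends(value):
    """Turn a Depends field into a list of package names (first alternative only)."""
    names = []
    for clause in value.split(","):
        first_alternative = clause.split("|")[0]
        # Drop the version constraint first, since it may contain a colon (epoch).
        without_version = re.sub(r"\(.*?\)", "", first_alternative)
        name = without_version.split(":")[0].strip()  # drop qualifiers like ":any"
        if name:
            names.append(name)
    return names


def parse_index(text):
    """Parse a Packages index into {package name: [dependency names]}."""
    index = {}
    for stanza in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in stanza.splitlines():
            if line[:1].isspace() or ":" not in line:
                continue  # continuation lines are skipped in this simplified parser
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields:
            index[fields["Package"]] = parse_depends(fields.get("Depends", ""))
    return index


def resolve(package, index):
    """Return (resolved, missing): transitive dependencies found and not found."""
    resolved, missing = [], []
    seen = {package}
    queue = deque([package])
    while queue:
        name = queue.popleft()
        if name not in index:
            missing.append(name)
            continue
        resolved.append(name)
        for dep in index[name]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return resolved, missing


if __name__ == "__main__":
    found, not_found = resolve("python3.5", parse_index(SAMPLE_INDEX))
    print("install:", ", ".join(found))
    print("not in index:", ", ".join(not_found))
```

A real resolver would also have to honour version constraints, alternatives and virtual packages; the sketch only shows the shape of the traversal.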
I called the tool fetchy, because it fetches what is needed. [From the English "fetch". Translator's note.] The tool works through a command-line interface, but it also offers an API.
To build an image using fetchy (let's take a Python image this time), you just need to use the CLI like this: fetchy dockerize python. You may be asked for the target operating system and code name, since fetchy only uses Debian- and Ubuntu-based packages so far.
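The operating system and code name matter because Debian-style repositories are organised by release. As a hedged illustration in Python (how fetchy itself locates its package lists is not stated here), the sketch below builds the URL of a Packages index from those two values using the standard repository layout. The mirror URLs are the public default mirrors, and the function name packages_index_url and its defaults are assumptions.

```python
# Public default mirrors; any other mirror with the same layout would work too.
MIRRORS = {
    "ubuntu": "http://archive.ubuntu.com/ubuntu",
    "debian": "http://deb.debian.org/debian",
}


def packages_index_url(distribution, codename, component="main", architecture="amd64"):
    """Return the URL of the compressed Packages index for one release.

    distribution: "ubuntu" or "debian" (case-insensitive)
    codename:     release code name, for example "xenial" or "stretch"
    """
    try:
        base = MIRRORS[distribution.lower()]
    except KeyError:
        raise ValueError(f"unsupported distribution: {distribution!r}") from None
    return f"{base}/dists/{codename}/{component}/binary-{architecture}/Packages.gz"


if __name__ == "__main__":
    print(packages_index_url("ubuntu", "xenial"))
    print(packages_index_url("debian", "stretch"))
```

Because the index is fetched over plain HTTP and parsed as text, this approach does not depend on apt being installed, which is what makes it usable from Windows or macOS.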
Now you can choose which dependencies are not needed at all (in our context) and exclude them. For example, Python depends on Perl, although it works perfectly well without Perl installed.
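The effect of excluding a dependency can be shown with a tiny Python sketch. The dependency graph below is made up for illustration and does not reflect real package metadata, and resolve_excluding is a hypothetical helper, not fetchy's API. The point is that excluding one package also drops everything that was only reachable through it.

```python
# Made-up dependency graph, for illustration only.
DEPENDENCIES = {
    "python": ["python-minimal", "perl"],
    "python-minimal": ["libc6"],
    "perl": ["perl-base", "perl-modules"],
    "perl-base": ["libc6"],
    "perl-modules": [],
    "libc6": [],
}


def resolve_excluding(package, dependencies, exclude=()):
    """Recursively collect package and its dependencies, skipping excluded names."""
    excluded = set(exclude)
    selected = []
    seen = set()

    def visit(name):
        if name in seen or name in excluded:
            return
        seen.add(name)
        selected.append(name)
        for dep in dependencies.get(name, []):
            visit(dep)

    visit(package)
    return selected


if __name__ == "__main__":
    everything = resolve_excluding("python", DEPENDENCIES)
    trimmed = resolve_excluding("python", DEPENDENCIES, exclude=["perl"])
    dropped = [name for name in everything if name not in trimmed]
    print("kept:   ", ", ".join(trimmed))
    print("dropped:", ", ".join(dropped))
```

Note that libc6 survives the exclusion because python-minimal still needs it, while perl-base and perl-modules disappear together with perl.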
The Python image created with fetchy dockerize python3.5 weighs only 35MB (I am more than sure that it can be made even lighter in the future). It turns out that, compared with the distroless image, we managed to shave off another 15MB.
All currently collected images can be viewed here.
The project is here.
If you are missing a feature, just open an issue; I will be happy to help :) What is more, I am currently working on integrating other package managers into fetchy, so that multi-stage builds are no longer needed.