Update! In the comments, one of the readers suggested trying Linstor (maybe he works on it himself), so I added a section about that solution. I also wrote a post on how to install it, because the process is very different from the rest.
To be honest, I gave up and abandoned Kubernetes (at least for now). I'm going to use Heroku. Why? Because of storage! Who would have thought that storage would give me more trouble than Kubernetes itself. I use Hetzner Cloud because it's inexpensive and the performance is good, and from the very beginning I deployed clusters with Rancher. I haven't tried managed Kubernetes services from Google / Amazon / Microsoft / DigitalOcean and so on, because I wanted to learn everything myself. And I'm frugal.
So, yes, I spent a lot of time trying to decide which storage to pick while I was weighing a possible Kubernetes stack. I prefer open-source solutions, and not only because of the price, but I did look at a couple of paid options out of curiosity, since they have free versions with restrictions. I wrote down a few numbers from the latest tests when I was comparing the different options, and they might be of interest to those looking into storage on Kubernetes, even though I personally have said goodbye to Kubernetes for now. I also want to mention the CSI driver that lets you provision Hetzner Cloud volumes directly, but I haven't tried it yet. I looked at cloud-native software-defined storage because I needed replication and the ability to quickly mount persistent volumes on any node, especially in case of a node failure and similar situations. Some solutions offer point-in-time snapshots and off-site backups, which is convenient.
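For reference, the idea behind that CSI driver is that once it's installed, you just point a PersistentVolumeClaim at its storage class and get a Hetzner Cloud volume attached to the node. I haven't tried it, so this is only a sketch, assuming the hcloud-volumes storage class the driver creates by default:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce                   # cloud volumes attach to one node at a time
  storageClassName: hcloud-volumes    # storage class installed by the Hetzner Cloud CSI driver
  resources:
    requests:
      storage: 10Gi                   # illustrative size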
I tested 6-7 storage solutions: OpenEBS, Rook (with Ceph), Longhorn, StorageOS, Robin, Portworx and, a bit later, Linstor.
As I said in a previous post, after testing most of the options on the list, I initially settled on OpenEBS. OpenEBS is very easy to install and use, but to be honest, after testing with real data under load, its performance disappointed me. It's open source, and the developers in their Slack channel were always very helpful when I needed help. Unfortunately, its performance is very low compared to the other options, so I had to rerun the tests. OpenEBS currently has 3 storage engines, but I'm publishing benchmark results for cStor. I don't have numbers for Jiva and LocalPV yet.
In a nutshell, Jiva is a little faster, and LocalPV is fast in general, no worse than a benchmark of the drive directly. The problem with LocalPV is that the volume can only be accessed on the node where it was provisioned, and there is no replication at all. I had some problems restoring a backup via Velero onto a new cluster, because the node names were different. Speaking of backups, cStor has a plugin for Velero that can do off-site backups of point-in-time snapshots, which is more convenient than file-level backups with Velero and Restic. I wrote several scripts to make it easier to manage backups and restores with this plugin. Overall, I really like OpenEBS, but its performance...
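To give an idea of what using cStor looks like from the Kubernetes side, here is a rough sketch of a storage class for it. The pool name and replica count are just assumptions for illustration; they depend on the StoragePoolClaim you create when setting up OpenEBS:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-cstor
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: "cstor-disk-pool"     # assumed pool name, created separately
      - name: ReplicaCount
        value: "3"                   # number of data replicas
provisioner: openebs.io/provisioner-iscsi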
Rook is also open source, and it differs from the other options on the list in that it is a storage orchestrator that handles the complex tasks of managing storage with different backends, such as Ceph, EdgeFS and others, which greatly simplifies the work. I had problems with EdgeFS when I tried it a few months ago, so I tested mainly with Ceph. Ceph offers not only block storage, but also object storage compatible with S3 / Swift, as well as a distributed file system. What I like about Ceph is the ability to spread volume data across multiple disks, so a volume can use more disk space than fits on a single disk. That's convenient. Another cool feature is that when you add disks to the cluster, it automatically rebalances the data across all disks.
Ceph has snapshots, but as far as I know, they cannot be used directly from Rook / Kubernetes. Admittedly, I didn't dig into this. There are no off-site backups, so you have to use something like Velero / Restic, but that only gives you file-level backups, not point-in-time snapshots. What I really liked about Rook, though, is how easy it makes working with Ceph - it hides almost all the complicated bits and offers tools for talking to Ceph directly for troubleshooting. Unfortunately, during stress tests of Ceph volumes I kept running into this problem, which made Ceph unstable. It is not yet clear whether this is a bug in Ceph itself or a problem in how Rook manages Ceph. I fiddled with the memory settings and it got better, but the problem was never fully resolved. Ceph has decent performance, as you can see in the benchmarks below. It also has a good dashboard.
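For context, provisioning block storage through Rook boils down to declaring a Ceph pool and a storage class on top of it. A minimal sketch, assuming the flex-volume provisioner Rook used around that time (newer releases use the Ceph CSI driver, so the provisioner name and parameters may differ):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host        # spread replicas across hosts
  replicated:
    size: 3                  # keep three copies of the data
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block    # flex-volume provisioner of that era
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph
  fstype: ext4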
I really like Longhorn. In my opinion, this is a promising solution. True, the developers themselves (Rancher Labs) acknowledge that it is not yet production-ready, and it shows. It is open source and has decent performance (although they have not optimized it yet), but volumes take a very long time to attach to a pod; in the worst cases it takes 15-16 minutes, especially after restoring a large backup or upgrading a workload. It has snapshots and off-site backups of those snapshots, but they only cover volumes, so you still need something like Velero to back up the other resources. Backups and restores are very reliable, but indecently slow. Seriously, just prohibitively slow. CPU utilization and system load often spike when working with moderate amounts of data in Longhorn. There is a convenient dashboard for managing Longhorn. I already said that I like Longhorn, but it still needs quite a bit of work.
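A storage class for Longhorn is similarly small; replication is controlled through parameters. A hedged sketch (the values are illustrative, and older Longhorn releases used the rancher.io/longhorn provisioner name instead):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io     # older releases: rancher.io/longhorn
parameters:
  numberOfReplicas: "3"             # replicas kept for each volume
  staleReplicaTimeout: "2880"       # minutes before a failed replica is cleaned up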
StorageOS is the first paid product on the list. It has a developer version limited to 500 GB of managed storage, but the number of nodes, as far as I can tell, is not limited. The sales department told me that pricing starts at $125 per month for 1 TB, if I remember correctly. There is a basic dashboard and a convenient CLI, but something strange is going on with performance: in some benchmarks it is quite decent, but in the volume stress test I did not like the speed at all. In general, I don't know what to say, so I didn't dig much further. There are no off-site backups, and you will also have to use Velero with Restic to back up volumes. That's odd, given that the product is paid. And the developers were not eager to chat on Slack.
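For what it's worth, the Velero with Restic approach that keeps coming up here boils down to annotating the pods whose volumes you want backed up at the file level (with Velero installed using its --use-restic option). A rough sketch; the pod, image and claim names are made up:

apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    # tells Velero's Restic integration to back up this pod volume at the file level
    backup.velero.io/backup-volumes: data
spec:
  containers:
  - name: app
    image: nginx               # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data      # the PVC backed by whichever storage you chose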
I learned about Robin on Reddit from their CTO. I had never heard of it before, probably because I was looking for free solutions, and Robin is paid. They do have a pretty generous free version with 10 TB of storage and three nodes. Overall, the product is quite decent and has nice features. There is an excellent CLI, but the coolest thing is that you can snapshot and back up an entire application (in the resource selector these are called Helm releases or "flex apps"), including volumes and other resources, so you can do without Velero. And everything would be wonderful if it weren't for one small detail: if you restore (or "import", as Robin calls it) an application onto a new cluster - for example, for disaster recovery - the restore itself works, but you are not allowed to keep backing up the application afterwards. In this release that is simply not possible, and the developers confirmed it. This is, to put it mildly, strange, especially considering the other advantages (for example, incredibly fast backups and restores). The developers promise to fix everything by the next release. Performance is generally good, but I noticed a strange thing: if you run the benchmark directly on a volume attached to the host, the read speed is much higher than on the same volume from inside a pod. All the other results are identical, but in theory there should be no difference. They are working on it, but I was upset by the backup-and-restore problem - it seemed I had finally found a suitable solution, and I was even ready to pay for it when I needed more space or more servers.
I really have nothing much to say about Portworx. It is a paid product, equally cool and expensive. The performance is amazing - the best numbers so far. In Slack I was told that pricing starts at $205 per month per node, as listed on the Google GKE Marketplace; I don't know if it's cheaper if you buy directly. In any case, I can't afford it, so I was very, very disappointed that the developer license (up to 1 TB and 3 nodes) is practically useless with Kubernetes unless you are content with static provisioning. I was hoping the trial license would automatically drop down to the developer tier at the end of the trial period, but it didn't. The developer license can only be used directly with Docker, and the configuration in Kubernetes is very cumbersome and limited. Of course, I prefer open source, but if I had the money, I would definitely choose Portworx. So far, the other options simply can't match its performance.
I added this section after the post was published, when a reader suggested trying Linstor. I tried it, and I liked it! But there is still more digging to do. What I can say now is that performance is not bad (I added the benchmark results below). In fact, I got the same performance as the drive directly, with no overhead at all. (Don't ask why Portworx's numbers are better than a benchmark of the disk itself. I have no idea. Magic, probably.) So Linstor looks very efficient so far. Installing it is not that difficult, but not as easy as the other options. First I had to install Linstor itself (the kernel module and tools / services) and configure LVM for thin provisioning and snapshot support outside Kubernetes, directly on the hosts, and then create the resources needed to use the storage from Kubernetes. I didn't like that it wouldn't work on CentOS and I had to use Ubuntu. Not a big deal, of course, but a little annoying, because the documentation (which, incidentally, is excellent) mentions several packages that cannot be found in the specified epel repositories. Linstor has snapshots, but no off-site backups, so here again I had to use Velero with Restic to back up volumes. I would prefer snapshots over file-level backups, but that can be tolerated if the solution is performant and reliable. Linstor is open source, with paid support. If I understand correctly, it can be used without restrictions even if you don't have a support contract, but that needs to be clarified. I don't know how well Linstor has been tested with Kubernetes, but the storage layer itself sits outside of Kubernetes and the solution clearly did not appear yesterday, so it has probably already been proven in real-world conditions. Is there a solution here that will make me change my mind and go back to Kubernetes? I don't know. I still need to dig deeper and look into replication. We'll see. But the first impression is good. I would definitely prefer to use my own Kubernetes clusters instead of Heroku, to have more freedom and learn new things. Since Linstor isn't as easy to install as the others, I will write a post about it soon.
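Once the LVM thin pool and the Linstor services are set up on the hosts, the Kubernetes side is again mostly a storage class. A very rough sketch, assuming the linstor-csi driver; the parameter names and the lvm-thin pool name are assumptions from my setup and may differ between versions:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-thin
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"             # let Linstor place this many replicas automatically
  storagePool: "lvm-thin"    # the LVM thin pool configured on the hosts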
Unfortunately, I kept few notes from the comparison, because I did not think I would be writing about it. I only have results from basic fio benchmarks, and only for single-node clusters, so I don't have numbers for replicated configurations yet. But these results give a rough idea of what to expect from each option, because I compared them on identical cloud servers: 4 cores, 16 GB of RAM, with an extra 100 GB disk for the volumes under test. I ran the benchmarks three times for each solution, averaged the results, and reset the server for each product. All of this is completely unscientific, just to give you a general idea. In other tests, I copied 38 GB of photos and videos to and from the volume to test reading and writing, but, alas, I did not save those numbers. In short: Portworx was much faster.
To benchmark the volumes, I used this manifest:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dbench
spec:
  storageClassName: ...
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: dbench
spec:
  template:
    spec:
      containers:
      - name: dbench
        image: sotoaster/dbench:latest
        imagePullPolicy: IfNotPresent
        env:
          - name: DBENCH_MOUNTPOINT
            value: /data
          - name: FIO_SIZE
            value: 1G
        volumeMounts:
        - name: dbench-pv
          mountPath: /data
      restartPolicy: Never
      volumes:
      - name: dbench-pv
        persistentVolumeClaim:
          claimName: dbench
  backoffLimit: 4
First I created a volume with the appropriate storage class, and then ran the job, which runs fio under the hood. I used a 1 GB test size to get a performance estimate without waiting too long. Here are the results:
I highlighted the best value for each indicator in green and the worst in red.
As you can see, in most cases Portworx performed better than the others. But for me it's too expensive. I don't know exactly what Robin costs, but it has a great free version, so if you need a paid product, give it a try (I hope they fix the restore-and-backup problem soon). Of the three free ones, OpenEBS gave me the least trouble, but its performance is nothing to write home about. It's a pity I didn't save more results, but I hope the figures given here and my comments are useful to you.