Summer in Moscow this year turned out, frankly, not great. It started too early and too fast, not everyone had time to react, and by the end of June it was already over. So when Huawei invited me to China, to the city of Chengdu, where their R&D center is located, I took one look at the weather forecast of +34 degrees in the shade and agreed immediately. After all, I'm not as young as I used to be, and my bones need warming up a little. I should note that it was possible to warm up not only the bones but also the insides, because Sichuan province, where Chengdu is located, is famous for its love of spicy food. Still, this blog is not about travel, so back to the main goal of our trip: the new line of storage systems, Huawei Dorado V6. This article is a bit of a greeting from the past, since it was written before the official announcement but published only after the release. So today we look at everything interesting and tasty that Huawei has prepared for us.
The new line will have 5 models. All models except the 3000V6 come in two versions, SAS and NVMe. The choice determines the disk interface you can use in the system, the back-end ports, and the number of drives you can install. The NVMe version uses palm-sized SSDs, which are thinner than classic 2.5" SAS SSDs, so an enclosure can accommodate up to 36 of them. The new line is All Flash only; there are no configurations with spinning disks.
Palm NVMe SSD
In my opinion, the Dorado 8000 and 18000 are the most interesting models. Huawei positions them as high-end systems and, thanks to its pricing policy, pits them against competitors' mid-range segment. These are the models I will concentrate on in today's review. I should note right away that, because of their design, the junior dual-controller systems have a somewhat different architecture from the Dorado 8000 and 18000, so not everything I talk about today applies to the junior models.
One of the main features of the new systems is the use of several chips of Huawei's own design, each of which offloads logic from the controller's central processor and adds functionality to different components.
The heart of the new systems is the Kunpeng 920 processor, built on ARM technology and manufactured by Huawei itself. Depending on the model, the number of cores, their frequency, and the number of processors installed in each controller vary:
Huawei Dorado V6 8000 - 2 CPUs, 64 cores
Huawei Dorado V6 18000 - 4 CPUs, 48 cores
Huawei developed this processor on the ARM architecture and, as far as I know, originally planned to put it only in the senior Dorado 8000 and 18000 models, as was the case with some V5 models, but the sanctions changed that plan. Of course, ARM also talked about ending cooperation with Huawei when the sanctions were imposed, but here the situation differs from the one with Intel. Huawei produces these chips on its own, and no sanctions can stop that process. A break with ARM threatens only the loss of access to new developments. As for performance, it can be judged only after independent tests. Although I saw 1M IOPS pulled from a Dorado 18000 system without any problems, I won't believe it until I do it with my own hands on a system in my own rack. But there really is no shortage of compute power in the controllers. The senior models are equipped with 4 controllers, each with 4 processors installed, which gives a total of 768 cores.
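The 768 figure follows directly from the numbers quoted above. A quick check, as a minimal sketch; the function name `total_cores` is my own and purely illustrative:

```python
def total_cores(controllers: int, cpus_per_controller: int, cores_per_cpu: int) -> int:
    """Total CPU cores in a system built from identical controllers."""
    return controllers * cpus_per_controller * cores_per_cpu


# Dorado 18000 figures quoted in the text: 4 controllers, 4 CPUs each, 48 cores per CPU.
dorado_18000_cores = total_cores(controllers=4, cpus_per_controller=4, cores_per_cpu=48)
print(dorado_18000_cores)  # 768
```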
But I'll come back to the cores later, when we look at the architecture of the new systems; for now let's turn to another chip installed in the system. The Ascend 310 looks like an extremely interesting solution (as I understand it, it is the younger brother of the Ascend 910, which was recently presented to the public). Its task is to analyze the data blocks arriving at the system in order to increase the read hit ratio. It is still hard to say how it will perform in practice, because today it works only from predefined patterns and cannot learn in an intelligent mode. An intelligent mode is promised in future firmware, most likely at the beginning of next year.
Let's move on to the architecture. Huawei has continued to develop its own Smart Matrix technology, which implements a full-mesh approach to connecting components. But whereas in V5 this covered only controller access to disks, now all controllers have access to all ports on both the back end and the front end.
Thanks to the new microservice architecture, this also makes it possible to balance the load across all controllers, even if there is only one LUN. The OS for this line of arrays was developed from scratch rather than simply optimized for flash drives. Because all controllers have access to the same ports, the host does not lose a single path to the storage system when a controller fails or reboots, and path switching is performed at the storage system level. As a result, using UltraPath on the host is not strictly necessary. Another “saving” when installing the system is the smaller number of links required. With the “classical” approach, 4 controllers need 8 links across 2 fabrics, whereas in Huawei's case even 2 will be enough (I am leaving aside whether the bandwidth of a single link is sufficient).
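To make the link and path arithmetic concrete, here is a minimal sketch of the two layouts. It is a toy model, not Huawei's implementation: the controller and fabric names, and the idea of describing each link by the set of controllers able to serve it, are my own assumptions for illustration.

```python
CONTROLLERS = ["A", "B", "C", "D"]
FABRICS = ["fabric1", "fabric2"]


def classical_links(controllers, fabrics):
    """Classical layout: every controller has its own link into every fabric."""
    return {f"{c}-{f}": {c} for c in controllers for f in fabrics}


def shared_links(controllers, fabrics):
    """Shared-port layout: one link per fabric, served by any controller."""
    return {f: set(controllers) for f in fabrics}


def surviving_paths(links, failed):
    """A host path stays up while at least one controller can still serve its link."""
    return sorted(name for name, owners in links.items() if owners - failed)


classical = classical_links(CONTROLLERS, FABRICS)
shared = shared_links(CONTROLLERS, FABRICS)

print(len(classical), len(shared))                    # 8 2
print(len(surviving_paths(classical, failed={"A"})))  # 6
print(len(surviving_paths(shared, failed={"A"})))     # 2
```

In the classical layout the host loses the two paths that ended on the failed controller, while in the shared-port model both paths stay up, which is why the switch can happen inside the array rather than in the host's multipathing.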
As in the previous version, a global cache with mirroring is used. This makes it possible to lose up to two controllers simultaneously, or three controllers one after another, without affecting availability. It is worth noting, however, that on the demo stand we did not see the load fully rebalanced across the remaining 3 controllers after one failed. The load of the failed controller was taken over entirely by one of the remaining ones. It is possible that the system needs to run longer in this configuration for that to happen. In any case, I will check this in more detail in my own tests.
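Why "two at once, or three in a row"? I do not know how Huawei actually places cache copies, so the following is only a sketch under my own assumptions: each cache page is kept on three of the four controllers, and after every failure the lost copy is re-mirrored to a surviving controller. Those two assumptions happen to reproduce exactly the behavior described above.

```python
from itertools import combinations, permutations

CONTROLLERS = ("A", "B", "C", "D")
REPLICAS = 3  # assumption: each cache page is kept on three controllers


def survives_simultaneous(copies, failed):
    """Data survives if at least one copy sits on a controller that did not fail."""
    return bool(set(copies) - set(failed))


def survives_sequence(copies, order, controllers=CONTROLLERS, replicas=REPLICAS):
    """Fail controllers one by one, re-mirroring to survivors after each failure."""
    alive = set(controllers)
    copies = set(copies)
    for failed in order:
        alive.discard(failed)
        copies.discard(failed)
        if not copies:
            return False  # the last copy was lost before it could be re-mirrored
        for candidate in sorted(alive - copies):
            if len(copies) >= replicas:
                break
            copies.add(candidate)
    return True


placement = ("A", "B", "C")
print(all(survives_simultaneous(placement, pair)
          for pair in combinations(CONTROLLERS, 2)))   # True
print(survives_simultaneous(placement, ("A", "B", "C")))  # False
print(all(survives_sequence(placement, order)
          for order in permutations(CONTROLLERS, 3)))  # True
```

Three simultaneous failures can wipe out all three copies, but three failures spread over time leave room to re-mirror in between, so the last controller still holds the data.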
Huawei positions the new systems as end-to-end NVMe, but to date NVMe-oF is not yet supported on the front end, only FC, iSCSI, or NFS. RoCE support is promised, like the other features, at the end of this year or the beginning of next.
Shelves are likewise connected to the controllers over RoCE, and this comes with one drawback: there is no “loop” connection of shelves, as there was with SAS. In my opinion, for now this is a fairly big drawback if you are planning a fairly large system. The point is that all shelves are connected in series, and the failure of one shelf makes all the shelves after it completely inaccessible. To ensure fault tolerance in this case, we have to connect every shelf directly to the controllers, which increases the number of back-end ports required in the system.
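The trade-off between series cabling and direct attachment can be shown with a small model. This is only a sketch: the shelf names are made up, and I assume for simplicity that each chain consumes one group of back-end ports.

```python
def reachable(chains, failed):
    """Return the shelves still reachable when each chain is cabled strictly in series."""
    ok = []
    for chain in chains:
        for shelf in chain:
            if shelf in failed:
                break  # everything behind a failed shelf is cut off
            ok.append(shelf)
    return ok


shelves = ["shelf1", "shelf2", "shelf3", "shelf4"]
daisy_chain = [shelves]            # one chain, one back-end port group
direct = [[s] for s in shelves]    # every shelf on its own port group

print(reachable(daisy_chain, failed={"shelf2"}))  # ['shelf1']
print(reachable(direct, failed={"shelf2"}))       # ['shelf1', 'shelf3', 'shelf4']
print(len(daisy_chain), len(direct))              # 1 4
```

With one chain, losing the second shelf takes the third and fourth with it; with direct attachment only the failed shelf is lost, at the cost of four times as many back-end port groups in this example.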
One more thing worth mentioning is non-disruptive update (NDU). As I said above, Huawei has implemented a container approach to running the OS for the new Dorado line, which makes it possible to update and restart services without a full reboot of the controller. I should say right away that some updates will include kernel updates, and in that case a classic controller reboot will still be required, but not every time. This will reduce the impact of the operation on the production system.
The vast majority of arrays in our arsenal are from NetApp. So I think it is quite logical to make a small comparison with the systems I work with a great deal. This is not an attempt to determine who is better and who is worse, or whose architecture is more advantageous. I will try to compare, soberly and without fanaticism, two different approaches from different vendors to solving the same problem. Yes, of course, in this case we are considering the Huawei systems in “theory”, and I will also separately note the points that are only planned for future firmware versions. The pluses I see at the moment:
Cons and fears in general:
Most of the fears and doubts can be dispelled by our own testing of the new line. I hope that soon after the release the systems will appear in Moscow, and that there will be enough of them to quickly get one for our own tests. So far, it can be said that, on the whole, the company's approach looks interesting, and the new line looks very good against competitors. The final implementation, however, raises a lot of questions, because many things we will see only at the end of the year, and maybe only in 2020.