SynQuacer E-Series motherboard for a 24-core ARM server based on the ARM Cortex-A53 processor, with 32 GB of RAM, December 2018
For many years, ARM processors with their reduced instruction set (RISC) have dominated the mobile device market. But they have never managed to break into data centers, where Intel and AMD with the x86 instruction set still dominate. Exotic solutions appear from time to time, such as a 24-core ARM server based on the Banana Pi platform, but there have been no serious offerings. More precisely, there were none until this week.
This week, AWS launched its own 64-core Graviton2 ARM processor in the cloud, a system-on-chip built on the ARM Neoverse N1 core. The company claims that Graviton2 is much faster than the previous-generation ARM processors in EC2 A1 instances, and here are the first independent tests.
The infrastructure business comes down to comparing numbers. Customers of a data center or cloud service do not really care what architecture the processors have. They care about the price-performance ratio. If running on ARM is cheaper than on x86, they will choose ARM.
Until recently, it was impossible to say unequivocally that computing on ARM would be more cost-effective than on x86. For example, the 24-core ARM Cortex-A53 server chip, the SocioNext SC2A11, cost about $1,000 and could run a web server on Ubuntu, but it was far behind x86 processors in performance.
However, the remarkable energy efficiency of ARM processors makes people look at them again and again. For example, the SocioNext SC2A11 consumes only 5 watts, while electricity accounts for almost 20% of the cost of a data center. If such chips deliver decent performance, x86 will have no chance.
ARM's First Coming: EC2 A1 Instances
At the end of 2018, AWS introduced EC2 A1 instances on its own ARM processors. This was clearly a signal to the industry about potential changes in the market, but the benchmark results were disappointing.
The table below shows the results of stress testing of EC2 A1 (ARM) and EC2 M5d.metal (x86) instances. For testing, the stress-ng utility was used:
stress-ng --metrics-brief --cache 16 --icache 16 --matrix 16 --cpu 16 --memcpy 16 --qsort 16 --dentry 16 --timer 16 -t 1m
As you can see, A1 performed worse in every test except cache. On most of the other metrics, ARM lagged far behind. The performance gap is larger than the 46% price difference between A1 and M5. In other words, instances on x86 processors remained more cost-effective:
Test | EC2 A1 | EC2 M5d.metal | Difference |
cache | 1280 | 311 | 311.58% |
icache | 18209 | 34368 | -47.02% |
matrix | 77932 | 252190 | -69.10% |
cpu | 9336 | 24077 | -61.22% |
memcpy | 21085 | 111877 | -81.15% |
qsort | 522 | 728 | -28.30% |
dentry | 1389634 | 2770985 | -49.85% |
timer | 4970125 | 15367075 | -67.66% |
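The Difference column appears to be the relative change of the A1 score against the M5d.metal score. A minimal Python sketch, assuming that formula (the helper name `relative_difference` is ours and is not part of stress-ng or the original benchmark), reproduces the column from the raw numbers:

```python
# Raw stress-ng scores copied from the table above: (EC2 A1, EC2 M5d.metal).
A1_VS_M5D = {
    "cache": (1280, 311),
    "icache": (18209, 34368),
    "matrix": (77932, 252190),
    "cpu": (9336, 24077),
    "memcpy": (21085, 111877),
    "qsort": (522, 728),
    "dentry": (1389634, 2770985),
    "timer": (4970125, 15367075),
}


def relative_difference(candidate, baseline):
    """Percent change of candidate relative to baseline (assumed formula)."""
    return (candidate - baseline) / baseline * 100.0


for test, (a1, m5d) in A1_VS_M5D.items():
    # A positive value means A1 scored higher than M5d.metal on this test.
    print(f"{test:<7} {relative_difference(a1, m5d):+8.2f}%")
```

A negative value means the ARM instance scored lower than the x86 instance on that test.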
Of course, microbenchmarks do not always give an objective picture. What matters is the difference in real application performance. But here the picture was no better. Colleagues at Scylla compared a1.metal and m5.4xlarge instances with the same number of processors. In a standard read test against the NoSQL database in a single-node configuration, the first delivered 102,000 reads per second and the second 610,000. In both cases, all available processors were 100% utilized. That is roughly a sixfold drop in performance, which the lower price does not offset.
In addition, A1 instances work only with EBS storage and do not support the fast NVMe devices available on other instances.
Overall, A1 was a step in a new direction, but it did not live up to the expectations placed on ARM.
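To make the price-performance argument concrete, here is a rough Python sketch. It assumes that the 46% price difference mentioned earlier means the A1 instance costs 46% less than the x86 instance, and that this applies to the a1.metal and m5.4xlarge pair; both are our assumptions for illustration, and prices are expressed in relative units, not actual AWS rates.

```python
# Throughput figures from the Scylla read test quoted above.
A1_READS_PER_SECOND = 102_000
M5_READS_PER_SECOND = 610_000

# Assumption: x86 price is the unit, and A1 is 46% cheaper.
M5_RELATIVE_PRICE = 1.00
A1_RELATIVE_PRICE = 1.00 - 0.46


def reads_per_price_unit(reads_per_second, relative_price):
    """Throughput delivered per unit of (relative) instance price."""
    return reads_per_second / relative_price


a1_value = reads_per_price_unit(A1_READS_PER_SECOND, A1_RELATIVE_PRICE)
m5_value = reads_per_price_unit(M5_READS_PER_SECOND, M5_RELATIVE_PRICE)

# How many times more reads the x86 instance delivers for the same spend.
advantage = m5_value / a1_value
print(f"x86 delivers about {advantage:.1f}x more reads per unit of cost")
```

Under these assumptions, the x86 instance still delivers roughly three times more reads for the same money, which is why the lower A1 price does not compensate for the throughput gap.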
ARM's Second Coming: EC2 M6 Instances
Everything changed this week, when AWS introduced a new class of ARM servers along with a number of instances on the new Graviton2 processors, including M6g and M6gd.
A comparison of these instances shows a completely different picture. In some tests, ARM performs better, and sometimes much better than x86.
Here are the results of the same stress-ng command:
Test | EC2 M6g | EC2 M5d.metal | Difference |
cache | 218 | 311 | -29.90% |
icache | 45887 | 34368 | 33.52% |
matrix | 453982 | 252190 | 80.02% |
cpu | 14694 | 24077 | -38.97% |
memcpy | 134711 | 111877 | 20.41% |
qsort | 943 | 728 | 29.53% |
dentry | 3088242 | 2770985 | 11.45% |
timer | 55515663 | 15367075 | 261.26% |
This is a completely different story: M6g is five times faster than A1 when reading from the Scylla NoSQL database, and the newer M6gd instances come with fast NVMe drives.
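The two tables can also be combined to see how much the ARM instances improved from one generation to the next. The following Python sketch divides each M6g score by the corresponding A1 score. It assumes that a higher stress-ng score is better, which is consistent with how the tables are read in this article; the variable and function names are ours.

```python
# stress-ng scores copied from the two tables above.
A1_SCORES = {
    "cache": 1280, "icache": 18209, "matrix": 77932, "cpu": 9336,
    "memcpy": 21085, "qsort": 522, "dentry": 1389634, "timer": 4970125,
}
M6G_SCORES = {
    "cache": 218, "icache": 45887, "matrix": 453982, "cpu": 14694,
    "memcpy": 134711, "qsort": 943, "dentry": 3088242, "timer": 55515663,
}


def generation_ratio(test):
    """M6g score divided by A1 score for one test (above 1.0 means M6g is higher)."""
    return M6G_SCORES[test] / A1_SCORES[test]


for test in A1_SCORES:
    print(f"{test:<7} {generation_ratio(test):6.2f}x")
```

In this sketch, every test except cache comes out above 1.0, so the M6g scores are higher than the A1 scores almost across the board.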
ARM Offensive on All Fronts
The AWS Graviton2 processor is just one example of ARM being used in data centers, and signals are coming from other directions too. For example, on November 15, 2019, the U.S. startup Nuvia raised $53 million in venture capital funding.
The startup was founded by three leading engineers who were involved in the creation of processors at Apple and Google. They promise to develop processors for data centers that will compete with Intel and AMD.
According to available information, Nuvia has designed from scratch a processor core that can be built "on top" of the ARM architecture, but without obtaining an ARM license.
All this indicates that ARM processors are ready to conquer the server market. After all, we live in the post-PC era. Annual x86 shipments have fallen by almost 10% since their 2011 peak, while shipments of RISC chips have skyrocketed to 20 billion. Today, 99% of the 32- and 64-bit processors in the world are RISC.
Turing Award winners John Hennessy and David Patterson published an article entitled “A New Golden Age for Computer Architecture” in February 2019. Here is what they write:
The market settled the dispute between RISC and CISC. Although CISC won the later stages of the PC era, RISC is winning now that the post-PC era has arrived. There have been no new CISC ISAs in decades. To our surprise, the general consensus on the best ISA principles for general-purpose processors today still leans toward RISC, 35 years after its invention ... In open-source ecosystems, agilely designed chips will convincingly demonstrate advances and thus accelerate commercial adoption. The general-purpose processor philosophy in these chips is likely to be RISC, which has stood the test of time. Expect the same fast-paced innovation as in the last golden age, but this time in terms of cost, energy and security, not just performance.
“In the next decade, a Cambrian explosion of new computer architectures will take place, meaning exciting times for computer architects in academia and industry,” they conclude at the end of the article.