Moore's law has reached its limit

Note: the article was published on 12/26/2015. Since then, some of the author's theses have been confirmed by real events, while others have proved erroneous. — Translator's note.



In the past 40 years we have watched the speed of computers grow exponentially. Today's CPUs have clock frequencies a thousand times higher than the first personal computers of the early 1980s. The amount of RAM in a computer has grown ten-thousand-fold, and hard-disk capacity has increased more than a hundred-thousand-fold. We have become so accustomed to this continuous growth that we almost consider it a law of nature and call it Moore's law. But there are limits to this growth, as Gordon Moore himself pointed out. We are now approaching the physical limit, where the speed of computation is bounded by the size of the atom and the speed of light.



Intel's tick-tock clock has begun to skip beats here and there. Each "tick" is a shrink of the transistors, and each "tock" is an improvement of the microarchitecture. The current processor generation, called Skylake, is a "tock" on a 14-nanometer process. Logically, the next step should be a "tick" with a 10-nanometer process, but Intel is now inserting "refresh" cycles after each "tock". The next processor, announced for 2016, will be a refresh of Skylake, still on the 14-nanometer process. The slowdown of the tick-tock clock is a physical necessity, because we are approaching the limit where a transistor is only a few atoms wide (a silicon atom is about 0.2 nanometers across).



Another physical limitation is the data transfer rate, which cannot exceed the speed of light. It takes several clock cycles for data to travel from one end of a CPU chip to the other. As chips grow larger, with more and more transistors, speed begins to be limited by the time it takes to move data across the chip itself.
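For scale (a back-of-the-envelope figure, not from the original article): at a 3 GHz clock rate, light travels at most (3 × 10⁸ m/s) / (3 × 10⁹ 1/s) = 10 cm during one clock cycle, and electrical signals on a chip propagate noticeably slower than light in vacuum, so a signal crossing a chip a few centimeters across can easily consume multiple cycles.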



Technological constraints are not the only thing slowing the evolution of processors. Another factor is weakening market competition. Intel's largest competitor, AMD, is now paying more attention to what it calls APUs (Accelerated Processing Units): smaller processors with integrated graphics for mini-PCs, tablets, and other ultra-mobile devices. Intel now holds an overwhelming share of the market for high-end PC and server processors. The fierce competition between Intel and AMD that drove the development of x86 processors for decades has all but disappeared.



The growth of computing power in recent years has come not so much from faster calculations as from increased parallelism. Modern microprocessors exploit three kinds of parallelism:



  1. Simultaneous out-of-order execution of multiple instructions.
  2. Single-Instruction-Multiple-Data (SIMD) operations in vector registers.
  3. Multiple CPU cores on a single chip.


These kinds of parallelism have no theoretical limits, but there are very real practical ones. Out-of-order execution is limited by the number of independent instructions in the program code: you cannot execute two instructions simultaneously if the second is waiting for the result of the first. Current CPUs can typically execute four instructions simultaneously. Increasing this number would bring little benefit, because the processor would find it hard or impossible to dig up more independent instructions in the code that can be executed at the same time.
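As a minimal illustration (my sketch, not from the article), consider summing an array in C. In the first version every addition depends on the previous one, so out-of-order hardware cannot overlap them; the second keeps four independent accumulators, which is exactly the kind of independent work a four-wide CPU can keep in flight at once:

```c
/* Why out-of-order execution is limited by instruction dependencies. */

double sum_chained(const double *a, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i];   /* each addition waits for the previous one:
                          a serial dependency chain the CPU cannot reorder */
    return sum;
}

double sum_split(const double *a, int n) {
    /* four independent accumulators give the out-of-order core
       four additions it can execute simultaneously */
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)   /* remainder elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```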



Current processors with the AVX2 instruction set have 16 vector registers of 256 bits each. The upcoming AVX-512 instruction set will give us 32 registers of 512 bits, and we may well expect future extensions to 1024-bit or 2048-bit vectors. But these vector enlargements will have diminishing returns: few computational tasks have enough inherent parallelism to benefit from the larger vectors. The 512-bit vector registers are accompanied by a set of mask registers limited to 64 bits. A 2048-bit vector register can hold 64 single-precision numbers of 32 bits each. We can guess that Intel has no plans to make vector registers larger than 2048 bits, because that would exceed the limits of the 64-bit mask registers.
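As a hedged sketch of what this SIMD parallelism looks like in practice (the function name and the assumption that n is a multiple of 8 are mine), here is a loop using the 256-bit AVX2 registers described above to add eight floats per instruction:

```c
#include <immintrin.h>  /* AVX2 intrinsics */

/* Add two float arrays eight elements at a time in 256-bit registers.
 * For brevity, n is assumed to be a multiple of 8. Compile with -mavx2. */
void add_avx2(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* load 8 floats */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
}
```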



Multiple CPU cores give an advantage only when there are many speed-critical programs running concurrently, or when a task can be divided into multiple independent threads. The number of threads into which a task can be profitably divided is always limited.
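A minimal sketch of such a division into independent threads, using POSIX threads (the workload and all names here are illustrative, not from the article):

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000            /* divisible by NTHREADS for simplicity */

static double data[N];
static double partial[NTHREADS];

/* each thread sums its own independent slice of the array */
static void *worker(void *arg) {
    int t = (int)(long)arg;
    double sum = 0.0;
    for (int i = t * (N / NTHREADS); i < (t + 1) * (N / NTHREADS); i++)
        sum += data[i];
    partial[t] = sum;
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (int i = 0; i < N; i++)
        data[i] = 1.0;
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(threads[t], NULL);
        total += partial[t];   /* combine the independent results */
    }
    printf("sum = %f\n", total);
    return 0;
}
```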



Manufacturers will no doubt keep trying to build ever more powerful computers, but how likely is it that this computing power can actually be used in practice?



There is a fourth possibility of parallelism that is not yet used. Programs are usually full of if-else branches, and current CPUs try to predict which branch will be taken so they can feed it into the pipeline ahead of time. Instead, the CPU could execute several code branches at once, so that no time is lost when the prediction turns out to be wrong. Of course, this would have to be paid for with higher power consumption.
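Hardware that executes both branches cannot be demonstrated directly in software, but a rough analogy (entirely my illustration) is branchless code that computes both sides and then selects one result, trading extra work for immunity to misprediction:

```c
/* Software analogy of "execute both branches": do the work of both
 * sides, then select, so a wrong guess never flushes the pipeline. */
int max_branchy(int a, int b) {
    if (a > b)          /* a hard-to-predict branch costs a pipeline
                           flush whenever the prediction is wrong */
        return a;
    else
        return b;
}

int max_both(int a, int b) {
    int if_true  = a;   /* "execute" the taken branch      */
    int if_false = b;   /* ...and the not-taken branch too */
    /* select the result; compilers typically emit a conditional
       move here instead of a branch */
    return (a > b) ? if_true : if_false;
}
```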



Another possible improvement is to put a programmable logic device on the processor chip. Such a combination is now commonplace in so-called FPGAs, which are used in advanced hardware. Programmable logic devices in personal computers could be used to implement application-specific functions for tasks like image processing, encryption, data compression, and neural networks.



The semiconductor industry is experimenting with materials to replace silicon. Some III-V semiconductor materials can operate at lower voltages and higher frequencies than silicon, but they do not make atoms smaller or light faster. The physical limitations still stand.



Someday we may see three-dimensional multilayer chips. This would pack the circuitry denser and reduce distances, and hence delays. But how do you cool such a chip effectively when heat is dissipated everywhere inside it? New cooling technologies will be required. The chip will not be able to power all of its circuits at once without overheating; it will have to keep most of its parts powered down most of the time and supply power to each part only while it is in use.



In recent years, CPU speed has grown faster than RAM speed, so RAM often becomes a serious bottleneck. Without a doubt, we will see many attempts to increase RAM speed in the future. A likely development is to place RAM on the same chip as the CPU (or at least in the same package) to shorten the distance data must travel; that would be a useful application of three-dimensional chips. The RAM will probably be of a static type, so that each memory cell is powered only while it is being accessed.



Intel also serves the market of supercomputers for scientific computing. The Knights Corner processor has up to 61 cores on a single chip. It has a poor performance/price ratio, but its expected successor, Knights Landing, should be better in this respect: it will hold up to 72 cores per chip and will be able to execute instructions out of order. This is a small niche market, but Intel can gain prestige from it.



Right now, I think the best opportunities for improving performance lie on the software side. Software developers were quick to find uses for the exponential growth in computer performance that Moore's law delivered. The software industry came to rely on it, and it also adopted ever more advanced development tools and software frameworks. These high-level tools and frameworks made it possible to develop software faster, but at the cost of consuming more computing resources in the final product. Many of today's programs are quite wasteful in their excessive consumption of hardware computing power.



Over the years we have observed a symbiosis between the hardware and software industries, in which the latter produced ever more advanced and resource-hungry products that pushed users to buy ever more powerful equipment. Now that the growth rate of hardware technology has slowed, and users have moved to small portable devices where battery life matters more than performance, the software industry will have to change course. It will have to cut back on resource-heavy development tools and multi-layered software and develop programs that are less stuffed with features. Development time will increase, but the programs will consume fewer hardware resources and run faster on small portable devices with limited battery life. If the commercial software industry does not change course now, it may cede market share to leaner open-source products.


