Chips for ML: a look at new products

We look at new architectures from both major global manufacturers and startups: wafer-scale chips, tensor processors, and graph-based devices.










Photo: Jason Leung / Unsplash



Wafer-scale chips for deep learning



In the production of conventional processors, a silicon wafer is cut into individual dies. With wafer-scale processors, the wafer is not cut up: the entire wafer becomes one large chip. As a result, the components sit closer to each other, and system performance increases.



This approach was taken by engineers at Cerebras Systems and TSMC, who developed a deep learning chip called the Cerebras WSE. It was unveiled at the Hot Chips conference in late summer. The device is a square die with 21.5 cm sides, containing 1.2 trillion transistors organized into 400,000 cores. The cores communicate over the proprietary Swarm interconnect with a bandwidth of 100 Pbit/s.



The developers say the chip pre-optimizes computation by filtering out zeros in matrix operations, which make up 50 to 98% of all values. As a result, they claim, training a model on Cerebras is a hundred times faster than on conventional GPUs. The New York Times, however, greeted these statements with a healthy dose of skepticism: independent experts have not yet tested the hardware.
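The idea behind zero-filtering can be sketched in plain NumPy. This is only a software illustration of the principle, not Cerebras's actual mechanism: by computing only the columns where the input vector is non-zero, most of the multiplications are skipped when the data is, say, 70% zeros.

```python
import numpy as np

def sparse_matvec(matrix, vec):
    """Compute matrix @ vec, but skip columns where vec is zero."""
    nonzero = np.nonzero(vec)[0]          # indices actually worth computing
    return matrix[:, nonzero] @ vec[nonzero]

rng = np.random.default_rng(0)
m = rng.standard_normal((4, 8))
v = rng.standard_normal(8)
v[np.abs(v) < 0.5] = 0.0                  # zero out small entries, mimicking sparse activations

# The sparse path gives the same answer as the dense product
assert np.allclose(sparse_matvec(m, v), m @ v)
```

On hardware the win comes from never scheduling the multiply-accumulate at all, rather than from index gathering as in this sketch.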



Cerebras's compute cores are programmable and can be optimized for any neural network. The new chip is expected to find use in cloud systems and machine learning applications, from drones to voice assistants. It is not yet known when the chip will go on sale, but a number of companies are already testing it on their workloads.



Silicon Interconnect Fabric (Si-IF) is another wafer-scale project for machine learning, under development in a University of California laboratory. Si-IF combines dozens of GPUs on a single silicon wafer. The developers have already presented two prototypes, with 24 and 40 GPUs, whose performance is 2.5 times that of conventional devices. The system is planned for use in data centers.



Tensor processors



In May 2018, Google announced TPU v3, the third generation of its tensor processors for working with the TensorFlow machine learning library. Little is known about the new device's technical characteristics. The production version will be manufactured on a 12- or 16-nm process, with a thermal design power of 200 W and performance of 105 TFLOPS when working with bfloat16, a 16-bit floating-point format used in deep learning.
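The bfloat16 format keeps the top 16 bits of an IEEE float32: one sign bit, the full 8-bit exponent, and only 7 mantissa bits. That preserves float32's dynamic range while halving memory traffic, at the cost of precision. The effect is easy to reproduce with NumPy bit manipulation:

```python
import numpy as np

def to_bfloat16(x):
    """Round-trip a float32 value through bfloat16 precision by
    keeping only the top 16 bits (1 sign, 8 exponent, 7 mantissa)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Zero out the low 16 mantissa bits (truncation, no rounding)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

print(float(to_bfloat16(3.14159265)))  # 3.140625: only ~3 decimal digits survive
```

Values like 1.0 or 0.5 survive unchanged, since their mantissas fit in 7 bits; it is the long mantissas of arbitrary activations and gradients that get truncated, which deep learning training tolerates well in practice.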



On a number of tasks, the second-generation Google TPU outperformed the NVIDIA Tesla V100 fivefold. Engineers say the third generation is eight times more powerful than its predecessor; the chips even required liquid cooling.





Photo: Cineca / CC BY



The corporation plans to move a number of its systems to the new tensor processors: its voice assistant, its photo-processing service, and the RankBrain search ranking algorithm. The company also wants to build scalable cloud supercomputers on top of TPUs and open access to them for scientists studying AI systems. The service launched in beta in late spring.



Chips for working with complex graphs



British startup Graphcore has developed a chip for deep learning tasks: the Colossus IPU (Intelligence Processing Unit). It contains 1,200 cores and a set of specialized transcendental functions, with each core processing six threads. The hardware is paired with Poplar software, which compiles models into complex multi-stage computational graphs that run on the IPU processors. Tests of the first Graphcore samples showed performance a hundred times higher than traditional GPUs.
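The graph-compilation idea behind Poplar can be sketched in a few lines of Python. This is a conceptual toy, not the actual Poplar API (which is C++): a model is represented as a dataflow graph of operation nodes, and executing the graph means evaluating each node's inputs before the node itself, which is exactly the structure a graph compiler can partition across many cores.

```python
class Node:
    """A node in a toy dataflow graph: an operation plus its input nodes."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Recursively evaluate dependencies, then apply this node's op
        args = [n.evaluate() for n in self.inputs]
        return self.op(*args)

def const(v):
    """A leaf node holding a constant value."""
    return Node(lambda: v)

# Build a tiny graph computing (2 + 3) * 4
add = Node(lambda a, b: a + b, (const(2), const(3)))
mul = Node(lambda a, b: a * b, (add, const(4)))
print(mul.evaluate())  # 20
```

A real graph compiler would analyze this structure ahead of time, schedule independent subgraphs onto separate cores, and insert the communication steps, rather than evaluating recursively on one thread.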



The startup already ships a full-size PCI-E card for servers. It carries two IPU chips built on a 16-nm process and containing 24 billion transistors, and delivers 125 TFLOPS of compute. The cards are designed for the data centers of IaaS providers and for self-driving cars. The founders say more than a hundred customers work with their devices, but they do not name specific companies.



Competition in machine learning hardware is getting increasingly serious. New players are entering the market with innovative architectures, while established companies keep scaling up their existing solutions. Either way, this plays into the hands of data center owners, data science engineers, and other specialists building artificial intelligence systems.








