AI accelerators operate in two key realms: data centers and the edge. Today's data centers, particularly hyperscale data centers that may support thousands of physical servers and millions of virtual machines, demand massively scalable compute architectures. This has prompted some in the chip industry to go big in the name of accelerating AI workloads. For example, Cerebras has created the Wafer-Scale Engine (WSE) for its Cerebras CS-1 deep-learning system. At 46,225 mm² with 1.2 trillion transistors and 400,000 AI-optimized cores, the WSE is the biggest chip built to date. By providing more compute, memory, and communication bandwidth, the WSE supports AI research at speeds and scale that were previously impossible.

At the other end of the spectrum is the edge, where physical space for hardware is limited and energy efficiency is essential. Here, edge SoCs with AI accelerator IP integrated inside can quickly deliver the intelligence needed to support applications such as interactive programs running on smartphones or robotics in automated factories. Given the variety of applications where intelligence sits at the edge, the AI accelerators that support them must be optimized for characteristics such as real-time computational latency, ultra-high energy efficiency, fail-safe operation, and high reliability.
Not every AI application needs a chip as large as the WSE. Other types of hardware AI accelerators include:
- Graphics processing units (GPUs) with temporal neural network processing
- Spatial accelerators like Google’s Tensor Processing Unit (TPU)
- Coarse-grain reconfigurable architecture (CGRA) systems like SambaNova's DataScale
- Massively multicore scalar processors with vector processing extensions
Each of these chip types can be combined by the tens or hundreds to form larger systems that process large neural networks. For example, Google's TPUs can be joined in pod configurations that deliver more than 100 petaFLOPS of processing power for training neural network models. Megatron, from the Applied Deep Learning Research team at NVIDIA, is an 8.3-billion-parameter transformer language model for natural language processing, trained with 8-way model parallelism and 64-way data parallelism. Executing this model required the development of the NVIDIA A100 GPU, which delivers 312 teraFLOPS of FP16 compute power. Another emerging hardware type is the CGRA, which offers an attractive tradeoff between performance/energy efficiency and the flexibility to program different networks.
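To make the parallelism terminology concrete, the sketch below uses TensorFlow's `tf.distribute.MirroredStrategy` to express simple data parallelism, where each device receives a slice of every batch and gradients are synchronized across replicas. The model, dataset, and batch size here are placeholders chosen for illustration, not details from the Megatron work.

```python
# A minimal sketch of data parallelism with TensorFlow's tf.distribute API.
import tensorflow as tf

# MirroredStrategy replicates the model on every local GPU (or falls back to
# one CPU replica) and splits each batch across replicas; gradients are
# all-reduced so every copy of the model stays in sync.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored on each device.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Dummy data stands in for a real training set.
x = tf.random.normal((1024, 32))
y = tf.random.uniform((1024,), maxval=10, dtype=tf.int32)

# batch_size is the global batch; each replica sees 256 / num_replicas examples.
model.fit(x, y, batch_size=256, epochs=1)
```

Model parallelism, by contrast, splits the layers or weight matrices of a single network across devices, which is what allows models like Megatron to exceed the memory of any one GPU.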
In this discussion of AI hardware, one cannot neglect the software stack that enables system-level performance and ensures that the AI hardware is fully utilized. Open-source platforms like TensorFlow provide tools, libraries, and other resources that let developers build and deploy machine learning applications. Machine learning compilers, such as Facebook's Glow, are emerging to bridge the gap between high-level software frameworks and the many different AI accelerators.
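As an illustration of that framework-to-hardware handoff, the sketch below lowers a small Keras model with TensorFlow Lite's converter. Glow itself is a separate C++ toolchain, so the TFLite converter stands in here purely to show the shape of the compile step; the model architecture and output file name are arbitrary.

```python
# A minimal sketch of compiling a high-level framework model into a
# hardware-friendly format that accelerator runtimes can execute.
import tensorflow as tf

# A trivial Keras model represents the "high-level framework" side.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(4),
])

# The converter flattens the graph into a serialized model that on-device
# runtimes and hardware delegates can load and run.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```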