Back in 2012, a high-end, off-the-shelf desktop graphics card might boast 1.6 TeraFLOPS of computing power to accelerate the convolutional neural networks (CNNs) that were making their way into the industry’s consciousness. Now, we’re heading into ExaFLOPS territory with ML accelerators and super-powerful AI processors with hundreds of thousands of AI-optimized cores tackling large language models (LLMs). These very large transformer neural networks, comprising hundreds of billions of parameters, can be trained to write copy, answer questions, translate languages, and more. They’re also sparking demand for more domain-specific architectures and highlighting how co-optimization of software and hardware is critical for delivering the future of scalable AI systems.
Indeed, given the rapid progress of ML models, nothing short of a dramatic improvement in the underlying hardware will suffice. From generation to generation, Moore’s law has reliably delivered substantial performance gains and power reductions. But in the AI era, where performance must double every six months to keep pace, Moore’s law has fallen behind, and nowhere more so than in handling LLMs.
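The gap between the two doubling rates compounds quickly. A minimal sketch, assuming a classic Moore's-law cadence of doubling roughly every 24 months against the six-month doubling the text cites (the 10-year horizon is an illustrative assumption, not from the source):

```python
# Compare two exponential growth rates over the same horizon:
# Moore's-law-style supply (doubling ~every 24 months, assumed) vs.
# AI compute demand (doubling ~every 6 months, per the text).

def growth(doubling_months: float, horizon_months: float) -> float:
    """Multiplicative growth over the horizon for a given doubling period."""
    return 2.0 ** (horizon_months / doubling_months)

horizon = 10 * 12  # 10 years, in months (illustrative)

moore = growth(24, horizon)   # 2^5  = 32x over the decade
demand = growth(6, horizon)   # 2^20 ≈ 1,048,576x over the decade

print(f"Moore's law supply: {moore:,.0f}x")
print(f"AI compute demand:  {demand:,.0f}x")
print(f"Shortfall factor:   {demand / moore:,.0f}x")  # 2^15 = 32,768x
```

Over a decade, the demand curve outruns the supply curve by more than four orders of magnitude, which is why the article turns to architectural and packaging answers rather than transistor scaling alone.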
As engineers strive to extract more benefits from Moore’s law, the chip design industry is hitting multiple walls:
- The processing wall, which hampers the scaling of training FLOPS
- The memory wall, as parameter count far outpaces local memory scaling
- The bandwidth wall, as compute throughput far outpaces memory and interconnect bandwidth
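The memory wall is easy to see with back-of-the-envelope arithmetic. A minimal sketch, assuming a GPT-3-scale model of 175 billion parameters stored in FP16 and an on-chip SRAM budget of 100 MB (both figures are illustrative assumptions, not from the source):

```python
# Memory-wall arithmetic: weight footprint of a large LLM vs. the
# local (on-chip) memory of a single accelerator die.

params = 175e9        # parameter count (GPT-3-scale, illustrative)
bytes_per_param = 2   # FP16 weights
onchip_sram = 100e6   # bytes of on-chip SRAM (assumed figure)

weights_bytes = params * bytes_per_param  # 350 GB for weights alone

print(f"Weights alone:         {weights_bytes / 1e9:.0f} GB")
print(f"Ratio to on-chip SRAM: {weights_bytes / onchip_sram:,.0f}x")
```

Even before counting activations and optimizer state, the weights alone exceed a single die's local memory by a factor in the thousands, forcing the model to be sharded across many devices and making interconnect bandwidth the next bottleneck.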
Today’s trends are leading us to tomorrow’s challenges, and opportunities. As we approach the reticle limits of manufacturing, density scaling is projected to slow down as costs increase, and moving to larger die sizes isn’t the answer from a cost and yield standpoint. I/O limitations create another stumbling block, with only modest improvements in die-to-die interconnect pitch over recent years. However, advances in high-density integration and packaging, including 3D-stacking technologies, are helping to transcend these technical barriers, paving the way for new silicon-to-system design architectures to take the electronics industry through the next decade of innovation.