From virtual assistants to self-driving cars, machines that operate intelligently and autonomously require massive logic and memory resources to turn voluminous data into real-time insights and actions. That’s where the chips come in. AI-related semiconductors could make up 20% of semiconductor market demand by 2025, according to a report by McKinsey & Company. The report goes on to note that semiconductor companies could capture up to 50% of the total value from the AI technology stack. Given the opportunities, a number of startups as well as hyperscalers have stepped onto the chip development stage.
The history of AI dates back to the 1950s. The math developed back then still applies today, but the ability to apply AI to everyday life simply wasn’t there. By the 1980s, we started to see the emergence of expert systems that could perform tasks with some intelligence, such as symptom-matching functions on healthcare websites. In 2016, deep learning made its bold entrance, changing the world through capabilities like image recognition and ushering in the growing criticality of hardware and compute performance. Today, AI goes beyond large systems like cars and scientific modeling platforms. It’s shifting from the data center and the cloud to the edge, largely driven by inference, during which a trained deep neural network model infers things about new data based on what it has already learned.
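To make that last point concrete, here is a minimal, illustrative sketch of what inference amounts to in code: a single forward pass that applies fixed, already-learned parameters to new input data. The network shape and the randomly initialized weights below are placeholders standing in for a genuinely trained model, not any particular production system.

import numpy as np

# Toy two-layer network. In a real deployment these weights would be
# loaded from a completed training run; here random values stand in.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((784, 128)), np.zeros(128)
W2, b2 = rng.standard_normal((128, 10)), np.zeros(10)

def infer(x):
    # Inference: apply the learned weights to new data.
    # Just a forward pass; no gradients, no weight updates.
    h = np.maximum(x @ W1 + b1, 0.0)   # hidden layer with ReLU
    logits = h @ W2 + b2
    return int(np.argmax(logits))      # predicted class

new_sample = rng.standard_normal(784)  # stand-in for unseen data
print(infer(new_sample))

At the edge, the workload is essentially calls like infer() above, executed under tight latency and power budgets, which is exactly what the specialized chips discussed here are built to accelerate.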
Smartphones, augmented reality/virtual reality (AR/VR), robots, and smart speakers are among the growing number of applications featuring AI at the edge, where the AI processing happens locally. By 2025, 70% of the world’s AI software is expected to run at the edge. With hundreds of millions of edge AI devices already out in the world, we’re seeing an explosion of real-time, abundant-data computing that typically requires 20-30 models and mere microseconds of latency to execute. In autonomous navigation for, say, a car or a drone, the latency budget for a safety-critical system to respond is only 20 millionths of a second (20µs). And for cognitive voice and video assistants that must understand human speech and gestures, the threshold drops even further: under 10µs for keyword recognition and under 1µs for hand gesture recognition.
Then there are commercial deep learning networks that require even greater compute power. For example, consider Google’s LSTM1 voice recognition model, which works with natural language: it has 56 network layers and 34 million weights, and performs roughly 19 billion operations per inference. To be effective, the model needs to understand the question posed and formulate a response in no more than 7ms. To meet that latency requirement, Google designed its own custom chip, the Tensor Processing Unit (TPU). The TPU family, now in its third generation, provides an example of how the new software paradigm is driving new hardware architectures, and is used to accelerate neural network computations for a variety of Google services.
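As a rough back-of-the-envelope check on why such a model strains general-purpose hardware, the short calculation below uses only the figures quoted above (roughly 19 billion operations per inference and a 7ms response budget) to derive the sustained throughput the chip must deliver.

# Back-of-the-envelope throughput for the model described above,
# using only the figures quoted in the text.
ops_per_inference = 19e9     # ~19 billion operations per inference
latency_budget_s = 7e-3      # 7 ms response-time budget

required_throughput = ops_per_inference / latency_budget_s
print(f"Sustained throughput needed: {required_throughput / 1e12:.1f} TOPS")
# -> roughly 2.7 trillion operations per second, before any batching

That result helps explain why Google turned to a domain-specific accelerator for this workload rather than relying on general-purpose processors alone.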