Different AI accelerator architectures offer different performance tradeoffs, but all of them require an associated software stack to deliver system-level performance; without one, the hardware sits underutilized. To bridge high-level frameworks, such as TensorFlow™ or PyTorch™, to diverse AI accelerators, machine learning compilers are emerging as an interoperability layer. A representative example is Facebook's Glow compiler.
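As a concrete illustration of that bridging step, the sketch below exports a PyTorch model to the ONNX interchange format, which compilers such as Glow can import and lower to accelerator code. The specific model, input shape, and file name are illustrative choices, not requirements of any particular compiler, and the torchvision `weights=None` argument assumes a recent torchvision release.

```python
# Sketch: handing a framework-level model to a compiler stack via ONNX.
# Glow, like several ML compilers, provides an ONNX importer, so ONNX
# export is one common bridge between frameworks and accelerators.
import torch
import torchvision.models as models

# Any traced/scriptable model works; ResNet-18 is just an example.
model = models.resnet18(weights=None).eval()
example_input = torch.randn(1, 3, 224, 224)  # illustrative input shape

# Serialize the graph to ONNX; a compiler such as Glow can then
# lower this portable representation to accelerator-specific code.
torch.onnx.export(model, example_input, "resnet18.onnx", opset_version=13)
```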
Measuring the performance of AI accelerators has been a contentious topic. For an independent assessment of the training and inference performance of machine learning hardware, software, and services, teams can consult MLPerf, an independent benchmarking organization formed by engineers and researchers from industry and academia.
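To make the measurement problem concrete, here is a minimal, ad-hoc inference benchmark. MLPerf standardizes exactly this kind of measurement with defined scenarios, query generation, and accuracy targets, so this sketch is an illustration of the underlying idea, not an MLPerf-compliant harness; `infer` and `sample` are placeholders for any model call and input.

```python
# Minimal sketch of a latency/throughput micro-benchmark.
import time
import statistics

def benchmark(infer, sample, warmup=10, runs=100):
    """Time a single-sample inference callable and report summary stats."""
    for _ in range(warmup):              # warm caches / JIT before timing
        infer(sample)
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "p99_ms": latencies[int(0.99 * (runs - 1))] * 1e3,  # tail latency
        "throughput_qps": runs / sum(latencies),
    }
```

Reporting a tail percentile alongside the median matters for accelerators: real deployments are judged by worst-case responsiveness, which is why MLPerf's server scenarios bound latency percentiles rather than averages.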
As intelligence moves to the edge in many applications, AI accelerators are differentiating further. Edge applications are so varied that accelerators must be optimized for specific characteristics, such as latency, energy efficiency, and memory footprint, dictated by the needs of the end application. For example, while autonomous navigation demands a computational response latency limit of 20 μs, voice and video assistants must recognize spoken keywords in less than 10 ms and hand gestures within a few hundred milliseconds.
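One way to act on such requirements during accelerator selection is to screen measured tail latency against a per-application budget, as in the sketch below. The budget values simply restate the figures above, and the application names and helper are hypothetical; a measurement could come from a harness like the `benchmark()` sketch earlier.

```python
# Sketch: screening a measured edge workload against a latency budget.
LATENCY_BUDGETS_MS = {
    "autonomous_navigation": 0.020,  # 20 us, per the text above
    "keyword_spotting": 10.0,        # < 10 ms
    "gesture_recognition": 300.0,    # a few hundred ms
}

def meets_budget(app: str, measured_p99_ms: float) -> bool:
    """True if the measured tail latency fits the application's budget."""
    return measured_p99_ms <= LATENCY_BUDGETS_MS[app]
```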
In the future, cognitive systems, which aim to simulate human thought processes, will rise to greater prominence. Compared with today's neural networks, cognitive systems are intended to interpret data at higher levels of abstraction.