From AI startups to the world’s largest cloud providers, some of the industry’s coolest AI chips this year (the Ambarella CV52S, Atlazo AZ-N1, AWS Trainium, and Google TPU v4, to name a few) are already making waves in the race toward faster and more efficient AI silicon.
One of the key characteristics driving new AI system-on-chip (SoC) investments is the capacity to run many calculations in parallel as a distributed operation, rather than relying on the limited parallelism of traditional CPUs. For AI/ML hardware, the design entails data-heavy blocks consisting of a control path, where a state machine produces outputs based on specific inputs, and a compute block of arithmetic logic that crunches the data (think adders, subtractors, multipliers, and dividers). These features dramatically accelerate the identical, predictable, and independent calculations that AI algorithms require.
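To make the control-path/compute-block split concrete, here is a minimal, illustrative Python model of that structure: a state machine sequencing a multiply-accumulate (MAC) datapath. The names (`State`, `mac_datapath`, `run`) and the structure are assumptions for exposition, not any vendor’s actual microarchitecture.

```python
# Illustrative sketch (assumed names, not a real SoC design): a control
# path (finite state machine) sequencing a compute block (MAC datapath).

from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    LOAD = auto()
    COMPUTE = auto()
    DONE = auto()

def mac_datapath(weights, activations):
    """Compute block: independent multiplies reduced by an addition."""
    partial_products = [w * a for w, a in zip(weights, activations)]
    return sum(partial_products)

def run(weights, activations):
    """Control path: a state machine that sequences the datapath."""
    state, result = State.IDLE, None
    while state is not State.DONE:
        if state is State.IDLE:
            state = State.LOAD
        elif state is State.LOAD:
            state = State.COMPUTE          # operands are now staged
        elif state is State.COMPUTE:
            result = mac_datapath(weights, activations)
            state = State.DONE
    return result

print(run([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

The point of the split is that the compute block’s multiplies are identical and independent, so in hardware they can all run in the same cycle while the state machine only orchestrates data movement.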
While the arithmetic compute block itself may not be especially challenging for most design teams, implementation complexity grows significantly as the number of arithmetic blocks and their bit widths increase, placing additional strain on verification teams.
Over the past few years, data-centric computing has shifted beyond the confines of PCs and servers. Companies like NVIDIA and Intel are creating a new category of smart network interfaces. NVIDIA launched its first Data Processing Unit (DPU), the BlueField-2X, anticipating the emerging need for considerable pre- and post-processing of data on the networking side of an accelerated server, while Intel announced a different strategy with its Infrastructure Processing Unit (IPU), based on a Xeon CPU and an FPGA, opening possibilities for accelerated performance beyond what today’s fastest servers deliver.
Consider the case of a simple 4-bit multiplier. To verify its complete functionality, test vectors need to be written for every possible input combination; each 4-bit operand spans only 2⁴ = 16 values, so the exhaustive set is tiny. The challenge? When it comes to verifying realistic scenarios in today’s AI chips, teams need to verify arithmetic blocks such as adders with 64-bit inputs, owing to the sheer amount of data processing. That means 2⁶⁴ states per operand need to be verified, a feat that would take years using classical approaches.
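The following short Python sketch makes the scaling problem tangible: it exhaustively checks a toy shift-and-add multiplier (a hypothetical stand-in for the design under test) against a golden reference. The 4-bit case finishes instantly; the same loop at 64 bits would need 2⁶⁴ values per operand, or 2¹²⁸ vectors in total, which is why exhaustive simulation does not scale.

```python
# Exhaustive ("brute-force") verification of a 4-bit multiplier model.
# multiplier_dut is an assumed stand-in for a design under test.

def multiplier_dut(a: int, b: int, width: int = 4) -> int:
    """Shift-and-add multiply, the classic hardware algorithm."""
    result = 0
    for i in range(width):
        if (b >> i) & 1:          # if bit i of b is set...
            result += a << i      # ...add a shifted partial product
    return result

def exhaustive_check(width: int = 4) -> None:
    """Compare the DUT against Python's own multiply for every vector."""
    for a in range(2 ** width):
        for b in range(2 ** width):
            assert multiplier_dut(a, b, width) == a * b, (a, b)
    print(f"{width}-bit multiplier: all {2 ** (2 * width)} vectors pass")

exhaustive_check(4)    # 256 vectors: done in microseconds
# exhaustive_check(64) # 2**128 vectors: infeasible in any lifetime
```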
And this is just one multiplier or divider in a design. Compounding the concern, as AI chip adoption expands and the volume of generated data continues to explode, the time-consuming challenges of hardware verification make modern, secure, and flexible verification solutions critical.