Bradley Geden, Sr. Product Marketing Manager
Bradley Geden, Sr. Product Marketing Manager
Artificial intelligence (AI) is poised to give semiconductors an enormous boost. That’s partly because more chips will be sold. But hardware will also get a bigger share of the dollars in the hardware/software stack, because, without the right hardware, software execution will be sluggish. McKinsey sees AI silicon sales quadrupling by 2025, landing somewhere in the range of $18-21 billion per year. A bigger share of that hardware will be in the cloud for training and cloud inference, but hardware for edge inference will see faster growth. So, the race is on for creating the most effective training and inference hardware engines.
But AI is posing huge challenges for silicon designers in the balancing of performance, power, and area (PPA). Aggressive optimizations implemented throughout the RTL-to-GDSII flow help to improve overall PPA results while allowing faster design closure. Those optimizations, however, are of little value if they cannot be verified through formal equivalence checking before being fully committed to ensuring that overall functionality is correct.
AI designs place extreme demands upon silicon and designers. The focus of functionality is on general matrix multiplication (GEMMs) and convolutions. These require intensive multiply-accumulate (MAC) operations, with lots of moving data that drives up power. A careful balance must be struck between computation and memory, with high memory capacity and bandwidth required to avoid limiting performance due to data movement bottlenecks.
A full AI solution involves a complete software stack atop the hardware, along with design tools and an AI Software Development Kit (SDK). But it’s the hardware that’s critical for scaling. Google claims that, without its Tensor Processing Unit (TPU), the growth in Google Assistant would have required that the data center size be doubled. According to McKinsey, the majority of AI hardware will be an SoC or, like the TPU, an ASIC.
PPA refers to the balance and tradeoffs between speed, power, and silicon area (which means cost). For AI, each one of these can be important:
Keeping the balance between these three attributes is a really tough job. It may mean trying multiple architectures, and, for each of those, it places a huge burden on the EDA tools.
Figure 1: Advanced optimizations in synthesis utilized in AI designs.
Advanced optimizations make extensive changes to the design structure. Ideally, all of the changes are made perfectly, and the original functionality remains. But the cost of a mask re-spin and the opportunity cost of being late to market mean that a designer will want to start running equivalency checking as early as possible in the design process through the synthesis iterations to ensure that as various optimizations are experimented with, that ultimately, they end up with a design that has the highest Quality of Results (QoR)/PPA and can be verified.
The only way to be confident – and to give all stakeholders the confidence – that overall functionality remains unperturbed through the design flow and optimization is through formal logic equivalence checking. But some of the more sophisticated optimizations may obscure internal points that must be proven equivalent. This can cause a proof that takes a very long time to complete – or might even not complete at all, requiring design engineers to switch off optimizations and sacrifice their PPA goals.
The speed with which the proof can be obtained is critical. Since multiple synthesis runs may be needed, the formal proof will run many times. It can’t become a significant rate-determining step, or else schedule pressures will tempt design teams to omit the proof, putting the design at risk. In order to make the proofs easier and faster, the equivalence-checking engine must understand the optimizations so that it can then prove functional equivalence. This “knowledge” gives the checker hints as to the changes that have happened; the duty then falls on the checker to prove that the changes made to the design at each step of the optimization process have not resulted in a change in functionality.
As tape-out approaches, last-minute engineering change orders (ECOs) are inevitable. Larger more complex ECOs which cannot be implemented manually do need to go through synthesis as a first step in the ECO process. There is no point in pushing the PPA limits when optimizations need to be switched off during ECO synthesis for the ECO tool to generate a patch. An ECO solution must maintain the optimizations made during original synthesis but avoid a complete run through the entire toolchain. It is important that the equivalence checker be able to guide the ECO process, ensuring that, even after localized fixes have been made, the optimizations and all other functionality, unrelated to the ECO, remain intact.
Figure 2: ECO flow needs to use the same advanced optimizations as original synthesis