An end-to-end approach to energy efficiency for AI accelerators must start at the architectural and micro-architectural levels during the earliest stages of the design flow and conclude at signoff. That’s why AI chip designers rely on architectural exploration platforms to map and evaluate power, performance, and area (PPA) tradeoffs for specific training or inference applications while proactively identifying critical vectors for downstream analysis.
Because AI hardware typically consists of large arrays with thousands of tiles (processing elements), billion-plus-gate designs require multi-domain hardware and software power verification to minimize energy consumption and leakage. However, analyzing power-critical blocks and time windows requires advanced emulation systems that can run billions of cycles and rapidly deliver multiple accurate iterations. Only after completing this step can register-transfer level (RTL) power analysis and physical implementation tools effectively optimize dynamic (gate switching) and static (leakage) power dissipation.
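To make the dynamic/static split concrete, the sketch below applies the standard first-order CMOS relations: switching power scales as activity × capacitance × Vdd² × frequency, and leakage power as Vdd × leakage current. The per-tile numbers and the tile count are hypothetical values chosen only for illustration; they do not come from any real design or tool.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order switching power in watts: alpha * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


def static_power(vdd, leakage_a):
    """First-order leakage power in watts: Vdd * I_leak."""
    return vdd * leakage_a


# Hypothetical per-tile operating point (illustrative only).
VDD = 0.8            # supply voltage in volts
p_dyn = dynamic_power(activity=0.2, capacitance_f=1e-9, vdd=VDD, freq_hz=1e9)
p_leak = static_power(vdd=VDD, leakage_a=5e-3)

# An array of identical tiles multiplies both components.
NUM_TILES = 1000     # hypothetical tile count
total_w = NUM_TILES * (p_dyn + p_leak)

print(f"per-tile dynamic: {p_dyn:.3f} W, per-tile leakage: {p_leak:.3f} W")
print(f"array total: {total_w:.1f} W")
```

Because every tile is replicated, even a small per-tile saving is multiplied by the tile count, and the quadratic dependence on supply voltage is why dynamic power usually receives the most attention.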
To consistently deliver accurate results, RTL power analysis tools for AI chip design should include the following capabilities:
- Timing-driven fast synthesis: Internal power calculation errors are often caused by fanout-based fast synthesis tools that fail to size cells properly according to timing constraints. Like their downstream place-and-route counterparts, the fast synthesis engines embedded in RTL power analysis tools must be timing driven.
- Physically aware fast synthesis: RTL power analysis tools should be “physically aware” and capable of obtaining precise net capacitance values by executing a first-pass placement of the cells in the design, as well as global routing. Unlike a fanout-based approach, physically aware capacitance estimation results in a unique and accurate value for each net.
- Signoff-quality power computation engine: Traditional RTL power analysis tools that use word-level logic inferencing for fast synthesis can only apply heuristic, and therefore inaccurate, methods for glitch power computation. To accurately calculate glitch power (which can potentially consume up to 40% of a chip’s total power) and reduce it across highly replicated tiles, RTL power analysis tools must have a signoff-quality power analysis engine, a netlist-level design representation, and an integrated timing engine.
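The difference between fanout-based and physically aware capacitance estimation can be illustrated with a small sketch. It is not how any particular tool works: half-perimeter wirelength (HPWL) over placed pin locations is used here as a simple stand-in for first-pass placement plus global routing, and the `Net` class, both capacitance constants, and the pin coordinates are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical constants, chosen only for illustration.
CAP_PER_FANOUT_FF = 2.0  # fanout-based model: fF added per driven pin
CAP_PER_UM_FF = 0.2      # placement-based model: fF per micron of wire


@dataclass
class Net:
    name: str
    pins: list  # (x, y) pin locations in microns; first entry is the driver


def fanout_cap_ff(net):
    """Fanout-based estimate: depends only on the number of driven pins."""
    return CAP_PER_FANOUT_FF * (len(net.pins) - 1)


def placed_cap_ff(net):
    """Placement-based estimate using half-perimeter wirelength (HPWL)."""
    xs = [x for x, _ in net.pins]
    ys = [y for _, y in net.pins]
    hpwl_um = (max(xs) - min(xs)) + (max(ys) - min(ys))
    return CAP_PER_UM_FF * hpwl_um


# Two nets with the same fanout but very different physical extent.
short_net = Net("short", [(0, 0), (5, 0), (5, 5)])
long_net = Net("long", [(0, 0), (200, 0), (200, 150)])

for net in (short_net, long_net):
    print(net.name, fanout_cap_ff(net), placed_cap_ff(net))
```

Both nets drive two pins, so the fanout-based model assigns them identical capacitance, while the placement-based estimate separates the short local net from the long cross-tile one. That per-net distinction is what the physically aware approach described above provides.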
After completing RTL power analysis and reduction, physical implementation (synthesis and place-and-route) tools can be used to further optimize PPA. To ensure reliability, scalability, and a frictionless user experience, these implementation tools should include a single, integrated data model architecture, interleaved engines, and a unified shell. Just as importantly, implementation tools should be capable of accurately modeling advanced-node effects and glitch power to accelerate engineering change orders (ECOs) and final design closure.