Understanding Your Power Profile from RTL to Gate-level Implementation

Introduction

Switching activity information from RTL simulations can be used to optimize the design for dynamic power during synthesis. However, switching activity in RTL and gate-level simulations can show wide power profile variations in the design. This article describes how to optimize for dynamic power with switching activity information and how to identify and reduce differences in the RTL and gate-level-based switching activity power profiles after synthesis.

Switching Activity for Dynamic Power Optimization

High switching activity in a design increases overall dynamic power consumption, so it is necessary to apply design techniques and best practices that reduce switching activity. For the optimization to be accurate, the switching activity must come from the most realistic power mode of the design.
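As a reminder of why activity matters, the usual first-order textbook model of switching power can be written as below. This is a general approximation, not a formula taken from the tools discussed here, and the symbols are introduced only for illustration: \alpha is the probability of a power-consuming transition per clock cycle, C_L is the switched capacitance, V_{DD} is the supply voltage, and f_{clk} is the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f_{\text{clk}}
```

With capacitance, voltage, and frequency fixed, reducing \alpha (the switching activity) is the remaining lever, which is what the techniques in this section target.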

The DesignWare® minPower Components offer unique, power-optimized datapath architectures that enable Synopsys DC Ultra™ to automatically generate circuits that suppress switching activity and glitches, reducing both dynamic and leakage power. minPower leverages the information in a SAIF (Switching Activity Interchange Format) file to perform various dynamic power optimizations while generating the netlist, as shown in Figure 1.

Consider an addition operation that can be implemented with full adders (FA) in 3 or 4 stages. The 3-stage adder structure may be good for timing, but not for power if one of the primary inputs to the adder structure is toggling a great deal. The toggles are propagated downstream, increasing the overall dynamic power. If minPower finds that adders are in timing paths that have enough positive slack to implement the adder in 4 stages, then the alternative structure with the high activity input routed to the very last stage is considered, minimizing unnecessary toggles and reducing dynamic power. In addition to optimizing the datapath architectures to reduce switching activity, minPower can automatically infer datapath gating enabling logic and its toggle rates to determine the cases where datapath gating saves dynamic power and implement the datapath gating logic only in optimal cases.

Figure 1: Example of dynamic power optimization with minPower
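The effect of routing a high-activity input to the last stage can be sketched with a small word-level model. This is only an illustration under simplifying assumptions: it sums four operands with two-input adders instead of modeling individual full-adder stages, it ignores glitches and delays, and every name in it is hypothetical. Operands a, b, and c are held quiet while d changes every cycle.

```python
import random

def bit_toggles(values):
    """Count bit flips between consecutive values on a bus."""
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(values, values[1:]))

def internal_toggles_balanced(a, b, c, d_stream):
    """(a + b) + (c + d): the busy input d enters in the first stage."""
    s1 = [a + b for _ in d_stream]   # internal net, quiet
    s2 = [c + d for d in d_stream]   # internal net, toggles with d
    return bit_toggles(s1) + bit_toggles(s2)

def internal_toggles_late(a, b, c, d_stream):
    """((a + b) + c) + d: the busy input d enters in the last stage."""
    s1 = [a + b for _ in d_stream]       # internal net, quiet
    s2 = [a + b + c for _ in d_stream]   # internal net, quiet
    return bit_toggles(s1) + bit_toggles(s2)

rng = random.Random(0)
d_stream = [rng.getrandbits(16) for _ in range(100)]  # high-activity operand
balanced = internal_toggles_balanced(5, 9, 12, d_stream)
late = internal_toggles_late(5, 9, 12, d_stream)
print("internal toggles, balanced tree:", balanced)
print("internal toggles, busy input last:", late)
```

Both structures compute the same sum, and the final adder output toggles equally in either case. The difference is in the internal nets: the deeper structure keeps them quiet, at the cost of one more stage of delay, which is why the trade is only made when there is enough positive slack.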

Another technique for dynamic power optimization is XOR self-gated clock gating. Synopsys’ Power Compiler can insert XOR-style gating logic that gates the clock of a register when the activity seen at the register output is low. This optimization can be turned on using the -self_gating option with compile_ultra.

Figure 2: Example of dynamic power optimization with Power Compiler XOR self-gated clock gating
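The idea behind XOR self-gating can be sketched with a cycle-level behavioral model. This is a minimal illustration, not a description of the logic Power Compiler actually inserts: the function name and data are hypothetical, and the model covers a single one-bit register. The XOR of the data input and the current output acts as the clock enable, so the register is clocked only when its value would change.

```python
def run_register(d_values, self_gated, q_init=0):
    """Simulate a 1-bit register; return (Q trace, clock pulses delivered)."""
    q = q_init
    q_trace = []
    clock_pulses = 0
    for d in d_values:
        # With self-gating, the XOR of D and Q is the clock enable.
        enable = (d ^ q) if self_gated else 1
        if enable:
            clock_pulses += 1
            q = d
        q_trace.append(q)
    return q_trace, clock_pulses

d_values = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]  # low-activity data input
plain_q, plain_pulses = run_register(d_values, self_gated=False)
gated_q, gated_pulses = run_register(d_values, self_gated=True)
print(plain_pulses, gated_pulses)
```

The register output is identical in both runs, but the self-gated register receives far fewer clock pulses. When the data toggles often, the saving shrinks and the added XOR logic may cost more than it saves, which is why the decision depends on the observed activity.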

Power Compiler also inserts clock gates in the netlist when synchronous load-enable conditions are found in the RTL and the -gate_clock option is used with compile_ultra. Unlike XOR self-gated clock gating, the insertion of these clock gates does not depend on the activity of the gating-enable signal. Even so, clock gating reduces dynamic power considerably by shutting off the clocks driving the downstream logic. Clock gating also saves additional power and area by reducing the need for mux logic at the register inputs, as illustrated in Figure 3.

Figure 3: Example of dynamic power optimization with Power Compiler clock gating
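The equivalence between a load-enable register built with a recirculating mux and the same register with a gated clock can be checked with a short behavioral sketch. The function names and data below are hypothetical, and the model is cycle-based with no timing.

```python
def mux_register(data, enable, q_init=0):
    """Load-enable register with a feedback mux; clocked every cycle."""
    q, pulses, trace = q_init, 0, []
    for d, en in zip(data, enable):
        q = d if en else q   # mux selects new data or the held value
        pulses += 1          # clock always reaches the register
        trace.append(q)
    return trace, pulses

def gated_register(data, enable, q_init=0):
    """Same register with a clock gate; clocked only when enable is high."""
    q, pulses, trace = q_init, 0, []
    for d, en in zip(data, enable):
        if en:               # clock gate passes the pulse, no mux needed
            q = d
            pulses += 1
        trace.append(q)
    return trace, pulses

data = [3, 7, 7, 2, 9, 9, 4, 1]
enable = [1, 0, 0, 1, 0, 0, 0, 1]
mux_trace, mux_pulses = mux_register(data, enable)
gated_trace, gated_pulses = gated_register(data, enable)
print(mux_pulses, gated_pulses)
```

Both versions hold the same values, while the gated version removes the mux and delivers a clock pulse only in the cycles where the enable is asserted.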

It is important to consider these dynamic power optimization techniques during synthesis. The key is to generate a good SAIF file for synthesis since most dynamic optimizations depend on the switching activity.

Generating a SAIF File for Synthesis

A SAIF file can be generated from RTL simulations (e.g., using VCS) in one of two ways:

  1. Write out a SAIF file directly from the RTL simulation
  2. Convert a VCD file from the simulation to SAIF using the vcd2saif utility

The first method is preferred because the SAIF file created from vcd2saif conversion will not contain state-dependent, path-dependent (SDPD) information, leading to inaccuracies in power calculations.

Take care to pick RTL simulations that represent the power mode being optimized. Today, a device such as a smartphone has many modes of operation (e.g., standby, phone, media playback, gaming, and internet browsing), and the switching activity of the different subsystems in the chip may differ greatly from one mode to another. Designers need to understand which RTL simulation most accurately mimics the switching activity of the dominant power mode that is being optimized. For example, to optimize a datapath block for active mode, the simulation and the resulting switching activity should reflect a block that is active most of the time. If there is more than one SAIF file, you can use the merge_saif command to merge the SAIF files and feed the merged SAIF file to DC Ultra.
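The effect of merging activity from several modes can be pictured as a weighted average of per-net toggle rates. The sketch below is only arithmetic that illustrates the idea under that assumption. It does not read SAIF files and is not a description of how merge_saif is implemented; the net names, rates, and weights are made up.

```python
def merge_toggle_rates(mode_rates, mode_weights):
    """Weighted average of per-net toggle rates across operating modes."""
    total_weight = sum(mode_weights.values())
    nets = set()
    for rates in mode_rates.values():
        nets.update(rates)
    merged = {}
    for net in nets:
        # A net missing from a mode is treated as idle (rate 0.0) in that mode.
        weighted = sum(mode_weights[mode] * rates.get(net, 0.0)
                       for mode, rates in mode_rates.items())
        merged[net] = weighted / total_weight
    return merged

# Hypothetical toggle rates (toggles per clock cycle) for two modes.
mode_rates = {
    "standby":  {"alu/out": 0.02, "dec/out": 0.01},
    "playback": {"alu/out": 0.10, "dec/out": 0.40},
}
mode_weights = {"standby": 70, "playback": 30}  # share of time in each mode

merged = merge_toggle_rates(mode_rates, mode_weights)
for net in sorted(merged):
    print(net, round(merged[net], 3))
```

The weights express how dominant each mode is. A mode given a small weight has little influence on the merged activity, so the choice of weights should follow the power mode you actually want to optimize.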

Differences in Power Profile after Synthesis

The power profile and switching activity after synthesis will differ considerably from those of the RTL, especially because of datapath gating and clock gating, which significantly reduce the dynamic power of the design. Retiming during synthesis can also change the power profile, since it moves pipeline registers into the datapath, which can increase the number of sequential cells in the design.

Other factors contribute to the power profile difference as well. The RTL SAIF provided to DC Ultra only has the input ports and the registers annotated. The intermediate logic switching activity is nonexistent as the gates are not yet present. The switching activities of the downstream registers are based purely on the input vectors and the logic function preceding the registers.

For instance, consider the scenario in Figure 4 where the multiplier and adder are surrounded by 3 registers, RegA, RegB, and RegC. In the RTL SAIF, apart from the block’s I/O activities, only the activities at RegA, RegB, and RegC are annotated. The activity of the net connected to the data input of RegB is purely determined by the activity of the net connected to the output pin of RegA and the multiplier logic function. It does not depend on the type of multiplier implemented since there is no gate level information yet. Similarly, the activity of the net connected to the data input of RegC is determined purely by the activity of the net connected to the output pin of RegB and the adder logic function. It does not depend on the type of adder implemented.

Figure 4: Switching activity propagation
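The point that register-input activity in a zero-delay view depends only on the logic function, and not on its implementation, can be illustrated with two functionally identical multipliers. This is a sketch with hypothetical names and random operands; it assumes two 8-bit operands arriving from RegA and ignores glitches, which are exactly what a real implementation adds.

```python
import random

def multiply_builtin(a, b):
    """Reference multiplier."""
    return a * b

def multiply_shift_add(a, b):
    """A different architecture for the same function: shift-and-add."""
    result, shift = 0, 0
    while b:
        if b & 1:
            result += a << shift
        b >>= 1
        shift += 1
    return result

def bit_toggles(values):
    """Count bit flips between consecutive values on a bus."""
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(values, values[1:]))

rng = random.Random(1)
operands = [(rng.getrandbits(8), rng.getrandbits(8)) for _ in range(200)]

# Settled values seen at the data input of RegB, one per clock cycle.
regb_d_builtin = [multiply_builtin(a, b) for a, b in operands]
regb_d_shift_add = [multiply_shift_add(a, b) for a, b in operands]

print(bit_toggles(regb_d_builtin), bit_toggles(regb_d_shift_add))
```

Both architectures produce the same settled values each cycle, so the toggle counts at the register input match. Differences between architectures appear only in the internal nets and in glitching, neither of which exists in the RTL SAIF.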

When an RTL SAIF is provided, minPower optimizes the multiplier and adder architectures so that the resulting netlist has minimal transitions within the multiplier and adder structures and ultimately reduces the toggles seen at the inputs of RegB and RegC when glitching is taken into account.

The SAIF file contains information about the static probability and toggle rates of the nets. It does not contain any information about the simulation vectors used to generate the activities. To compute power, Power Compiler executes a zero-delay simulation using a set of simulation vectors that it generates. These vectors are generated so that the average toggle count and static probability at the register inputs and outputs match the values in the RTL SAIF, but there is no guarantee that they are the same as the vectors used in the RTL simulation that produced the SAIF. For instance, a state machine may go through one set of states in the RTL simulation and a different set of states in the zero-delay simulation in Power Compiler, even if the static probability and toggle rates are the same. As another example, data signals with sign extension in the upper-order bits (making those bits all 0 or all 1 in RTL simulation) may show different switching activity under the Power Compiler simulation vectors. In other words, information about the correlation between signals is lost in the SAIF file.
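The loss of correlation can be made concrete with a small sketch. Per-signal static probability and toggle count are computed for short hand-written waveforms (all names and data here are hypothetical). Two signal pairs have identical per-signal statistics, yet the logic they drive behaves very differently.

```python
def signal_stats(wave):
    """Return (static probability of 1, toggle count) for a 1-bit waveform."""
    static_prob = sum(wave) / len(wave)
    toggles = sum(prev != cur for prev, cur in zip(wave, wave[1:]))
    return static_prob, toggles

a            = [0, 0, 1, 1, 0, 0, 1, 1]
b_correlated = [0, 0, 1, 1, 0, 0, 1, 1]  # always equal to a, like sign-extension bits
b_opposite   = [1, 1, 0, 0, 1, 1, 0, 0]  # same statistics, opposite phase

# Output of an AND gate driven by a and each version of b.
and_correlated = [x & y for x, y in zip(a, b_correlated)]
and_opposite = [x & y for x, y in zip(a, b_opposite)]

print(signal_stats(a), signal_stats(b_correlated), signal_stats(b_opposite))
print(signal_stats(and_correlated), signal_stats(and_opposite))
```

All three input waveforms have the same static probability and toggle count, so they are indistinguishable once reduced to those per-signal statistics. The AND output toggles in one case and stays at 0 in the other, which is the kind of difference that per-signal statistics cannot preserve.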

To summarize, these optimizations, including retiming, clock gating, datapath gating, and architecture selection, all contribute to a difference in the power profile after synthesis.

Minimizing Power Profile Changes

While the differences in the RTL and gate-level power profile are unavoidable, designers can minimize the differences by referring to the following best practices: 

Best practices and the reasoning behind each:

  1. Ensure the RTL simulation exercises the most dominant power mode for which you are optimizing. Reasoning: minPower will optimize this mode.
  2. Merge multiple SAIF files. Reasoning: helps optimize for multiple modes (if needed).
  3. Ensure RTL simulations are run at the same frequency as synthesis. Reasoning: the frequency of the RTL simulation may affect toggle rates, depending on how the testbench is written.
  4. Ensure that RTL simulations, synthesis, and gate-level simulations use the same test vectors. Reasoning: a difference in the test vectors will change the switching activity and result in a difference in the power calculations.
  5. Ensure that all I/Os and registers are annotated in the RTL SAIF. Reasoning: the default annotation from DC Ultra (e.g., 10% at I/Os) may be inaccurate for your design.

Conclusion

Switching activity information from RTL simulations can be used to optimize the design for dynamic power. However, the power profile seen in RTL simulation will differ from the power profile after synthesis. While this difference cannot be completely eliminated, it can be reduced by following the techniques and best practices described above. Designers must check the consistency of the inputs and behaviors that contribute to large changes in the power profile. A full gate-level simulation with the right test vectors is needed to accurately model the power profile of the design.