Issue 4, 2011
Reducing Power with Advanced Synthesis
Mary Ann White, Synopsys, outlines some of the new advanced power optimization techniques available in Synopsys Power Compiler™, which enables design teams to significantly reduce dynamic and leakage power in their designs.
Reducing power has become one of the most critical challenges for chip design teams across all markets and applications. Design teams can address dynamic power and leakage power by using advanced optimization techniques within their synthesis flows. This article outlines some of the recent additions to power optimization techniques that are available with Power Compiler.
Power Compiler Overview
Synopsys' Power Compiler adds complete power optimization capabilities to Design Compiler. Together, the two products concurrently optimize power, timing, area and test to achieve the lowest power consumption while meeting performance and area targets. A large variety of design and optimization techniques can be used to achieve significant power savings either with the device in operation (dynamic) or standby (leakage).
Power Compiler enables design teams to reduce dynamic power by using advanced clock gating and implementation of multiple voltage techniques. Examples of multi-voltage techniques to mitigate dynamic power consumption include setting supplies for less critical areas of the design at a lower voltage or shutting down certain areas of a design as needed. A good example of the latter technique is a multi-function smartphone in which the still or video camera is powered only when needed, which extends battery life. Power Compiler automates the implementation of advanced low-power techniques for multi-voltage designs that comply with the IEEE 1801 Unified Power Format (UPF) standard.
To help design teams reduce leakage power, Power Compiler offers concurrent synthesis optimization for leakage, timing corners, and multi-threshold voltage libraries with the ability to limit the use of low-threshold-voltage cells. Low-threshold-voltage cells can be good for timing, but are not necessarily helpful when it comes to reducing power because they are very leaky. Power Compiler can also take advantage of channel length cell variants that are available in some vendor libraries for optimum leakage results.
Advanced Clock Gating Techniques for Optimal Dynamic Power Savings
Power Compiler supports a wide range of advanced clock gating techniques, including:
- Power-driven clock gating
- Multi-stage balancing of the clock-gating structures
- Multi-stage clock gating
- Latency-driven clock gating
- Instance-specific clock gating
- XOR self-gating
This article will highlight usage of the last two clock-gating techniques from the list above, which have recently been added to Power Compiler. For more information on the other clock gating techniques, please refer to the Power Compiler User Guide.
Design teams can perform clock gating during synthesis on either an RTL or gate-level netlist, which ensures that they can incorporate power savings in their current designs, legacy blocks and soft macros.
Figure 1 shows two circuits – one with and one without clock gating. A synchronized enable condition allows the register bank to receive either new data from D_IN, or recycled data, depending on the condition of the enable line. But in each of these conditions, the clock continues to toggle the register every time, which dissipates dynamic power. When clock gating is added, if the enable condition is not on, then the register bank is not clocked, which saves power.
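The difference between the two circuits can be illustrated with a small behavioral model. The following is a minimal sketch in Python, not a description of how Power Compiler works internally: the `RegisterBank` class, the `simulate` helper and the stimulus values are all hypothetical, and counting "clock events" per flop is only a rough stand-in for dynamic clock power.

```python
from dataclasses import dataclass


@dataclass
class RegisterBank:
    """Behavioral model of a register bank (hypothetical, for illustration)."""
    width: int
    gated: bool
    q: int = 0
    clock_events: int = 0  # clock edges that reach the flops, summed over bits

    def tick(self, d_in: int, enable: bool) -> None:
        if self.gated:
            # Gated: the clock only reaches the bank when enable is asserted.
            if enable:
                self.clock_events += self.width
                self.q = d_in
        else:
            # Ungated: every flop is clocked on every cycle; when enable is
            # low the old value of Q is simply recirculated.
            self.clock_events += self.width
            self.q = d_in if enable else self.q


def simulate(gated, stimulus, width=8):
    """Apply a list of (d_in, enable) pairs, one pair per clock cycle."""
    bank = RegisterBank(width=width, gated=gated)
    for d_in, enable in stimulus:
        bank.tick(d_in, enable)
    return bank


# Four cycles; enable is asserted in only two of them.
stimulus = [(0x11, True), (0x22, False), (0x33, False), (0x44, True)]
plain = simulate(False, stimulus)
gated = simulate(True, stimulus)
print(plain.q == gated.q, plain.clock_events, gated.clock_events)
```

With this stimulus both banks end up holding the same value, but the gated bank sees half as many clock events because the enable is low for two of the four cycles.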
The clock-gating cell shown in Figure 1 is an integrated clock-gating (ICG) cell. Compared with using discrete AND gates, ICG cells save power, are more area-efficient and are less likely to cause clock-skew problems.
Figure 1: Clock-gating concepts
Dealing with Pre-Instantiated and Power Compiler Inserted Clock Gates
Most RTL designers will do their clock gating at the top level or across multiple modules straight into the RTL. However, the user often has to insert more clock gating at the local hierarchy or block level.
After doing the insertion, the user may want to merge both the instantiated and the tool-inserted clock gates to optimize the overall number of clock-gating cells in the design. Designers should insert the clock-gating cells at the local hierarchy first. Pre-existing clock gates in the design can be identified if they follow the clock-gating style used by Power Compiler. The pre-existing and inserted clock gates are then merged together, effectively producing a global clock-gating structure. This approach saves the power and area of any redundant clock-gating cells, and it helps to create a balanced structure for clock-tree synthesis. For example, the merge may result in a single-stage structure with only one clock gated delay from the top level to the register bank.
Inserting Clock Gates for Special Cases
To manage clock gating properly for the cases shown in Figures 2A and 2B, a “size only” attribute can be applied to the register bank (see Figure 2C) – synthesis will maintain the register banks and insert clock-gating cells.
Figure 2: How to insert clock gating for special cases
Other new features in Power Compiler give designers more control over instance-specific clock gating. Users can implement unique clock-gating cells for a specified instance and specify multiple clock-gating styles for each selected instance.
In the past, designers had to run multiple compiles because they could only implement one type of clock-gating cell at a time for the entire design. Now, designers can implement instance-specific clock gating, with a unique style for different register banks during a single compilation.
As well as enhancing design productivity, the new features can improve both power and area savings. Figure 3 illustrates average clock gating power savings of 40% and area savings of 10% across a range of 65-nanometer (nm) through 40nm customer designs.
Figure 3: Typical dynamic power savings across a range of 65nm through 40nm customer designs
XOR Self-Gating Provides Additional Dynamic Power Savings
XOR self-gating saves power by sharing clock gates between registers. This new Power Compiler feature compares the D and Q pins of a register. If they are the same, then the register will not be clocked. Figure 4 shows the insertion of a self-gating cell and the XOR gate that generates the enable signal.
Figure 4: XOR self-gating diagram
For register banks, Power Compiler uses register proximity to decide whether registers should share one clock-gating cell. The tool will share the ICG for the register bank using individual XOR gates feeding into an OR tree, resulting in a single enable condition into the ICG.
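The shared enable described above can be modeled in a few lines. This is a minimal sketch of the logic only (per-register XOR, then an OR tree into one enable); the function names are hypothetical and the model ignores timing, proximity and the ICG latch itself.

```python
def self_gating_enable(d_bits, q_bits):
    """Return True if at least one register in the bank would change value.

    Each register contributes XOR(D, Q); the XOR outputs are combined by an
    OR tree into the single enable that drives the shared clock gate.
    """
    if len(d_bits) != len(q_bits):
        raise ValueError("D and Q must have the same number of bits")
    xor_outputs = [d ^ q for d, q in zip(d_bits, q_bits)]
    return any(xor_outputs)


def clock_bank(d_bits, q_bits):
    """Return (next Q values, whether the bank was clocked this cycle)."""
    if self_gating_enable(d_bits, q_bits):
        return list(d_bits), True   # at least one bit differs: clock the bank
    return list(q_bits), False      # nothing changes: suppress the clock


print(clock_bank([1, 0, 1], [1, 0, 1]))
print(clock_bank([1, 1, 1], [1, 0, 1]))
```

In the first call D equals Q for every register, so the bank is not clocked; in the second call one bit differs, so the whole bank is clocked.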
ICG insertion considers the impact on the worst-case negative slack (WNS). If the logic is on a timing-critical path, XOR self-gating will not be inserted. Additionally, if the optimization will not save any appreciable dynamic power, it may not be worth the added cost in terms of area. Power Compiler can use actual switching activity, available from most simulation tools, to estimate the potential savings in dynamic power.
Leakage Optimization for Achieving Static Power Savings
Typically, foundries require chips to limit the use of Low Vt (LVt) cells in the design. In addition, library leakage data may not be reliable – especially at lower geometries. Multi-corner, multi-mode (MCMM) power optimization can be used in some cases, but designers may not have access to the appropriate process corner libraries.
What can design teams do to optimize for leakage in their designs in such cases? The solution is to limit the number of LVt cells used in the design. The %LVt flow available in Design Compiler and IC Compiler can help mitigate these challenges. %LVt limits can be specified as a hard or soft constraint. A hard constraint gives a higher priority to power savings over timing, whereas a soft constraint gives timing a higher priority, so the tool will try to meet the WNS before saving power on non-critical paths.
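As a rough illustration of what a %LVt limit measures, the sketch below computes the share of LVt cells in a design and compares it with a limit. It assumes the percentage is taken over cell count, which the article does not specify; the function names and the example cell mix are hypothetical.

```python
from collections import Counter


def lvt_percentage(cell_vt_classes):
    """Percentage of cells whose Vt class is "LVt" (by cell count, assumed)."""
    counts = Counter(cell_vt_classes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts["LVt"] / total


def within_limit(cell_vt_classes, limit_percent):
    """True if the design's LVt usage does not exceed the given limit."""
    return lvt_percentage(cell_vt_classes) <= limit_percent


# Hypothetical design: 3 LVt, 7 SVt and 10 HVt cells.
design = ["LVt"] * 3 + ["SVt"] * 7 + ["HVt"] * 10
print(lvt_percentage(design), within_limit(design, 20), within_limit(design, 10))
```

Under a hard constraint a result over the limit would not be acceptable, whereas under a soft constraint the limit may be exceeded where timing requires it.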
A complication of using %LVt flow optimization can be cell availability. For example, a design team may have access to three multi-Vt libraries: Low Vt (LVt), Standard Vt (SVt) and High Vt (HVt). The LVt library may have more cells than the SVt and HVt libraries. In this rare case, if the design team attempts to run the %LVt flow, Power Compiler will run some checks at start-up and alert the user if it doesn’t find a matching set of libraries across the whole multi-Vt set. For example, if the user has specified a group with a Vt cell variant that is not available, Power Compiler will inform the user that the Vt cell does not have its equivalent cells in all Vt groups. The leakage optimization will proceed to use the faster Vt cell if the user has used a soft %LVt constraint. In a case where the user has enabled a hard %LVt constraint, Power Compiler will filter out the LVt cell and not use it at all in the design.
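The kind of start-up check described above can be sketched as a comparison of cell sets across Vt groups. This is an illustrative model only, not the tool's actual check: the function name, the dictionary layout and the cell names are hypothetical.

```python
def cells_missing_vt_variants(libraries):
    """Find cells that do not have an equivalent in every Vt group.

    libraries: dict mapping a Vt group name to the set of base cell names
    available in that group (hypothetical representation).
    Returns a dict mapping each incomplete cell to the groups that lack it.
    """
    all_cells = set().union(*libraries.values()) if libraries else set()
    missing = {}
    for cell in sorted(all_cells):
        absent = sorted(vt for vt, cells in libraries.items() if cell not in cells)
        if absent:
            missing[cell] = absent
    return missing


# Hypothetical libraries: AOI22 exists only as an LVt cell.
libraries = {
    "LVt": {"NAND2", "INV", "AOI22"},
    "SVt": {"NAND2", "INV"},
    "HVt": {"NAND2", "INV"},
}
print(cells_missing_vt_variants(libraries))
```

Here the check reports that AOI22 has no SVt or HVt equivalent, which is the situation in which a hard %LVt constraint would exclude the cell.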
Figure 5 shows leakage power savings of between 20% and 70% for a range of customer designs using a multi-Vt flow, alongside the impact on WNS. The goal is to have zero WNS impact for the critical paths. In general, there is about 1-3% slack left because Power Compiler optimizes critical paths for timing concurrently with leakage.
Figure 5: Leakage optimization power savings and impact on WNS
Channel Length Variants Provide More Options
Channel length variants in libraries are becoming more popular. Library vendors are creating variants of each cell with different channel lengths. Generally, HVt libraries are better for power and worse for timing, while LVt libraries are much better for timing, but are very leaky. With the availability of libraries containing multiple channel lengths, it might be possible to achieve better timing and lower leakage with an SVt cell with a longer channel than with an HVt cell with standard channel length, for example. As shown in Figure 6, a longer length SVt cell would provide approximately 30% higher performance and 40% lower leakage than a standard length HVt cell at 40nm.
Figure 6: Channel length variation trade-offs at 40nm
Many library vendors, including Synopsys, provide libraries for use within optimization tools like Power Compiler and IC Compiler. Long-channel library variants can help to mitigate the leakage cost, either by replacing HVt cells with long-channel SVt cells or by allowing LVt cells to be used with a smaller leakage penalty.
Depending on the end design goal, a minimum number of channel length variants should be used during synthesis with Power Compiler. In IC Compiler, the rich set of all of the various channel length libraries should be used to achieve even more power savings for use with the final-stage leakage recovery flow.
MCMM Throughout for Optimum Results
Today’s complex designs need to operate under multiple conditions and in many modes such as scan, sleep and other functional modes. Optimizing in a serial process across each mode and several operating conditions (corners) can be time-consuming and requires multiple iterations to achieve optimal results. Enabled by Power Compiler, RTL designers can use Design Compiler Graphical to analyze and optimize designs across multiple modes and corners concurrently to provide additional leakage savings and a better starting point for place-and-route products, such as IC Compiler.
A corner is defined as a set of libraries characterized for process, voltage and temperature variations. Corners are not dependent on functional settings; they are meant to capture variations in the manufacturing process, along with expected variations in the voltage and temperature of the environment in which the design will operate. A mode is defined by a unique set of clocks, supply voltages, and timing constraints in similar operating conditions. It can also have annotation data, such as SDF or parasitic files.
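The relationship between modes and corners can be sketched as a simple data model in which each mode is paired with each corner. The class names, fields and example values below are hypothetical and chosen only to mirror the definitions above; they are not a tool format.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Corner:
    """Process, voltage and temperature point (independent of function)."""
    name: str
    process: str
    voltage: float
    temperature_c: int


@dataclass(frozen=True)
class Mode:
    """Functional setting: its own clocks and timing constraints."""
    name: str
    clocks: tuple
    constraints_file: str


def build_combinations(modes, corners):
    """Pair every mode with every corner for concurrent optimization."""
    return list(product(modes, corners))


# Hypothetical example values.
corners = [
    Corner("slow", "ss", 0.9, 125),
    Corner("fast", "ff", 1.1, -40),
]
modes = [
    Mode("functional", ("clk_core",), "func.sdc"),
    Mode("scan", ("clk_scan",), "scan.sdc"),
    Mode("sleep", ("clk_aon",), "sleep.sdc"),
]
combinations = build_combinations(modes, corners)
print(len(combinations))
```

Three modes and two corners give six mode and corner combinations, which is why optimizing them one at a time quickly becomes time-consuming.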
MCMM optimization is useful for designs that can operate in many modes, such as test mode, low-power active mode, stand-by mode and so on. Used along with specification of power intent in UPF, it serves as the key enabling technology for performing dynamic voltage and frequency scaling (DVFS) design implementation.
Given the huge size of designs these days, design teams need to perform MCMM on hierarchical designs. They may have already optimized some of their lower-level legacy blocks for MCMM. Power Compiler allows design teams to integrate their legacy blocks with the top-level, matching names, so that they can then run a top-level synthesis.
Power analysis is something that all design teams care about. At the end of the optimization, the team will produce a power report which analyzes the power in the design. Having realistic switching activity, using vectors or switching activity interface format (SAIF) files, aids this analysis. The design team can generate switching activity using the RTL design with a simulator like VCS, which produces a SAIF file. Power Compiler annotates the switching activity on the design and computes more accurate power analysis.
During synthesis, node names from the RTL design may change, so the switching activity in the RTL SAIF file must be annotated onto the correct nodes in the gate-level netlist. Power Compiler uses a SAIF map to automatically keep track of node names in the gate-level netlist with respect to the RTL SAIF, for use in other products such as PrimeTime.
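Conceptually, a name map of this kind translates activity recorded against RTL names into activity on gate-level names. The sketch below shows that idea with plain dictionaries; it does not reflect the actual SAIF or SAIF map file formats, and the signal names and toggle counts are hypothetical.

```python
def remap_activity(rtl_toggles, name_map):
    """Move toggle counts from RTL names to gate-level names.

    rtl_toggles: dict of RTL signal name -> toggle count
    name_map:    dict of RTL signal name -> gate-level node name
    Returns (gate-level toggle counts, sorted list of unmapped RTL names).
    """
    gate_toggles = {}
    unmapped = []
    for rtl_name, toggles in rtl_toggles.items():
        gate_name = name_map.get(rtl_name)
        if gate_name is None:
            unmapped.append(rtl_name)  # no known gate-level equivalent
        else:
            gate_toggles[gate_name] = toggles
    return gate_toggles, sorted(unmapped)


# Hypothetical names: synthesis renamed the state register bits.
rtl_toggles = {"top/state[0]": 120, "top/state[1]": 45, "top/tmp": 7}
name_map = {
    "top/state[0]": "top/state_reg_0_/Q",
    "top/state[1]": "top/state_reg_1_/Q",
}
gate_toggles, unmapped = remap_activity(rtl_toggles, name_map)
print(gate_toggles, unmapped)
```

Signals without a mapping, such as nodes that were optimized away, are reported separately rather than being annotated onto the wrong node.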
Power Reduction Results
This article has outlined some of the latest techniques available in advanced low power optimization flows using Synopsys' Power Compiler. By focusing on making incremental power reductions by applying multiple power reduction strategies at the gate- and RT-level, design teams can achieve significant overall power savings.
Figure 7 summarizes the results for some of the power reduction techniques discussed. Across a range of designs, advanced clock gating achieved an average of 40% dynamic power savings, XOR self-gating saved an additional 5%, using a multi-Vt flow saved between 20% and 70% on leakage, and MCMM a further 10%.
Figure 7: Various power savings achieved by Power Compiler
About the Author
Mary Ann White
Mary Ann White is product marketing director for Galaxy low power implementation products at Synopsys. White has more than 25 years of experience of working in the EDA and semiconductor industries. White has a BS EECS degree from UC Berkeley.