Issue 4, 2011
Achieving Faster Hierarchical Design Throughput
Steve Kister, a technical marketing manager at Synopsys, reviews the implications for design planners of ever-rising levels of complexity and integration, and explains how new On-Demand Loading technology within Synopsys IC Compiler helps engineering teams to produce an accurate floorplan in less time.
The continuing increase in gate count and complexity within today’s SoCs has been well documented. A process such as IBM’s Cu-45 provides 1.48M raw gates per square millimeter (Figure 1). At this density, even a relatively small chip of 15mm x 15mm can accommodate more than 300 million gates. At 70% utilization, the design will be well over 200 million gates.
Figure 1: ASIC gate density trend
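The arithmetic behind these capacity figures is easy to verify. A minimal sketch, using only the density, die size, and utilization values quoted above:

```python
# Gate-capacity estimate for a 15 mm x 15 mm die at Cu-45 density.
# Figures are taken from the article text; density is raw gates per mm^2.
raw_density = 1.48e6        # raw gates per square millimeter
die_area = 15 * 15          # die area in mm^2
utilization = 0.70          # fraction of raw capacity actually used

raw_gates = raw_density * die_area
usable_gates = raw_gates * utilization

print(f"raw capacity: {raw_gates / 1e6:.0f}M gates")    # 333M
print(f"at 70% util.: {usable_gates / 1e6:.0f}M gates") # 233M
```

This confirms the claims in the text: more than 300 million raw gates, and well over 200 million gates at 70% utilization.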
Most design teams compress their development schedules by starting physical design in parallel with logical design. They reuse design data from previous designs, employ production-proven intellectual property (IP) and apply hierarchical design methodologies.
Hierarchical design methodologies allow design teams to divide the chip into manageable pieces that can be implemented in parallel, saving time. Achieving the greatest throughput, however, means taking advantage of time-saving opportunities throughout the flow as the design progresses from planning through implementation. These opportunities are discussed later in this article.
Compressing the Physical Design Schedule
In today’s environment, physical designers start to work on a project at nearly the same time as logical designers. Generally, design teams schedule a number of early netlist hand-offs, or netlist drops, from the logical designers to the physical designers before the “final” netlist is available (Figure 2). This enables physical design teams to start exploring various implementation strategies. By the time the final netlist arrives, the physical designers have developed a detailed implementation strategy that enables them to minimize the time to tape-out. Minimizing CPU runtime while working with early netlist drops enables physical designers to explore and assess more floorplan solutions. This is critical to finding the best floorplan to ensure minimum time to tape-out and the highest quality of results when the final netlist arrives.
Another technique that teams use to compress the design schedule is the reuse of blocks from earlier designs, together with third-party IP. It is very rare for a new chip design of billions of transistors to be created completely from scratch. Generally, most of the transistors in a new design form memories or functions derived from similar functions implemented in earlier designs.
Figure 2: Physical implementation schedule
Flat Design Flows
For many years, SoC designs were taped out using flat planning and implementation flows. Flat flows are not efficient for extremely large, complex designs: completing trial placement and routing in the early phases of the design can take days of CPU runtime and considerable memory. For planning purposes, physical designers need to complete trial runs in less than a day; at most, the trial runs should complete as overnight batch jobs. This enables designers to analyze results and prepare new batch jobs during the day, maximizing their productivity while they explore potential implementation strategies.
Once the planning phase is complete, the design moves into a refinement phase, and then later, an ECO phase. Once routing is feasible, designers start running optimizations to shorten timing path delays as much as possible to achieve the highest operating frequency. In a flat flow, each optimization pass must process all of the design data, which results in a significant runtime and memory use overhead. When the design moves into the ECO phase, there is the risk that implementing an ECO in one functional part of the design will degrade the timing in another functional part. This could lead to a ping-pong effect where new problems are introduced each time another problem is solved.
Hierarchical Flows for Size
The process of breaking a design into physical blocks is called partitioning. Block size, in terms of instance count, is a common criterion used to partition a design. Understanding how large a block can be, while still meeting an overnight runtime criterion, enables block designers to run batch jobs overnight and spend the day assessing results.
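The overnight-runtime criterion can be turned into a rough block-size budget. The sketch below is purely illustrative (the throughput and instance counts are invented, and real runtimes do not scale perfectly linearly), but it shows the kind of back-of-the-envelope calculation a planner might make from a trial run:

```python
# Hypothetical block-sizing check: given a measured optimization throughput
# (instances processed per CPU-hour, from a trial run) and an overnight
# batch window, estimate the largest block that still finishes by morning.
def max_block_instances(instances_per_hour: float, window_hours: float) -> int:
    """Largest instance count that fits the overnight runtime budget."""
    return int(instances_per_hour * window_hours)

def partition_count(total_instances: int, block_limit: int) -> int:
    """Minimum number of blocks so that each stays under the limit."""
    return -(-total_instances // block_limit)   # ceiling division

limit = max_block_instances(instances_per_hour=250_000, window_hours=12)
blocks = partition_count(total_instances=60_000_000, block_limit=limit)
print(limit, blocks)   # 3,000,000 instances per block -> 20 blocks
```

A team would calibrate `instances_per_hour` from its own tool runs and compute budgets per technology node rather than reuse numbers like these.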
Once a design is partitioned into blocks, the physical designers responsible for the full chip create a top-level floorplan by placing and shaping the blocks and assigning pins to their boundaries. The block shape and pin placements represent the physical constraints that are passed to the block design teams. Time budgeting is a process that top-level physical designers use to divide top-level timing constraints to create timing constraints for the blocks.
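The time-budgeting step described above can be sketched in its simplest form: distribute a top-level path constraint across the path's segments in proportion to their estimated delays. The segment names and delay values below are invented for illustration; real budgeting tools use far more detailed models:

```python
# A simplistic time-budgeting sketch: split a top-level path constraint
# across its segments in proportion to estimated segment delays.
def budget_path(constraint_ns: float, est_delays: dict) -> dict:
    """Return a per-segment timing budget that sums to the constraint."""
    total = sum(est_delays.values())
    return {seg: constraint_ns * d / total for seg, d in est_delays.items()}

# Hypothetical 2 ns path crossing two blocks via a top-level route:
budgets = budget_path(2.0, {"blockA": 0.6, "top_route": 0.2, "blockB": 0.4})
# Each segment's budget scales with its estimated share of the path delay,
# so the per-block constraints always add back up to the top-level one.
```

The quality of these budgets depends entirely on the quality of the delay estimates, which is why the accuracy of block timing assumptions matters so much in the discussion that follows.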
To minimize CPU runtime and memory requirements, top-level physical designers often use a black-box planning approach. The gate-level netlist content of each block is discarded, and the empty blocks are sized to an estimated area target. Timing models, or assumptions about the timing path segments within the blocks, must then be developed. Only top-level logic, IO pad cells, macros, and the empty blocks are used to form the top-level floorplan.
Black-box planning is the only option when the content of the blocks is unknown. However, as previously discussed, only a small percentage of new design content is unknown and designed from scratch. For blocks with known content, conversion into black boxes trades accuracy for throughput: it minimizes the CPU runtime and memory required to process the design data during top-level planning, but discards information about the blocks.
For blocks where the content is known, the trade-off introduces the risk of producing unrealistic physical and timing constraints for the blocks. Will the block shapes, based on estimated area, accommodate the various macros within the blocks? It is not uncommon for macros to have extreme aspect ratios. For example, some memories may be tall and skinny, imposing a minimum height requirement for a block shape (Figure 3). How accurate will budgeted block-timing constraints be based on the estimated timing characteristics of the blocks? Block shape errors and poor timing constraints often require multiple iterations to resolve. For blocks where the content is known, is this a good trade-off of accuracy versus throughput time?
Figure 3: Macros too tall for block size
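The macro-fit problem in Figure 3 comes down to two simple constraints: the tallest macro sets a floor on block height, and total macro area plus a placement margin sets a floor on block area. The following sketch, with invented dimensions and an assumed 25% margin, illustrates the check a planner would want the tool to perform:

```python
# Illustrative check that an estimated block shape can contain its macros.
# All dimensions and the margin are invented for this example.
def block_shape_ok(width, height, macros, margin=0.25):
    """macros: list of (w, h) hard-macro dimensions in the same units."""
    min_height = max(h for _, h in macros)
    macro_area = sum(w * h for w, h in macros)
    return height >= min_height and width * height >= macro_area * (1 + margin)

# A 400x300 block cannot hold a 100x350 "tall and skinny" memory:
print(block_shape_ok(400, 300, [(100, 350), (200, 150)]))  # False
print(block_shape_ok(400, 400, [(100, 350), (200, 150)]))  # True
```

Note that an area-only estimate would pass the first case, which is exactly the failure mode Figure 3 depicts: sufficient area, insufficient height.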
On-Demand Loading for Planning Speed and Accuracy
When a partitioned design has many blocks with known content, designers responsible for the top-level floorplan can use this information to obtain more accurate results. For example, they could place the top-level and the gate-level content of the blocks as if the design were flat and use the resulting placement to help determine the locations and shapes of the blocks. Seeing their content “as if flat” ensures that block shapes will correctly contain their corresponding macros. While this approach has an accuracy advantage, the designer loses the significant CPU runtime benefit of a black-box approach.
There is an opportunity to trade off some accuracy for speed, enabling a better starting point for top-level floorplanning. Synopsys’ on-demand loading technology is designed to help design teams take advantage of this opportunity.
Exploration placement of the full design provides a better starting point than pure black-box placement and shaping. To minimize runtime and memory usage, the placement algorithms are tuned to focus on producing results that drive shaping and enable accurate top-level timing assessment. Block shapes contain their macros, and the algorithm produces a relatively accurate placement of interface logic at block boundaries to enable accurate assessment of top-level timing. This enables design teams to assess more potential top-level floorplan solutions in a reasonably short amount of time, while mitigating the risk of extra iterations to reshape blocks just to fit their macro content. The improved accuracy of top-level timing in turn enables a more accurate time-budgeting process to produce block-level timing constraints.
A rough placement requires a global router that is able to reveal top-level routing congestion without the requirement of a full, detailed, legal placement. During design planning, the global router must produce a quick, accurate report of routing congestion to enable designers to explore and refine the floorplan solution. Synopsys has enhanced the global router with an exploration mode to meet this need. By assuming it can make connections regardless of the legality of placement, it runs fast and quickly exposes areas where routing congestion could be a problem. This information is key to adjusting channel sizes during the exploration of top-level floorplanning solutions.
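The idea of estimating congestion without a legal placement can be illustrated with a toy model: overlay a coarse grid, charge each net's bounding box with routing demand, and flag tiles where demand exceeds track capacity. This is not how Synopsys' exploration-mode router works internally; it is only a sketch of the demand-versus-capacity concept, with invented coordinates and capacities:

```python
# A toy congestion estimate in the spirit of an exploration-mode global
# router: accumulate bounding-box demand per grid tile, then report the
# tiles whose demand exceeds the available track capacity.
from collections import defaultdict

def congestion_map(nets, tile, capacity):
    """nets: list of net bounding boxes (x0, y0, x1, y1) in layout units."""
    demand = defaultdict(int)
    for x0, y0, x1, y1 in nets:
        for gx in range(x0 // tile, x1 // tile + 1):
            for gy in range(y0 // tile, y1 // tile + 1):
                demand[(gx, gy)] += 1          # one track of demand per net
    # Tiles where demand exceeds capacity are likely congestion hot spots.
    return {t: d / capacity for t, d in demand.items() if d > capacity}

hot = congestion_map([(0, 0, 30, 10), (5, 5, 25, 15), (0, 0, 20, 20)],
                     tile=10, capacity=2)
# Overloaded tiles cluster where all three bounding boxes overlap.
```

In a real flow, such a hot-spot map is what tells the designer which channels to widen before committing to a floorplan.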
When it comes to routing, it is also important to consider the power ring and mesh routing. An efficient flow enables designers to write mesh and ring requirements in terms of key design objects – such as blocks, groups of macros, and voltage areas. Synopsys has recently introduced template-based Power Network Synthesis. Construction rules such as pitch, layer widths, and spacing are stored in templates, which are associated with design objects. In this way, designers are able to quickly update power routing as needed when changes are made to shapes, sizes, or locations of key design objects.
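The template concept can be pictured as a simple data model: construction rules live in a named template, and design objects reference the template rather than carrying their own copies of the rules. The field names and values below are illustrative only and are not IC Compiler syntax:

```python
# A sketch of the template idea behind template-based power network
# synthesis: rule parameters live in a named template, and design objects
# reference the template by name, so changing an object's shape only
# requires re-running synthesis, not re-entering the rules.
from dataclasses import dataclass

@dataclass
class MeshTemplate:
    layer: str        # routing layer for the straps (hypothetical name)
    width_um: float   # strap width
    pitch_um: float   # strap-to-strap pitch
    spacing_um: float # minimum spacing to other shapes

templates = {"core_mesh": MeshTemplate("M7", 2.0, 30.0, 1.0)}
assignments = {"blockA": "core_mesh", "blockB": "core_mesh"}

def straps_for(obj_width_um, template):
    """Number of straps the template implies across an object of this width."""
    return int(obj_width_um // template.pitch_um) + 1

# Resizing blockA changes the strap count automatically on re-synthesis:
count = straps_for(300, templates[assignments["blockA"]])  # 11 straps
```

Associating rules with objects by name is what lets the power routing track floorplan edits without manual rework, which is the benefit the text describes.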
Given a top-level floorplan, design teams then start working on block floorplanning and top-level floorplan refinement in parallel. For the top-level team, refining a black-box floorplan may be fast, but it is prone to creating situations that may not work for the block designers. Top-level designers need visibility into the interface logic of blocks to have better accuracy in top-level timing analysis (Figure 4).
Figure 4: Black boxes limit timing optimization opportunity
Visibility of the placement of macros within blocks enables better decision-making regarding changes to the shapes of blocks from the top-level (Figure 5). Top-level designers can see how changes affect block designers. On-demand loading creates these abstracts for the top-level designer to minimize CPU runtime and memory requirements for top-level floorplan assessment and refinement without sacrificing accuracy.
Figure 5: Abstracts that show hard-macro placements in blocks
When the design moves from a planning state to an ECO state, intelligent abstracts are replaced with actual block data. Tools should understand that the top-level only needs to see interface logic – not all logic within the completed blocks. Synopsys has developed transparent interface optimization technology to perform this function. Top-level timing can be closed more efficiently as IC Compiler optimizes all top-level and block segments of timing paths that go through hierarchical block pins.
The sheer scale of a modern SoC requires design teams to move away from traditional flat flows toward a more hierarchical approach to their designs. This presents challenges in terms of coordinating initial architectural exploration, top-level design, physical design and logical design. On-demand loading enables a more accurate hierarchical flow that is tuned for fast exploration of initial floorplan solutions while taking block content into account. Top-level floorplan refinement improves because intelligent abstracts let designers see complete timing paths from the top level as well as the macro content of the blocks. Final timing closure is more efficient with tools that make the interface logic of blocks transparent and can optimize the top-level and block-level logic of interface paths at the same time.
About the Author
Steve Kister is a technical marketing manager at Synopsys where he has worked for 15 years, focusing on design planning. Kister joined Synopsys in 1995 and has been in the electronics industry for over 20 years. He received his bachelor’s degree in electrical engineering technology from DeVry Institute of Technology (Phoenix) in 1979. His experience includes test engineering, physical design, library development, and applications engineering.