Issue 3, 2011
Optimizing Processor Cores – What You Need to Know
Jonathan Young and Brian Machesney, Synopsys, highlight key tradeoffs and their impact when undertaking one of the major tasks in any SoC project – optimizing the processor core implementation.
Once you have selected the processor core for your next chip, you will face the challenge of implementing it in silicon. Whether you decide to optimize the processor implementation in-house, outsource the optimization to a third-party, or use a pre-hardened core, it is important to be sure that your processor performance, power and area (PPA) goals are realistic and achievable within your schedule and with the available resources.
Unfortunately, sometimes the PPA figures of processor cores quoted publicly can be confusing or lead you to impractical expectations. Whether the results live up to the published values depends on a wide range of factors. Comparing vendors' performance numbers is difficult when processors are not configured identically, implemented in the same process technology, or measured under the same operating and design conditions. To manage risk and protect your investment in high-value processor IP, it pays to understand what is behind the numbers.
Soft and Hard Cores
Most SoC design teams today will use either soft or hard cores for their processor IP. While cores implemented with a full- or semi-custom design flow may offer the highest performance, they also require the highest implementation effort for most projects (Table 1). Soft cores give chip developers the freedom to choose the configuration (e.g., cache size), design features (e.g., power management, test), process, standard cell libraries, and embedded memories that best suit their needs, but require the full implementation effort. Hard cores are pre-configured and pre-implemented versions of soft IP, which typically results in an optimized implementation but limits the developer's options for configuration, process technology, etc.
Table 1: Comparing soft, hard and full-custom cores
Whatever implementation option you select, the results you get will depend on a number of important factors, including the underlying standard cell and physical IP, the implementation tools and flow, the experience and expertise of the design team (not only at chip implementation, but also with the specific core and target technology and physical IP), and how much time and effort is spent on optimizing the design (Figure 1).
Figure 1: Achieving the highest-possible PPA from a processor core follows the law of diminishing returns
Implementation Factors that Impact Success
Success, both in selecting and integrating a processor core, depends on having a good understanding of the following issues:
- Configuration options
- Core feature set
- Cache memories and features
- Operating conditions
- Process technology
- Library features
- Design signoff criteria
- Ease of integration
Configuration Options
The way you configure a processor core will have a significant impact on its PPA figures. Generally speaking, adding options increases area (and possibly congestion) and creates longer routes that limit the clock frequency.
Features that vendors often make configurable include the number of CPUs, auxiliary logic such as floating-point units, instruction-set extensions and other special acceleration units, cache memory, interrupt and debug logic, test logic and power management features.
The effect on PPA of implementing all, some, or none of these features can be dramatic. In order to draw meaningful comparisons from published processor performance results, you need to know whether the quoted figures include the optional features that your application requires.
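To make the compounding effect concrete, here is a toy model with entirely hypothetical area adders (not vendor figures, and real feature costs vary widely by core and process) showing how enabling optional features multiplies the area of a base core:

```python
# Toy model: hypothetical relative area cost of common configurable
# features, illustrating how a "full" configuration can approach
# twice the area of a minimal one. All numbers are illustrative.

BASE_AREA_MM2 = 1.0  # hypothetical base core area

FEATURE_AREA_FACTOR = {
    "floating_point_unit": 0.20,
    "32KB_caches": 0.50,
    "debug_logic": 0.05,
    "test_logic": 0.04,
    "power_management": 0.03,
}

def configured_area(features):
    """Area of a core with the given optional features enabled."""
    return BASE_AREA_MM2 * (1 + sum(FEATURE_AREA_FACTOR[f] for f in features))

minimal = configured_area([])
full = configured_area(FEATURE_AREA_FACTOR)
print(f"minimal config: {minimal:.2f} mm^2, full config: {full:.2f} mm^2")
```

With these assumed factors, the fully featured configuration is 1.82x the minimal one before any congestion or routing effects are counted, which is why published figures are meaningless without the configuration behind them.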
Operating Conditions
Voltage and temperature are the two operating conditions of most interest for design teams optimizing processor cores, as they strongly affect clock frequency and power consumption. Because the relationships between speed and power versus voltage and temperature are complex for nanometer-scale transistors, it is essential to get a firm handle on all of the conditions – as well as the design signoff criteria used – for any quoted measurement.
"Overdrive" voltages beyond the traditional nominal +10% limit can help boost performance but increase power consumption. Some design teams implement dynamic voltage-frequency scaling to respond to increases in processing demand while mitigating the increased power consumption and decreased product life associated with a fixed higher voltage. To properly interpret a performance claim, it is important to know if it is the result of overdriving the supply voltage, and by how much.
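The asymmetry between the frequency gain and the power cost of overdrive can be sketched with first-order CMOS scaling rules. The numbers below are hypothetical (a simplified alpha-power model with alpha = 1, and made-up nominal and threshold voltages), not measured silicon data:

```python
# First-order scaling sketch: dynamic power ~ C * V^2 * f, and
# frequency roughly proportional to (V - Vt). Illustrative only.

V_NOM = 1.0   # hypothetical nominal supply (V)
V_T = 0.35    # hypothetical threshold voltage (V)

def rel_frequency(v, v_nom=V_NOM, v_t=V_T):
    """Frequency relative to nominal (simplified alpha-power model)."""
    return (v - v_t) / (v_nom - v_t)

def rel_dynamic_power(v, v_nom=V_NOM):
    """Dynamic power relative to nominal: scales as V^2 * f."""
    return (v / v_nom) ** 2 * rel_frequency(v)

for v in (1.0, 1.1):  # nominal supply and +10% overdrive
    print(f"V={v:.2f}V  freq x{rel_frequency(v):.2f}  "
          f"dyn power x{rel_dynamic_power(v):.2f}")
```

Under these assumptions, a 10% overdrive buys roughly 15% more frequency but costs roughly 40% more dynamic power, which is why a performance claim quoted at an undisclosed overdrive voltage is not comparable to one quoted at nominal.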
Process Technology
Foundries offer design teams numerous technology permutations so that they can get the best out of the latest silicon processes. For example:
- Gate-oxide thicknesses and junction doping profiles to control threshold voltage and support interfaces to legacy products
- High-k gate oxide and metal gates to reduce device leakage power without impairing drive strength
- Standard cell libraries with multiple threshold voltages and channel lengths to balance performance improvements, power savings and the effects of on-chip variation
- The number, thickness and pitch of metal wiring layers, which can be optimized to balance wire-load and crosstalk delays against the cost of the additional routing layers needed to recover area increases
When choosing an optimized CPU core or an optimization approach, familiarity with the underlying process technology is critically important because it:
- Affects wafer cost
- Affects clock frequency, power consumption and area
- May influence the availability of a standard-cell library that will be used to implement the rest of the SoC's function
Standard Cell Libraries
You should make a point of identifying the standard cell library used to implement a soft processor. There are five key library variables that influence PPA: track height, the availability of multi-Vt and multi-channel length cells, multi-voltage capabilities and power management kits, IP-specific cells, and operating range.
As long as they use the same underlying process technology, you can use different libraries to implement the processor and your application-specific logic. This allows high-performance standard cells to be used for high-performance blocks and high-density cells for other blocks.
Embedded Cache Memories
Cache memory is often in the critical timing path that limits the processor's maximum clock frequency. Adding test and repair features reduces manufacturing costs and improves quality but adds area and performance overhead. Be sure to quantify these effects.
Low-power modes for cache memories should be compatible with the processor core's power-saving strategy: clock gating and sleep modes can be used to control active power, while low-voltage retention modes reduce standby power.
Design Signoff Criteria
Because PPA values are strongly linked to process-voltage-temperature (PVT) corners, it is important to understand whether quoted PPA values are representative of your product's operating and process corners.
Foundries supply process information in extraction decks that EDA tools use to calculate the chip's internal timing. A valid comparison of advertised PPA metrics depends on using matching extraction decks and identical PVT corners.
Design teams make extensive use of static timing analysis (STA) to establish chip performance. It's vital that STA results correlate with real silicon performance, which you can ensure by accounting for clock jitter, establishing guard bands for the setup and hold times of clocked signals, and applying realistic on-chip variation levels defined by the target foundry.
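The cumulative effect of these signoff margins can be illustrated with a simple setup-slack calculation. All numbers here are hypothetical, chosen only to show how jitter, guard bands and an on-chip-variation (OCV) derate can turn a marginally passing path into a failing one:

```python
# Hypothetical signoff-margin sketch: how clock jitter, a setup guard
# band and an OCV derate erode the slack a raw STA run reports.

def setup_slack(clock_period_ns, data_path_ns, setup_time_ns,
                jitter_ns=0.0, guard_band_ns=0.0, ocv_derate=1.0):
    """Setup slack (ns) after applying signoff margins.

    ocv_derate > 1.0 pessimistically lengthens the data path to model
    on-chip variation, per the foundry's recommended derate.
    """
    effective_path = data_path_ns * ocv_derate
    return (clock_period_ns - jitter_ns
            - effective_path - (setup_time_ns + guard_band_ns))

# Raw STA: 1 GHz clock, 0.90 ns data path, 0.05 ns setup -> +0.05 ns slack
print(setup_slack(1.0, 0.90, 0.05))

# With realistic margins applied, the same path now fails timing
print(setup_slack(1.0, 0.90, 0.05,
                  jitter_ns=0.03, guard_band_ns=0.01, ocv_derate=1.05))
```

A quoted clock frequency is therefore only meaningful alongside the jitter, guard-band and derate assumptions under which it was signed off.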
Ease of Integration
Whether implementing a soft core in-house, outsourcing soft-core optimization, or purchasing a hard core, the results must be integrated into the full-chip development environment. The design team must perform chip-level design, functional and power simulation, functional and formal verification, physical implementation, timing, physical verification, test and signoff in the most efficient manner possible in order to meet time-to-market goals. Regardless of the source of your design deliverables, compare them to the "views" your design team needs to complete the project.
Key Questions to Help You Choose the Right Processor Implementation
Thoroughly evaluating a processor core implementation takes time, effort, and considerable expertise. To make an informed choice, look beyond the numbers on vendors' datasheets and get answers to key questions, such as:
- How many CPU cores were used, and how were they configured (e.g., floating-point logic, cache sizes)?
- Are the published results from silicon or simulation?
- What are the operating conditions for each performance and power data point (e.g., are they worst-case or typical results)?
- Was overdrive voltage used?
- Is the design manufacturable with high yield?
- Has power (and possibly reliability) been compromised to achieve performance (e.g. high levels of low-Vt cells used)?
Getting the complete picture on published performance metrics will help you maximize the return on your IP investments and increase your confidence that your product results will match design expectations.
About the Authors
Jonathan Young is a director of consulting with Synopsys Professional Services. With more than 20 years' experience in the semiconductor industry, 11 of them spent with Synopsys, he has been responsible for more than 100 SoCs reaching silicon. His current areas of focus include integration techniques for multiple IP-based designs and optimal design flows for CPU hardening. He holds a bachelor's degree in electrical and electronic engineering from the University of Reading, UK.
Brian Machesney has held a wide range of semiconductor industry positions over the past three decades in technology development, product design and marketing, from startups to Fortune 500 companies. He is presently a senior product marketing manager at Synopsys, focused on chip design, verification and implementation consulting services.