ARC Zone: Addressing Embedded Challenges with Superscalar Mixed-signal Processors

By: Michael Thompson, Product Marketing Manager for Processors, Synopsys

Introduction

Our relationship with electronics is becoming more seamless, enabling us to be more efficient and productive. Advances in processes, processors and embedded technology are bringing us closer to the digital realm. This progress is not without issue for embedded designers, who face a number of challenges when designing new products: clock speeds and memory access times that aren’t increasing even though performance requirements continue to rise; power budgets that remain the same or decline while application functionality increases; and a growing need for mixed-signal processing in a broad range of embedded applications. Addressing these challenges efficiently requires a new class of embedded processors that deliver very high levels of performance while balancing power requirements and supporting the mix of RISC and signal processing capabilities that is becoming essential for many embedded applications.

Embedded Design Challenges

Gone are the days when you could count on the next process node to give you twice the clock speed at half the power consumption. Clock speeds for most embedded designs have topped out in the 1 GHz to 2 GHz range, as can be seen in Figure 1. While speeds are still increasing slightly, power and process limitations restrict our ability to attain higher performance simply by raising the clock frequency. This creates challenges for embedded designers because the performance demands of applications continue to increase.

Figure 1: Historical growth of processor performance (source: researchgate.net)

This challenge is exacerbated by the growing memory performance gap (Figure 2). As we move down the process curve, logic speeds (red line) are increasing at a much faster rate than memory access speeds (blue line). For example, in the 28-nm process node, the logic can be clocked at more than 3 GHz but memory access speeds are limited to 1.4 GHz under the best conditions. As can be seen in Figure 2, memory access times have essentially flatlined.

Figure 2: Embedded memory performance gap (source: semiwiki.com)

Memory access time can limit the maximum clock speed of a processor, because a processor that must complete each memory access in a single cycle can’t be clocked faster than its memory.

Clock speeds in embedded designs are also being moderated to manage power consumption. Especially in battery-powered applications, power budgets are fixed or only growing slightly while requirements for performance, functionality and features are increasing. Power budgets are even being limited in applications where power consumption would not seem to be a concern. For example, in cars, where power from the alternator is substantial, power on each module is limited to control the overall power drain as electronic components in cars proliferate. The power consumption design challenge for embedded applications is not new, but it is getting more difficult to manage as designs become increasingly complex.

Advanced processors are being called on to address these embedded challenges, but even the processors are being challenged to deliver more. In the past, if signal processing was required in a design, a DSP co-processor would be added, but now, to increase processing efficiency, the co-processor functionality is being pushed into the RISC processor. This merging of functionality reduces the number of processors in the design, which saves power, but also puts pressure on performance because the RISC processor is now required to do multiple tasks. 

Addressing the Challenges

These challenges are daunting, but the capabilities offered in new embedded processors will help designers deal with them. While the clock speeds of embedded designs aren’t increasing, performance continues to increase because the latest embedded processors can execute more instructions per clock. Adding the capability to issue and execute multiple instructions in parallel (superscalar execution), or adding multithreading, increases processor performance without increasing frequency. Another approach is to use multi-core processors in either a symmetric or asymmetric configuration. These approaches enable more work to be done in parallel, increasing performance and throughput.

However, increasing the work done per clock doesn’t address memory access limitations. The growing gap between memory access speeds and logic speeds is most pronounced for processors that allow only one stage in their pipeline to access memory. In 28-nm processes, memory access speeds limit the best-case maximum clock speed of such processors to just over 1 GHz. Processors with single-cycle memory access have few options to overcome this clock speed limit. Newer high-performance embedded processors offer two or more cycles of memory access so memories can be banked and accessed in parallel. With two-cycle memory access, a processor can be run at twice the speed of the memory and achieve much higher maximum clock speeds at all process nodes, including the newer advanced nodes.
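The relationship between memory speed, access cycles and core clock can be sketched with a simple bound. This is a minimal illustration rather than a timing model: the function name and the 3.0 GHz logic figure are assumptions (the text says only "more than 3 GHz"), and the 1.4 GHz memory figure is the best-case 28-nm number quoted earlier.

```python
def max_core_clock_ghz(logic_fmax_ghz, memory_fmax_ghz, memory_access_cycles=1):
    """Upper bound on core clock under a simplified model.

    A memory access may span `memory_access_cycles` core cycles, so the core
    can run at most that multiple of the memory speed, and never faster than
    the logic itself allows.
    """
    if memory_access_cycles < 1:
        raise ValueError("memory_access_cycles must be at least 1")
    return min(logic_fmax_ghz, memory_access_cycles * memory_fmax_ghz)


# Assumed 28-nm figures: logic at 3.0 GHz, memory at 1.4 GHz (best case).
single_cycle = max_core_clock_ghz(3.0, 1.4, memory_access_cycles=1)
two_cycle = max_core_clock_ghz(3.0, 1.4, memory_access_cycles=2)

print(f"single-cycle access bound: {single_cycle:.1f} GHz")
print(f"two-cycle access bound:    {two_cycle:.1f} GHz")
```

Under these assumptions, the single-cycle design is capped by the memory, while the two-cycle design is capped at twice the memory speed. Real designs will land below this idealized bound once pipeline and physical design constraints are taken into account.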

Unfortunately, each of these approaches (increasing instructions per clock, using multi-core processors, or running processors at higher speeds by taking advantage of multi-cycle memory access) burns more power, which is a problem for designs with constrained power budgets. Designers of embedded processors can no longer throw transistors at the problem of increasing performance and throughput as has been done in the past. Any increase in performance has to be balanced against the increase in power consumption that naturally results. Therefore, embedded processors are now being measured in terms of performance efficiency instead of straight performance or power. Measured as performance per milliwatt (DMIPS/mW, CoreMark/mW, etc.), performance efficiency has to be considered a key design metric for any new embedded processor. Careful attention to performance efficiency enables embedded application designers to take advantage of increases in processor performance while limiting the increases in power consumption.
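As a rough illustration of the metric, performance efficiency is simply a benchmark score divided by power in milliwatts. The sketch below compares two cores; the core names, scores and power figures are entirely hypothetical and chosen only to show that the faster core is not necessarily the more efficient one.

```python
def performance_efficiency(score, power_mw):
    """Benchmark score (e.g. DMIPS or CoreMark) per milliwatt."""
    if power_mw <= 0:
        raise ValueError("power_mw must be positive")
    return score / power_mw


# Hypothetical cores: (benchmark score, power in mW). Not measured data.
candidates = {
    "core_a": (3000.0, 60.0),
    "core_b": (3600.0, 90.0),
}

efficiency = {
    name: performance_efficiency(score, power_mw)
    for name, (score, power_mw) in candidates.items()
}
most_efficient = max(efficiency, key=efficiency.get)

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} per mW")
print(f"most efficient: {most_efficient}")
```

Here the hypothetical core_b delivers the higher raw score, but core_a does more work per milliwatt, which is the comparison that matters for a fixed power budget.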

Of course, improving performance efficiency is not the only thing being done to control power consumption. New embedded processors give the designer much greater control over how the processor uses power. The ability to create power islands and exercise dynamic control over power consumption in the processor helps designers meet their system-on-chip’s (SoC’s) power consumption targets. Significant strides are also being made in instruction sets and compilers to improve embedded code density. Saving 10% or more in embedded code size reduces memory requirements and, in many cases, saves more power than the processor uses.

New Superscalar ARC HS4x Family

The widely deployed DesignWare® ARC® HS3x family of high-performance processors has been available since 2013, and the design challenges have grown since then. To help designers address these emerging challenges, Synopsys has introduced the new ARC HS4x/D family. The new family has five members (HS44, HS45D, HS46, HS47D, and HS48) and features a dual-issue pipeline that has been optimized for embedded applications (Figure 3). As a result, the mixed-signal HS4x/D family increases RISC performance by 25% and doubles signal processing performance over the HS3x family, while increasing power consumption and area by only 15%. The new family is fully compatible with the HS3x family and offers two-cycle memory access, enabling the cores to be clocked at up to 2.2 GHz on 28-nm processes. The HS45D and HS47D processors support 150 DSP instructions and deliver very high levels of combined RISC and DSP performance. To make the new HS4x cores easy to use, both the RISC and DSP capabilities can be efficiently programmed in C/C++ with Synopsys’ ARC MetaWare Compiler, which automatically takes advantage of the dual-issue capability of the processors to maximize performance.
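The figures quoted above imply a net gain in performance per unit of power, which can be worked out directly. This is a back-of-the-envelope sketch: the function name is illustrative, and it assumes the 15% power increase applies equally to RISC and signal processing workloads, which the text does not state.

```python
def relative_efficiency(perf_gain, power_gain):
    """Ratio of new to old performance-per-power, given fractional gains."""
    return (1.0 + perf_gain) / (1.0 + power_gain)


# Figures from the text: +25% RISC performance, 2x signal processing
# performance (+100%), +15% power. Applying the same power increase to
# both workloads is an assumption.
risc_ratio = relative_efficiency(perf_gain=0.25, power_gain=0.15)
dsp_ratio = relative_efficiency(perf_gain=1.00, power_gain=0.15)

print(f"RISC efficiency vs. HS3x: {risc_ratio:.2f}x")
print(f"DSP efficiency vs. HS3x:  {dsp_ratio:.2f}x")
```

Under that assumption, RISC performance efficiency improves by roughly 9% and signal processing efficiency by roughly 74% relative to the HS3x family.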

Figure 3: New ARC HS4x Embedded Processor Family

Conclusion

Times are changing and bringing very interesting capabilities to the electronic world around us. Advances in technology will enable increasingly seamless and natural connections into this digital realm, resulting in greater efficiency, productivity and connection with others. These advances bring with them challenges for embedded designers, who need new approaches to meet increasing performance and functionality requirements while balancing them against ever-present power limitations. Successfully addressing these challenges and realizing this new class of electronics will require advances in embedded processors like those offered by Synopsys’ new HS4x/D family, which delivers the required performance and functionality with an eye on performance efficiency so that it doesn’t blow up the power budget.