Low-Power Processor Solutions for Always-On Devices

By Pieter van der Wolf and Joep Boonstra, Senior Staff R&D Engineers, ARC Processor IP Solutions, Synopsys

Mobile Devices on the Move: Smarter and Smaller

Increasingly, mobile devices are becoming context-aware, using a broad array of sensors to monitor location, movement, heart rate, sound, etc. These sensor inputs enable new applications that make our smart mobile devices even smarter and are changing the way users interact with the devices. Think, for example, of devices with an “always-listening” capability that can be activated by means of a voice command.

These trends are visible with popular mobile devices like smartphones and tablets, which continue to evolve to offer new sensor-based applications. A new wave of wearable devices such as fitness bands, smart watches, and glasses is also emerging. A common characteristic of these devices is that they are always on to monitor sensor inputs. This puts very stringent limits on the power that these devices are allowed to consume, as batteries are often very small. For example, the battery capacity of a fitness band is typically less than 300 mAh and needs to last for weeks.
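To put that constraint in numbers, the short C sketch below computes the average current a device may draw for a given battery capacity and target lifetime. The 14-day lifetime used in the usage note is an assumption chosen for illustration (the text only says "weeks"), and the function name is hypothetical. Self-discharge, regulator losses and capacity derating are ignored.

```c
#include <assert.h>

/* Average current budget, in microamps, for a battery of capacity_mah
 * that must last the given number of days. Ignores self-discharge,
 * regulator losses and usable-capacity derating. */
double avg_current_budget_ua(double capacity_mah, double days)
{
    assert(capacity_mah > 0.0 && days > 0.0);
    double hours = days * 24.0;
    return capacity_mah / hours * 1000.0; /* mA -> uA */
}
```

With a 300 mAh battery and an assumed 14-day lifetime, avg_current_budget_ua(300.0, 14.0) comes out at roughly 0.9 mA on average for the whole system, including sensors, radio and processor.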

Processors for Always-On Devices

Always-on devices typically employ a processor that is optimized for sensor processing tasks to meet the very strict power consumption requirements. In smartphones, this is a separate processor next to the main application processor, while in smaller devices, like fitness bands, it may be the only processor in the system. The tasks to be performed by the processor are a mixture of control tasks and digital signal processing (DSP) tasks.

A characteristic of always-on devices is that they operate in different modes. For example, an “always-listening” device typically deploys a detection mode in which it monitors the microphone input for someone speaking. If a voice input is detected, it switches to recognition mode and applies DSP algorithms for recognizing spoken phrases. Processors in always-on systems need to support such mode switching and adjust for optimal power consumption in each mode.
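The mode switching described above can be sketched as a small state machine in portable C. This is an illustration only, not code from the ARC software stack: the mean-absolute-amplitude detector, the threshold, the "hangover" count of quiet frames and all names are assumptions. A real detector would be more elaborate, and a real system would also change clock or power settings on each transition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MODE_DETECT, MODE_RECOGNIZE } listen_mode_t;

typedef struct {
    listen_mode_t mode;
    unsigned quiet_frames; /* consecutive quiet frames seen while recognizing */
} listener_t;

/* Mean absolute amplitude of a frame: a cheap stand-in for voice detection. */
uint32_t frame_level(const int16_t *frame, size_t n)
{
    assert(frame != NULL && n > 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = frame[i];
        sum += (uint32_t)(s < 0 ? -s : s);
    }
    return (uint32_t)(sum / n);
}

/* Process one frame and return the mode to use for the next one.
 * Detection mode only runs the cheap level check; recognition mode is
 * left again after `hangover` consecutive quiet frames. */
listen_mode_t listener_step(listener_t *l, const int16_t *frame, size_t n,
                            uint32_t threshold, unsigned hangover)
{
    int active = frame_level(frame, n) >= threshold;

    if (l->mode == MODE_DETECT) {
        if (active) {
            l->mode = MODE_RECOGNIZE;
            l->quiet_frames = 0;
        }
    } else {
        if (active)
            l->quiet_frames = 0;
        else if (++l->quiet_frames >= hangover)
            l->mode = MODE_DETECT;
    }
    return l->mode;
}
```

The hangover counter keeps the device in recognition mode across short pauses in speech, so that it does not bounce between modes on every quiet frame.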

For typical always-on, battery-based applications, the DesignWare® ARC® EM5D and EM7D processors include various hardware features to reduce power, or more accurately, reduce the energy needed to perform a given task.

Powerful ISA

First, let’s look at code density. Code density is important to minimize memory footprint, instruction cache miss rate and memory power dissipation. The ARC EM processor family is based on the ARCv2 instruction set architecture (ISA), which offers best-in-class code density for RISC applications. The ARC EM5D/EM7D processors extend the ISA with over 100 new instructions targeted to DSP applications, retaining the high code density of the ARC EM processors while still being able to issue one instruction per cycle. A summary of the ARCv2DSP ISA is shown in Table 1.

Table 1: ARCv2DSP ISA Summary

The ARC EM5D/EM7D processors include ARCv2DSP instructions to facilitate common DSP functions like Q15 and Q31 fractional data support, saturation, accumulation and rounding, all as fused operations. In addition to that, the processors support two styles of 16b vector multiply-accumulate operations:

  • SIMD style: acc.lo += a.lo * b.lo; acc.hi += a.hi * b.hi
  • Inner product style: acc += a.lo * b.lo + a.hi * b.hi
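The difference between the two styles can be shown with a portable C reference model. This is a sketch of the arithmetic only, not the ARCv2DSP instructions or MetaWare intrinsics: the type and function names are hypothetical, and the 64-bit accumulators are an assumption made to keep the C code free of signed overflow (the accumulator width of the hardware is not stated here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two 16-bit lanes, standing in for one packed 32-bit vector operand. */
typedef struct { int16_t lo, hi; } v2i16_t;

/* Two independent accumulators, one per lane. */
typedef struct { int64_t lo, hi; } acc2_t;

/* SIMD style: each lane accumulates its own product. */
void mac_simd(acc2_t *acc, v2i16_t a, v2i16_t b)
{
    acc->lo += (int64_t)a.lo * b.lo;
    acc->hi += (int64_t)a.hi * b.hi;
}

/* Inner product style: both lane products go into a single accumulator. */
void mac_dot(int64_t *acc, v2i16_t a, v2i16_t b)
{
    *acc += (int64_t)a.lo * b.lo + (int64_t)a.hi * b.hi;
}

/* Dot product over arrays of packed pairs, two samples per step. */
int64_t dot_pairs(const v2i16_t *x, const v2i16_t *y, size_t n_pairs)
{
    assert(x != NULL && y != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n_pairs; i++)
        mac_dot(&acc, x[i], y[i]);
    return acc;
}
```

The SIMD style suits cases where the two lanes carry independent data streams, such as two audio channels. The inner product style suits FIR filters and correlations, where all products end up in one sum.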

A special class of vector instructions supports complex data types (imaginary, real), including instructions to compute an FFT radix-2 butterfly in just two cycles, with and without scaling. To minimize data footprint and reduce data cache miss rate, the ARCv2DSP ISA supports 8b data and efficient conversion from 8b to 16b data types. Fused operations and vector support result in dense code and data, with an associated reduction in cycle count and, consequently, lower energy to execute a given task.

Low-power hardware architecture

The ARC EM5D/EM7D processors’ DSP hardware has been defined with low power in mind. To accommodate the ARCv2DSP instructions, the baseline EM architecture has been extended with a new DSP pipeline, as illustrated in Figure 1. The DSP pipeline implements all DSP instructions as well as the RISC multiply instructions, replacing the existing ARC EM multiplier and unifying the datapaths while reducing area and power. To further reduce power, the input operands to different parts of the processor are fully gated to avoid unnecessary toggling of signals in inactive sections.

The ARC EM5D/EM7D processors feature separate clock domains for the different modules in the processor; a module clock is enabled only if there are active instructions in the module. On top of that, 99% of the flip-flops in the design can be gated by inferring second-level clock gates in a logic synthesis tool, such as Synopsys’ Design Compiler.

Figure 1: ARC EM5D/EM7D Processor Pipeline

Sleep modes

Many always-on devices require the processor to sleep until some event occurs. An example of such an event is the availability of a number of data samples from a microphone. To accommodate these use cases, the processor supports closely coupled memories (CCMs) that can still be accessed while the processor is asleep. An external DMA can write data into a CCM and wake up the processor as soon as sufficient data is written.
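The interaction between the external DMA, the CCM and the sleeping processor can be modeled on a host machine with a few lines of C. Everything here is a simulation under stated assumptions: the buffer size, the wake-up level and all names are hypothetical, the struct only stands in for a CCM, and the wake_pending flag stands in for whatever interrupt or event the real system uses to wake the core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CCM_SAMPLES 64 /* hypothetical buffer size */
#define WAKE_LEVEL  16 /* hypothetical number of samples that triggers wake-up */

typedef struct {
    int16_t ccm[CCM_SAMPLES]; /* stands in for a closely coupled memory */
    size_t  fill;             /* samples written since the last wake-up */
    bool    wake_pending;     /* stands in for the DMA's wake-up event */
} sensor_buf_t;

/* Models the external DMA writing one sample while the core sleeps. */
void dma_write_sample(sensor_buf_t *sb, int16_t sample)
{
    assert(sb != NULL && sb->fill < CCM_SAMPLES);
    sb->ccm[sb->fill++] = sample;
    if (sb->fill >= WAKE_LEVEL)
        sb->wake_pending = true;
}

/* Models the core. Returns 0 when no wake-up is pending (a real core
 * would simply stay asleep); otherwise processes the buffered samples,
 * here by summing them, and returns how many were consumed. */
size_t core_service(sensor_buf_t *sb, int32_t *sum_out)
{
    assert(sb != NULL && sum_out != NULL);
    if (!sb->wake_pending)
        return 0;

    int32_t sum = 0;
    for (size_t i = 0; i < sb->fill; i++)
        sum += sb->ccm[i];

    size_t n = sb->fill;
    sb->fill = 0;
    sb->wake_pending = false;
    *sum_out = sum;
    return n;
}
```

The point of the pattern is that the core does no work per sample. It wakes once per batch, which amortizes the wake-up cost over WAKE_LEVEL samples.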

Add your own instructions

The ARC EM5D/EM7D processors can be extended with application-specific instructions using the ARC Processor EXtension (APEX) interface. This mechanism can be used to define new instructions and registers to accelerate specific applications, which reduces cycle count, memory footprint and, ultimately, energy consumption.

Programming Support for Fast Software Development

Programming tools and software libraries, which enable fast development of software that executes efficiently on the processor, are key requirements in a processor solution. The ARC MetaWare Development Toolkit for the EM5D/EM7D processors supports C programming with fixed-point data types and operator overloading. This enables high-level programming and fast migration of code from other platforms to the EM5D/EM7D processors. Examples of supported data types are Q31 and Q15 fractional data types, which are often used in DSP algorithms. The compiler performs an extensive array of optimizations, such as operator fusion, that help to reduce cycle count and code size.
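As a reminder of what Q15 arithmetic involves, the portable C sketch below implements a Q15 multiply with rounding and saturation, plus a saturating add. It does not use the MetaWare fixed-point types or syntax, which are not shown in this article; the type and function names are hypothetical. Each function bundles a multiply or add with its rounding and saturation steps, which is the kind of sequence that operator fusion can map to a single instruction.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t q15_t; /* value = raw / 32768, range [-1.0, 1.0) */

/* Q15 multiply with round-half-up and saturation.
 * Relies on arithmetic right shift of negative values, which is
 * implementation-defined in C but is what mainstream compilers provide. */
q15_t q15_mul_rs(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * b;   /* Q30 product */
    p = (p + (1 << 14)) >> 15;    /* round and rescale to Q15 */
    if (p > INT16_MAX)            /* only -1.0 * -1.0 can exceed the range */
        p = INT16_MAX;
    return (q15_t)p;
}

/* Saturating Q15 addition. */
q15_t q15_add_s(q15_t a, q15_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (q15_t)s;
}
```

For example, 0.5 is 16384 in Q15, and q15_mul_rs(16384, 16384) yields 8192, which is 0.25. The corner case -1.0 times -1.0 saturates to 32767, just below +1.0.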

Critical code segments may require low-level optimization to further reduce their cycle counts. For this purpose the MetaWare compiler supports programming at the level of intrinsics for the ARCv2 ISA DSP extensions. The use of intrinsics allows detailed control over the instructions selected by the compiler, without having to resort to assembly programming. Drawbacks of the use of intrinsics are that code becomes platform specific and that the compiler is restricted in the optimizations it may perform. Therefore the ARC MetaWare Development Toolkit also supports a library of fixed-point primitives that enables portable bit-exact optimization.

The ARC MetaWare Development Toolkit also features a rich DSP software library offering optimized implementations of commonly used DSP functions. The list of functions is summarized in Table 2. This library greatly improves productivity and time-to-market, as optimized implementations of functions can be reused efficiently in DSP algorithm development. In addition, a library with efficient bit-exact implementations of the ITU-T base operations on EM5D/EM7D is provided. This library allows the broad code base of ITU-T voice codecs to be leveraged efficiently.
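To illustrate what "bit-exact" means for the ITU-T base operations, the sketch below gives portable C reference versions of three of them: a saturating 32-bit add, a fractional multiply and a multiply-accumulate. The ref_ prefix marks these as illustrative stand-ins rather than the functions shipped in any library. They follow the commonly documented semantics of the ITU-T basic operators L_add, L_mult and L_mac as an assumption, and they omit the global overflow flag that the ITU-T reference code maintains.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating 32-bit addition. */
int32_t ref_L_add(int32_t x, int32_t y)
{
    int64_t s = (int64_t)x + y;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Fractional multiply: Q15 x Q15 -> Q31, i.e. the product times two.
 * The only overflowing input pair is -1.0 * -1.0, which saturates. */
int32_t ref_L_mult(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    if (p == 0x40000000)
        return INT32_MAX;
    return p * 2;
}

/* Multiply-accumulate: acc + a*b*2 with saturation on the addition. */
int32_t ref_L_mac(int32_t acc, int16_t a, int16_t b)
{
    return ref_L_add(acc, ref_L_mult(a, b));
}
```

Codec conformance depends on every such operation producing exactly these results, including in the saturating corner cases, so an optimized implementation has to match them bit for bit.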

Table 2: DSP Software Library Functions

Efficient software development requires execution on target platforms that offer developers the right mix of execution speed, debugging support and cycle accuracy. As illustrated in Figure 2, the ARC MetaWare Development Toolkit offers C++ classes for bit-accurate emulation on x86 platforms at very high speeds. Using the ARC MetaWare compiler, code can be mapped to the ARC platform for model-based simulation using the ARC nSIM simulator or for execution on ARC hardware, which can be a cycle-accurate hardware model, RTL, an FPGA implementation and, of course, actual silicon.

Figure 2: Software Libraries and Tools for Software Development

Conclusion

Mobile always-on devices have very stringent requirements on power consumption and require processor solutions that are optimized for mixed control and DSP processing at very low power. The ARC EM5D/EM7D processors are targeted for use in always-on devices and address the need for low power at all levels, including the ISA, the hardware architecture, sleep modes, extensibility support, optimized libraries and an optimizing DSP compiler.