Data memory interfaces are key when optimizing a core for performance, size, and power consumption. The data memory interface (the load/store units) determines how much data is loaded and stored and how frequently these operations occur, and these units are also large in terms of physical implementation. The ability to optimize this interface gives designers an advantage: they can balance power consumption and area against performance requirements.
The EM9D processor has a fully configurable data memory interface, supporting from one to three closely coupled data memories (DCCM, XCCM, and YCCM). These memory regions are fully supported by the MetaWare Compiler, which eliminates the need for manual data vector allocation. Fused instructions allow a computation and parallel accesses to all three memory regions to execute in a single cycle, offering very high performance where needed. This configurability allows the SoC developer to tune the core memory interface to meet computation throughput, area, and power requirements. For example, configuring the EM9D with three physical data memories can triple computation performance while reducing core/memory power consumption by up to 40%.
Along with data memory size and configuration, instruction memory size is another important factor affecting system area and power consumption. Out of the box, the EM9D processor offers around 15% to 20% smaller code size than competitive processors. This is due to the highly efficient ARCv2DSP ISA, coupled with the compiler's efficient instruction mapping and scheduling. On top of this, the fused instructions significantly reduce code size, and hence the required instruction memory size.
In addition to optimizing the core and memories, SoC system integration of the DSP is important for optimal performance, power, and area. End-node IoT SoCs range from quite simple to highly complex, and the traditional modular SoC interconnect sometimes adds gate-count, milliwatt, and cycle-budget overhead that can also be optimized away. Synopsys' ARC processors are fully configurable and extensible, and offer the widest range of system and hardware connectivity schemes of any processor IP core available in the industry.
Peripheral hardware blocks can be connected to the processor via a dedicated peripheral interface for a 'bus-less' design that enables zero-latency access for data-throughput-intensive blocks. The core register bank can be extended in size, and hardware blocks can connect directly to these registers, allowing software on the core to control the blocks and read their status. In addition, by using ARC Processor EXtension (APEX) technology, designers can add custom instructions, registers, and interfaces to the ISA in the form of an RTL description. These connection schemes give SoC developers further flexibility to tune the system architecture to meet performance, power, and area goals.
To further optimize performance, an optional µDMA controller can be added to the processor. This µDMA engine is controlled directly from the ARC EM9D processor, but operates in parallel with core execution, offloading heavy data movement from the core.
Figure 2 shows an example of how this system architecture optimization can greatly improve performance, power consumption, and area.