Accelerating IoT Applications with a Data Fusion IP Subsystem

Rich Collins

Apr 18, 2016 / 7 min read

Table of Contents

Sensor fusion to data fusion
Advantages of an integrated subsystem
Tightly integrated DMA improves power and performance
An optimized data fusion subsystem
Smart Data Fusion IP Subsystem benchmarks
Summary

The vast expansion of Internet of Things (IoT) edge devices is increasing demand for low-power, “always-on” functions such as sensor fusion, image and voice detection, gesture recognition, and audio playback. Supporting this fusion of data sources requires an efficient combination of RISC and DSP processing.

This article shows how leveraging an integrated, pre-verified data fusion subsystem that is optimized for efficient DSP performance and ultra-low energy consumption can accelerate the development of high-performance, cost-optimized IoT systems and speed time to market.

Sensor fusion to data fusion

The combination of basic sensor elements into a higher order function is called sensor fusion. For example, combining the input from an accelerometer, compass and gyroscope to track 3D motion is common in modern smartphones. The number of systems incorporating sensor fusion technology continues to grow rapidly as semiconductor suppliers push to integrate sensor interfaces into many of their SoC offerings.
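As a rough illustration of what sensor fusion means in practice, the sketch below blends a gyroscope rate with an accelerometer reading using a complementary filter. This is one common textbook technique, not a method described in this article, and it ignores the compass. The function names, the pitch sign convention, and the 0.98 blend factor are all assumptions chosen for the example.

```python
import math


def accel_pitch(ax, ay, az):
    """Pitch angle (radians) implied by gravity as seen by the accelerometer.

    Assumes one common axis convention; real devices may differ.
    """
    return math.atan2(-ax, math.sqrt(ay * ay + az * az))


def complementary_filter(pitch_prev, gyro_rate, accel, dt, alpha=0.98):
    """Blend the integrated gyro rate (smooth, but drifts) with the
    accelerometer angle (noisy, but drift-free).

    alpha is a hypothetical tuning value, not taken from the article.
    """
    gyro_estimate = pitch_prev + gyro_rate * dt
    return alpha * gyro_estimate + (1.0 - alpha) * accel_pitch(*accel)


def track_pitch(samples, dt, alpha=0.98):
    """Run the filter over (gyro_rate, (ax, ay, az)) samples, starting at 0."""
    pitch = 0.0
    history = []
    for gyro_rate, accel in samples:
        pitch = complementary_filter(pitch, gyro_rate, accel, dt, alpha)
        history.append(pitch)
    return history


if __name__ == "__main__":
    # Hypothetical input: device held still at a 0.5 rad tilt.
    theta = 0.5
    still = [(0.0, (-math.sin(theta), 0.0, math.cos(theta)))] * 500
    print(f"estimated pitch: {track_pitch(still, dt=0.01)[-1]:.4f} rad")
```

With a stationary device the estimate settles on the accelerometer angle, while during motion the gyroscope term dominates. Each update costs a handful of multiplies and adds, which is the kind of workload that suits a small RISC/DSP core.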

In addition to sensor processing, today’s IoT applications demand more and more integrated functionality, which requires support for voice and gesture recognition, audio playback and basic image detection. A higher level of DSP processing capability is needed to perform these functions, but at the same time it must be done with the lowest energy consumption possible. Data fusion has become a standard requirement in IoT edge devices addressing applications such as wearables, personal health and fitness devices, and wireless headsets and speakers.

Advantages of an integrated subsystem

The advantages of increased integration can differentiate a silicon vendor’s device. A typical "integrated" solution today incorporates the various data source interfaces into a microcontroller-like architecture. This architecture, shown in Figure 1, generally includes a CPU connected through a standard on-chip bus (typically AMBA based) to peripheral interfaces (ADC, SPI, I2C) and on-chip memories (ROM, RAM, eFlash). Because every peripheral sits on the bus, each transaction between the processor and a peripheral takes three to seven clocks or more, depending on bus latency and traffic. Those extra cycles cost both performance and energy.

Figure 1: Discrete implementation vs. integrated subsystem

An integrated IP subsystem with a DesignWare® ARC® EM processor offers distinct advantages to help ease integration effort while reducing on-chip latency and energy consumption compared to typical bus-based systems.

ARC EM processors provide industry-leading power/performance efficiency, saving critical battery life for IoT edge devices. The ARC EM DSP processors add DSP instructions, as well as MUL/MAC hardware, to the baseline RISC processor for always-on functions such as voice/gesture recognition and audio playback. Licensable options such as an FPU, MPU, and microDMA give customers flexibility in making implementation choices.

An ARC processor-based subsystem implementation can eliminate the interface to an on-chip bus by replacing load/store instructions to the I/O peripherals with register move instructions. The peripheral block registers are mapped using the ARC processor’s auxiliary bus. This effectively pulls the I/O peripheral interface functionality into the CPU complex, eliminating the buses and bridges. In a similar manner, both instruction and data memories can be closely coupled to the processor, eliminating the external bus and reducing access latencies.
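To put rough numbers on the difference, the sketch below compares the clocks spent on peripheral accesses over a bus with those spent on auxiliary-register moves. The three-to-seven-clock range for bus transactions comes from this article. The one-clock cost of a register move and the 1,000-access workload are assumptions made for illustration only, not measured figures.

```python
# Cycle-count sketch: bus-based peripheral access vs. auxiliary-register access.
BUS_CYCLES_BEST = 3    # from the article: "three to seven clocks or more"
BUS_CYCLES_WORST = 7
AUX_CYCLES = 1         # ASSUMPTION: a register move completes in one clock


def access_cycles(n_accesses, cycles_per_access):
    """Total clocks spent on n_accesses peripheral register accesses."""
    return n_accesses * cycles_per_access


def compare(n_accesses):
    """Return total clocks for each access style."""
    return {
        "bus_best": access_cycles(n_accesses, BUS_CYCLES_BEST),
        "bus_worst": access_cycles(n_accesses, BUS_CYCLES_WORST),
        "aux_register": access_cycles(n_accesses, AUX_CYCLES),
    }


if __name__ == "__main__":
    # Hypothetical workload: 1,000 peripheral register accesses.
    for style, cycles in compare(1000).items():
        print(f"{style}: {cycles} cycles")
```

Under these assumptions the bus-based design spends three to seven times as many clocks on the same accesses, and every clock saved is time the core can spend in a sleep mode.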

ARC processors and subsystems also support adding any combination of hardware extensions to the core: CPU extension registers, auxiliary extension registers, or memory mapped blocks. Designers can add 32-bit custom instructions as well.

Leveraging these high-level configuration and extension concepts enables end-customers to create highly optimized implementations.

Tightly integrated DMA improves power and performance

One of the many configurable subsystem options is a tightly integrated microDMA engine. This DMA controller allows system resources and peripherals to access memory independent of the processor, even during processor sleep modes. This can translate into real savings on cycle count and dynamic power.

To quantify this value, two basic subsystems using ARC EM processors were compared: one with the tightly coupled DMA and one without. Using an integrated subsystem SPI peripheral, an 1800-byte message was transmitted in loopback mode (primary Tx -> primary Rx). The CPU was clocked at 10 MHz and the instruction and data memories (ICCM & DCCM) were 32KB each for both implementations. Effective cycle count and dynamic power were measured in each case.

Figure 2 shows the first subsystem implementation without the microDMA engine. Figure 3 shows the second implementation, which includes the tightly coupled microDMA engine.

Results are shown in Table 1 below. For a relatively small area penalty (the microDMA adds ~10K logic gates and some memory overhead), the number of required CPU cycles drops dramatically, as expected. More notably, the dynamic power of the subsystem falls by 8X.

Figure 2: Subsystem with ARC EM processor and no integrated DMA

Figure 3: Subsystem with ARC EM processor and tightly integrated DMA

Metric	Subsystem without integrated DMA	Subsystem with tightly integrated DMA
Total cycles	576K	576K
CPU cycles	211K (37% of total cycles)	0.128K (0.02% of total cycles)
Area (NAND equiv. gates)	397K (47K logic/350K memories)	417K (57K logic/360K memories)
Dynamic power	56 µW (22.5 µW CPU/33.5 µW memories)	7 µW (2.3 µW CPU/1.0 µW memories/3.7 µW DMA)

Table 1: Area/Power comparison of subsystems with/without tightly coupled DMA
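The headline ratios can be reproduced directly from the Table 1 figures. The short sketch below recomputes the CPU share of total cycles, the dynamic power reduction, and the area overhead. The numbers are copied from the table; the function and variable names are made up for this example, and "K" is assumed to mean thousands.

```python
# Values copied from Table 1 (K assumed to mean thousands).
TOTAL_CYCLES_K = 576.0

no_dma = {"cpu_cycles_k": 211.0, "area_k": 397.0, "power_uw": 56.0}
with_dma = {"cpu_cycles_k": 0.128, "area_k": 417.0, "power_uw": 7.0}


def cpu_share_percent(cpu_cycles_k, total_cycles_k=TOTAL_CYCLES_K):
    """CPU cycles as a percentage of the total cycles of the transfer."""
    return 100.0 * cpu_cycles_k / total_cycles_k


def power_reduction(before_uw, after_uw):
    """How many times lower the dynamic power is after the change."""
    return before_uw / after_uw


def area_overhead_percent(before_k, after_k):
    """Extra gate count as a percentage of the original area."""
    return 100.0 * (after_k - before_k) / before_k


if __name__ == "__main__":
    print(f"CPU share without DMA: {cpu_share_percent(no_dma['cpu_cycles_k']):.1f}%")
    print(f"CPU share with DMA:    {cpu_share_percent(with_dma['cpu_cycles_k']):.3f}%")
    print(f"Power reduction:       {power_reduction(no_dma['power_uw'], with_dma['power_uw']):.1f}X")
    print(f"Area overhead:         {area_overhead_percent(no_dma['area_k'], with_dma['area_k']):.1f}%")
```

Read this way, the trade is roughly 5% more gates for 8X lower dynamic power on this transfer.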

An optimized data fusion subsystem

Leveraging the basic subsystem concepts above, Synopsys has developed an IP subsystem targeting the fast-growing IoT edge device market. It specifically addresses “always-on” applications that need robust DSP performance for functions such as complex sensor fusion, voice and gesture recognition, image detection and audio playback, while staying within the constrained power envelope of a battery-operated device.

The DesignWare Smart Data Fusion IP Subsystem (Figure 4) is designed to efficiently process data from numerous digital and analog sensors, either as the main processing element in an MCU, or as an offload engine for the host processor in a larger SoC. The fully configurable IP subsystem includes an ARC EM5D, EM7D, EM9D or EM11D processor. This family of low-power cores combines RISC and DSP instructions and hardware to manage the extensive processing required by advanced data fusion algorithms and to improve performance for a range of audio formats including MP3, SBC, OPUS and AAC LC.

The subsystem's integrated microDMA controller enables memory and peripheral access during processor sleep modes. In addition, the subsystem incorporates highly-optimized I/O peripherals including multiple SPI, I2C and analog-to-digital converter interfaces, further lowering gate count and energy consumption.

To ease software development, the subsystem includes software drivers and a rich library of off-the-shelf DSP functions supporting filtering, correlation, matrix/vector, decimation/interpolation and complex math operations. Designers can implement these sensor-specific DSP functions in hardware using a combination of native DSP instructions and tightly coupled hardware accelerators to boost performance efficiency and reduce power consumption.
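For readers unfamiliar with these function classes, the sketch below shows filtering and decimation in their simplest form: a direct-form FIR filter followed by downsampling. It is a plain Python illustration, not the subsystem's DSP library or its API, and all function names are hypothetical. The inner loop is a chain of multiply-accumulate steps, the operation that MUL/MAC hardware is built to speed up.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]  # one multiply-accumulate (MAC)
        out.append(acc)
    return out


def decimate(samples, factor):
    """Keep every factor-th sample, starting with the first."""
    return samples[::factor]


def filter_and_decimate(samples, taps, factor):
    """Low-pass filter first, then reduce the sample rate."""
    return decimate(fir_filter(samples, taps), factor)


if __name__ == "__main__":
    # Hypothetical input: a constant signal through a 4-tap moving average.
    smoothed = filter_and_decimate([1.0] * 8, [0.25] * 4, 2)
    print(smoothed)
```

Filtering before decimation is the usual order, because the low-pass stage removes content that would otherwise alias into the lower sample rate.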

Additionally, Synopsys' embARC Open Software Platform gives software developers online access to a comprehensive suite of free and open-source software that accelerates code development for the subsystem.

Figure 4: Synopsys DesignWare Smart Data Fusion IP Subsystem

Smart Data Fusion IP Subsystem benchmarks

For a complex sensor hub implementation, a common set of signal processing functions is typically required. These functions include complex and scalar math, matrix functions, filtering, interpolation and transforms.

To analyze the performance of the Data Fusion IP Subsystem, a library of these functions was run on both the Data Fusion Subsystem and a microcontroller built around a competitor’s processor (40LP, typical process/conditions). The total number of clock cycles required to complete the benchmark was calculated in both cases.

Across the board, the competitor processor required a significantly greater number of clock cycles to complete the tasks. The additional clock cycles translate into real energy (power over the “life” of the task). On average (as seen in Figure 5), the energy consumption was more than 2X greater for the competitor’s implementation. For IoT devices demanding minimal energy consumption to save on battery life, the Smart Data Fusion IP Subsystem provides significantly more efficient processing for typical sensor functions.
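The link between cycle count and energy is simple arithmetic: energy is average power multiplied by the time the task takes, and that time is the cycle count divided by the clock frequency. The sketch below works through it. The cycle counts, clock rate, and power figure are hypothetical placeholders, not the benchmark data behind Figure 5, which this article does not list.

```python
def task_energy_uj(cycles, clock_hz, avg_power_uw):
    """Energy in microjoules: average power (µW) times task duration (s)."""
    seconds = cycles / clock_hz
    return avg_power_uw * seconds


def energy_ratio(cycles_a, cycles_b, clock_hz, power_a_uw, power_b_uw):
    """Energy of implementation B relative to implementation A."""
    energy_a = task_energy_uj(cycles_a, clock_hz, power_a_uw)
    energy_b = task_energy_uj(cycles_b, clock_hz, power_b_uw)
    return energy_b / energy_a


if __name__ == "__main__":
    # HYPOTHETICAL numbers for illustration only.
    clock_hz = 10e6          # 10 MHz
    power_uw = 50.0          # same average power assumed for both
    cycles_a = 100_000       # implementation A
    cycles_b = 220_000       # implementation B needs more cycles
    print(f"A: {task_energy_uj(cycles_a, clock_hz, power_uw):.2f} µJ")
    print(f"B: {task_energy_uj(cycles_b, clock_hz, power_uw):.2f} µJ")
    print(f"B/A: {energy_ratio(cycles_a, cycles_b, clock_hz, power_uw, power_uw):.2f}X")
```

When both implementations draw similar power at the same clock, the energy ratio reduces to the cycle ratio, which is why cycle count serves as a reasonable first-order proxy for battery cost.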

Figure 5: Competitive comparison of typical fusion functions

Summary

The rapidly expanding IoT edge device market continues to push boundaries on integration, cost, and performance. Battery operation provides a constrained power envelope, but increasing demands for “always-on” functionality combining complex sensor fusion with biometric input (voice, image, touch) drive both RISC and DSP performance requirements. Designers require integration of more of these functions to eliminate board-level components and reduce cost.

Synopsys’ Smart Data Fusion IP Subsystem combines the unique capabilities of the ARC EMxD CPUs with tightly coupled peripheral interfaces and hardware accelerators along with software drivers and libraries in an integrated IP offering, providing significant gains in overall performance, while reducing software footprint, silicon area and power in embedded IoT systems.
