DesignWare Technical Bulletin 

Efficient Audio Processing with DesignWare ARC Audio Processors 

By Henk Hamoen, senior product marketing manager, Synopsys, Inc.

Software plays an increasingly important role in enabling design teams to deliver high-quality audio solutions for consumer audio products. Henk Hamoen, senior product marketing manager, Synopsys, highlights the key benefits of Synopsys’ DesignWare SoundWave Audio Subsystem, which enables designers to benefit from efficient 32-bit ARC™ processors and optimized MPEG-4 software codecs for audio processing.

Audio processing is required everywhere!
Almost every consumer electronics device today, from smart TVs and tablets to Blu-ray Disc™ players and digital video cameras, has embedded audio processing functions. These devices must support many different multi-channel, high-definition audio formats for broadcast, file playback, internet streaming, and recording. Designers of systems-on-chips (SoCs) for these applications need to architect their systems so that audio processing is implemented in the most efficient way.

Offloading to efficient audio processors
Offloading audio processing from the host processor to a more efficient audio processor is now common practice. Tasks such as audio encoding, decoding, and post-processing can be handled more efficiently by processors designed specifically for them. We typically see an 80% reduction in power consumption when offloading an audio function from the host. For example, offloading MPEG-2 Audio Layer III (MP3) decoding from a power-optimized ARM® Cortex™-A9 dual-core processor with NEON extensions (3.13mW / 10MHz) [1, 2] to a DesignWare® ARC AS211SFX processor (0.27mW / 7MHz) results in a power reduction of about 3mW. More precisely, MP3 decoding on an ARM Cortex-A9 with NEON in a TSMC 40G process consumes 0.3125 mW/MHz per core [3], whereas the ARC AS211SFX consumes only 0.0735 mW/MHz [4].

In most consumer devices such as digital TVs, set-top boxes, and Blu-ray Disc players, however, multiple audio decoding and encoding tasks have to run simultaneously. Additional tasks include audio enrichment (post-processing) software, for example virtual surround sound effects on tablets and other portable devices. For use cases with multiple streams of multi-channel, high-definition audio, the total processor load is therefore not just 10 MHz for MP3 decoding but may reach 250 MHz or more, and the total power saving can easily reach 250 MHz × (0.3125 - 0.0735) mW/MHz ≈ 60 mW!
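The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the per-MHz figures are the ones quoted in the text, the function name `offload_saving_mw` is made up for illustration, and the calculation assumes the workload needs the same number of MHz on both processors.

```python
# Per-MHz power figures quoted in the text (TSMC 40G process).
HOST_MW_PER_MHZ = 0.3125  # ARM Cortex-A9 with NEON, per core
ARC_MW_PER_MHZ = 0.0735   # ARC AS211SFX


def offload_saving_mw(load_mhz, host=HOST_MW_PER_MHZ, audio=ARC_MW_PER_MHZ):
    """Estimate the power saved (mW) by moving `load_mhz` of audio work
    from the host to the audio processor.

    Simplifying assumption: the task costs the same number of MHz on
    both processors, so only the mW/MHz difference matters.
    """
    return load_mhz * (host - audio)


if __name__ == "__main__":
    for load in (10, 100, 250):
        print(f"{load:4d} MHz of audio load -> {offload_saving_mw(load):6.2f} mW saved")
```

In practice an audio-optimized processor usually needs fewer cycles than the host for the same task, so this simple model is, if anything, conservative.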

Figure 1: Design optimization by offloading audio tasks to an efficient audio processor

Impact of Memory Latency on Audio Processor Performance
Audio processors typically have hardware architectures that are tailored for audio processing, reducing not just the power but also the overall silicon cost. Synopsys’ 32-bit DesignWare ARC audio processors are further optimized for even better SoC performance. In today’s SoCs, the DDR system memory is a resource shared among audio, video, graphics, and program code. As a result, memory latencies increase in order to keep enough bandwidth available. While these latencies used to be on the order of 50 to 100 cycles, in many new designs we now see latencies of 200 to 300 cycles. Any processor will then require more cycles (MHz) to execute a given task, but ARC audio processors, with their XY memory architecture, are less affected by memory latency than other processors in the industry.

The ARC XY memory architecture enables concurrent fetching and processing of large audio data blocks. Because the ARC audio processor knows exactly where the data resides in memory, it can process data more efficiently than cache-based designs. For example, data block ‘X1Y1’ can already be in transfer from memory while data block ‘X0Y0’ is still being processed. The benefits of the XY architecture become visible as soon as memory latencies exceed 50 cycles.
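The effect of overlapping fetch and processing can be illustrated with a toy cycle-count model. This sketch is not a model of the actual ARC XY hardware: the function names, the two-buffer scheme, and the numbers (100 blocks, 400 processing cycles per block) are assumptions chosen only to show why overlapped transfers make the total cycle count much less sensitive to memory latency.

```python
def stalled_cycles(n_blocks, fetch, compute):
    """Fetch on demand: the core idles while each block is loaded."""
    return n_blocks * (fetch + compute)


def overlapped_cycles(n_blocks, fetch, compute):
    """Two buffers: block i+1 is fetched while block i is processed."""
    fetch_done = [0] * n_blocks
    compute_done = [0] * n_blocks
    for i in range(n_blocks):
        # A fetch starts once the previous fetch has finished and the
        # buffer it will fill (last used by block i-2) has been consumed.
        prev_fetch = fetch_done[i - 1] if i >= 1 else 0
        buffer_free = compute_done[i - 2] if i >= 2 else 0
        fetch_done[i] = max(prev_fetch, buffer_free) + fetch
        # Processing starts once the data has arrived and the core is free.
        prev_compute = compute_done[i - 1] if i >= 1 else 0
        compute_done[i] = max(fetch_done[i], prev_compute) + compute
    return compute_done[-1] if n_blocks else 0


if __name__ == "__main__":
    BLOCKS, COMPUTE = 100, 400  # hypothetical workload
    for latency in (50, 100, 300):
        print(latency,
              stalled_cycles(BLOCKS, latency, COMPUTE),
              overlapped_cycles(BLOCKS, latency, COMPUTE))
```

With these hypothetical numbers, raising the latency from 100 to 300 cycles adds 20,000 cycles to the stalled total but only 200 cycles to the overlapped one, because only the very first fetch is exposed as long as a transfer takes no longer than processing one block.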

A typical example is the frequently quoted Blu-ray Disc DTS audio use case. Table 1 shows the impact of a 100-cycle memory latency on the performance of processors from leading vendors. Clearly, the impact of memory latency on ARC processors is small compared to the impact on other processors. While this example shows the performance impact of a 100-cycle memory latency, ARC audio processors will demonstrate a correspondingly greater benefit over other processors at larger system latencies (200 to 300 cycles). Either a lower clock frequency can be applied (resulting in lower power consumption and smaller area) or more processing cycles are left over (leaving more ‘headroom’ for other tasks).

Table 1: Memory latency tolerance of Synopsys ARC audio processors

MPEG-4 AAC-LC and MPEG-4 HE-AAC Audio Standards
One of the many audio compression schemes used in consumer devices is Advanced Audio Coding (AAC), which is a ‘lossy’ compression and encoding scheme. Low-complexity AAC, or AAC-LC, is used for low bit-rate applications such as internet streaming, and was standardized as a profile of MPEG-2 Audio in 1997 (MPEG-2 AAC-LC). MPEG-4 AAC-LC, which was defined in 1999, also included Perceptual Noise Substitution (PNS). Spectral Band Replication (SBR), invented by Coding Technologies, was added to the MPEG-4 standard in 2003. This combination is now called HE-AAC v1 and is also known as aacPlus v1, eAAC+, AAC++, or enhanced AAC+. In 2004, a Parametric Stereo (PS) coding tool was added to the standard, which has since been called MPEG-4 HE-AAC v2 (or aacPlus v2).

We typically find AAC-LC and aacPlus v2 (HE-AAC v2) used in applications such as digital radio, broadcast, internet streaming, high-quality audio recording, and in consumer devices such as digital TV, set-top boxes, digital video cameras, tablets, and media players. Therefore, audio processors need to provide best-in-class solutions for these standards.

Synopsys Provides the Most Optimized Implementation
Synopsys' DesignWare ARC MPEG-4 AAC-LC and aacPlus v2 encoders have multi-channel encoding capabilities, up to 7.1 (eight) audio channels, for surround sound applications.

Optional coding methods for improved efficiency provided in the ARC audio software codecs include Joint Stereo Encoding, Temporal Noise Shaping (TNS), PNS, Intensity Stereo (IS), SBR, and PS. Support for Audio Data Interchange Format (ADIF), Audio Data Transport Stream (ADTS), and Low Overhead Audio Stream (LOAS) containers is also provided.
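To make the container support more concrete, the sketch below decodes the fixed part of an ADTS frame header as laid out in the MPEG AAC specifications. It is an illustration only and is unrelated to the Synopsys codec implementation: the function name `parse_adts_header` and the sample header bytes are made up for this example.

```python
# Sampling rates addressed by the 4-bit sampling_frequency_index field.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(data):
    """Decode the main fields of a 7-byte ADTS header (no CRC handling)."""
    if len(data) < 7:
        raise ValueError("an ADTS header is at least 7 bytes long")
    b = data
    if b[0] != 0xFF or (b[1] & 0xF0) != 0xF0:
        raise ValueError("ADTS syncword 0xFFF not found")
    sf_index = (b[2] >> 2) & 0x0F
    if sf_index >= len(SAMPLE_RATES):
        raise ValueError("reserved sampling_frequency_index")
    return {
        "mpeg_version": 2 if (b[1] >> 3) & 1 else 4,   # ID bit: 1 = MPEG-2
        "crc_present": (b[1] & 1) == 0,                # protection_absent == 0
        "audio_object_type": ((b[2] >> 6) & 3) + 1,    # 2 = AAC-LC
        "sample_rate": SAMPLE_RATES[sf_index],
        "channel_config": ((b[2] & 1) << 2) | (b[3] >> 6),
        # 13-bit frame length, header included, spread over three bytes.
        "frame_length": ((b[3] & 3) << 11) | (b[4] << 3) | (b[5] >> 5),
    }


if __name__ == "__main__":
    # Hand-built example: MPEG-4 AAC-LC, 44.1 kHz, stereo, 400-byte frame.
    header = bytes([0xFF, 0xF1, 0x50, 0x80, 0x32, 0x1F, 0xFC])
    print(parse_adts_header(header))
```

Because every ADTS frame carries its own header, a decoder can start at any frame boundary, which is why this format suits streaming and broadcast; ADIF, by contrast, has a single header at the start of the file.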

Synopsys audio software engineering teams design the implementation of the audio algorithms for the lowest processor load, expressed as the required number of cycles (MHz). Synopsys also optimizes for the smallest possible memory footprint (Table 2). ROM is used to store the program code and RAM is used to store the audio data during program execution. Smaller ROM and RAM sizes result in a lower silicon area cost for the SoC integrator.

Table 2: Synopsys provides the most optimized AAC-LC encoding solution

SoundWave Audio Subsystem Integrated Software Stack
Synopsys’ DesignWare SoundWave Audio Subsystem provides SoC designers with a complete, pre-verified audio subsystem consisting of hardware, software, and prototyping for integration into SoC designs. The SoC-ready audio solution reduces SoC design and integration effort, while accelerating time to market.

In addition to the single- or dual-core audio processor, the configurable SoundWave hardware includes digital I2S and S/PDIF interfaces, plus optional analog audio interfaces. However, because most SoC design effort is spent on software integration, the SoundWave Audio Subsystem also includes a Media Streaming Framework (MSF). The MSF enables developers to easily integrate and combine all audio software functions, including source/sink, decoding/encoding, and post-processing elements, into their application.

System integrators can easily embed all available audio functions into their application software using the SoundWave GStreamer plugin. This software plugin provides an application programming interface (API) that exposes all the features available in the audio subsystem. The plugin takes care of all communication between the subsystem and the host processor, providing plug-and-play integration of all audio functions into the application software running on the host.

Figure 2: Media Streaming Framework enables rapid integration into the application software

Summary
Software plays an increasingly important role in enabling SoC design teams to deliver high-quality audio solutions for consumer audio products. Audio processing IP providers need to deliver optimized solutions, both from a software and a hardware perspective.

Synopsys has invested in creating a differentiated audio processor IP solution for the consumer electronics SoC market. A broad portfolio of optimized audio software, including AAC-LC and aacPlus v2 codecs, is available, allowing designers to reduce power consumption and silicon area for audio applications. By packaging software IP as part of a complete solution of hardware and software, we enable design teams to integrate advanced audio features, such as those for digital TVs, set-top boxes, tablets, and digital video cameras, into their SoCs with lower risk and higher productivity.

Sources:
1 Y. Xu, “Employing ARM NEON in embedded system’s audio processing”, EE Times Asia, January 2010.
2 Wikipedia, “ARM architecture”, http://en.wikipedia.org/wiki/ARM_architecture
3 www.arm.com
4 Logic + memory, dynamic + leakage, nominal voltage TSMC 40G
5 www.tensilica.com
6 BDTi (http://www.bdti.com/InsideDSP/2011/01/26/Ceva)

