Synopsys Insight Newsletter 



Issue 2, 2012



Partner Highlight
Get a Head Start: Early Software Bring-up for ARM big.LITTLE Processing

Robert Kaye, ARM, and Tom De Schutter, Synopsys, explain how virtual prototyping enables design teams to get the best performance and energy efficiency from ARM’s innovative asymmetric multicore architecture.

Every engineer knows the importance of having the right tools for the job. ARM® big.LITTLE™ processing gives designers exactly that. By taking advantage of a high-performance processor for compute-intensive tasks and a highly energy-efficient processor for less demanding jobs, design teams can extend battery life by up to 70% for applications with highly variable workloads, such as smartphones.

However, the processor architecture is only one part of the toolkit. Software teams must also be able to develop, optimize and integrate code to get the best out of the big.LITTLE multicore architecture.

By using Virtualizer™ Development Kits (VDKs), design teams can start developing software up to 12 months before the hardware is available. VDKs allow software developers to simulate complex software stacks, such as Linux, Android and multicore task migration software, using real-world user scenarios, ensuring that they achieve the right balance between energy efficiency and top-end performance.

Delivering High Performance and Extreme Energy-Efficiency
Attempting to design a processor that addresses both very high performance and extreme energy-efficiency can result in an architecture that doesn’t quite achieve either. To address that challenge, the big.LITTLE processing concept combines a “big” high-performance processor (ARM® Cortex™-A15 MPCore™ processor) and a “little” energy-efficient processor (ARM Cortex-A7 MPCore processor) in an asymmetric, heterogeneous, multicore system.

The two processors share the same instruction set architecture (ISA), and are coupled by interconnect that supports full cache coherency. A shared controller directs interrupts to the active processor (Figure 1).

The architecture enables software developers to automate task migration between the two processor clusters and, when appropriate, to run a single execution environment across both clusters (multiprocessing).

To get the best out of both the high-performance and low-power capabilities of big.LITTLE processing, developers must carefully architect the task migration software layer and the Linux scheduler along with the processing subsystem.

Figure 1: A programmer’s view of big.LITTLE

Developing Software for Multicore
The key challenge for software teams when creating any new SoC is to develop code before their target hardware exists. Even after the hardware becomes available, it can be difficult for developers to get the visibility they need into what their code is doing for effective and productive debug.

To get the best out of the big.LITTLE architecture, developers must decide how to exploit variances in application workload before they can even get their hands on real silicon. To do that, they need an environment that lets them see into the device so they can clearly observe how allocating tasks between the processors affects both performance and power.

The Virtual Prototype Advantage
Virtual prototypes are fast, fully functional software models of entire systems. They run the same code that the design team will port to the hardware when it becomes available. Because virtual prototypes don’t depend on the physical hardware, design teams can make them available to the software team 12 months or more in advance of the silicon being ready, enabling a time-to-market advantage and an opportunity to win market share over competitors.

Software developers enjoy other benefits as a result of using virtual prototypes. Unlike debugging on hardware, debugging on a virtual prototype is non-intrusive, which means that the debug tools and the debug process do not change the behavior of the design. This allows developers to run real-world user scenarios and get repeatable results.

Virtualizer Development Kits
VDKs are software development kits (SDKs) with a virtual prototype as a simulation target. A VDK for a specific design contains the virtual prototype for that design, the right set of multicore debug and analysis tools, and sample software.

The Synopsys VDK Family for ARM Cortex processors includes a VDK for ARM big.LITTLE processing (Figure 2). This VDK includes a complete virtual prototype representing a big.LITTLE processing Versatile Express board. This virtual prototype is built with Fast Models from ARM and DesignWare® models from Synopsys. These models enable design teams to rapidly create virtual prototypes for most common mobile and consumer application platforms.

Figure 2: VDK for ARM big.LITTLE processing

The VDK also includes multicore debugging and analysis tools, which provide full control and visibility, and synchronize debug across all processors and other components in the platform. Design teams can get up and running quickly with the VDK by modifying the sample software stacks for Linux, Android and task migration that are available “out of the box”. The VDK is easy to configure and extend – design teams can add their own peripherals and change the configuration of the big.LITTLE architecture.

VDKs enable design teams to easily manage the complexities of developing software for multicore architectures and to bring up and optimize the multicore task migration software layer.

Early Model Availability
Design teams can only deploy virtual prototyping environments if they have access to software models of the processors and the other components that the system comprises. Because ARM develops its processor models as part of its processor development, the Fast Models are available at the same time the processor launches. ARM uses an identical validation suite for both Fast Models and RTL, which ensures the fidelity of the models to the hardware.

Synopsys provides a comprehensive range of DesignWare interface IP models, including USB 3.0 and GMAC, to complement the application subsystem design.

Because the VDK must execute as close to real time as possible to enable productive software development, the models focus on software throughput without compromising functional accuracy. The result is that a design team can bring up an OS in seconds on a virtual prototype.

Migrating Tasks
The VDK for ARM big.LITTLE processing includes task migration software analysis tools to help design teams utilize big.LITTLE’s performance and energy efficiency capabilities.

For example, if a user is looking up a location on Google Maps, the software will allocate the task to the Cortex-A15 processor in order to process the request as quickly as possible. If the user then receives a phone call, the high-performance needs for the browser session go away, and the phone call’s performance needs are much lower. This new task will switch to run on the Cortex-A7 processor to achieve higher energy-efficiency.

To maximize the benefits of big.LITTLE processing, it is important to tune the switching strategy towards the specific use cases of the device in which it is deployed and the profile of the individual user.

Typically, the Linux Dynamic Voltage and Frequency Scaling (DVFS) function controls task migration by treating the Cortex-A15 and Cortex-A7 processors as two different power states. Linux provides multiple governors that can control the transition between these states. The governors include high performance, power saving, on-demand performance and user-space.
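On a stock Linux kernel, these governors are exposed through the cpufreq sysfs interface under the names performance, powersave, ondemand and userspace. The following minimal Python sketch, which is not part of the VDK, reads the available and active governors for one CPU. The helper name read_governors and its root parameter are illustrative. The sysfs file names are those of the standard cpufreq interface; whether a particular big.LITTLE kernel exposes cluster switching in this way is an assumption that depends on the kernel build.

```python
from pathlib import Path


def read_governors(cpu: int = 0, root: str = "/sys/devices/system/cpu"):
    """Return (available, active) cpufreq governors for one CPU.

    `root` is a parameter so the function can be pointed at a test
    directory; on a real target it is the standard sysfs location.
    """
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    available = (base / "scaling_available_governors").read_text().split()
    active = (base / "scaling_governor").read_text().strip()
    return available, active


if __name__ == "__main__":
    try:
        available, active = read_governors()
        print("available:", available, "active:", active)
    except OSError as exc:
        # Hosts without cpufreq support (or without permission) end up here.
        print("cpufreq sysfs not available:", exc)
```

Run on the simulated target's Linux shell or on a development host, this shows which governor is currently making the switching decisions.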

Having the kernel decide when to switch, however, has both advantages and disadvantages. An advantage is that it will work “out of the box” for any application. Linux initiates the switch between the two clusters based on a pre-defined CPU load threshold and workload sampling rate. The disadvantage is that switches may happen even though the user does not benefit from them. This is where software developers can add user-space governors to fine-tune the task migration.
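The threshold-and-sampling behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the kernel's governor or the VDK's task migration software: the class name ClusterGovernor, the up_threshold and down_threshold values and the load samples are all assumptions. Two thresholds are used so that a load hovering near a single value does not trigger repeated switches.

```python
from dataclasses import dataclass


@dataclass
class ClusterGovernor:
    """Toy model of load-threshold cluster switching (hypothetical values)."""
    up_threshold: float = 0.80    # assumed: move to the big cluster above this load
    down_threshold: float = 0.30  # assumed: move to the little cluster below this load
    cluster: str = "A7"           # start on the energy-efficient cluster
    switches: int = 0

    def sample(self, load: float) -> str:
        """Process one load sample (0.0 to 1.0) and return the active cluster."""
        if self.cluster == "A7" and load > self.up_threshold:
            self.cluster = "A15"
            self.switches += 1
        elif self.cluster == "A15" and load < self.down_threshold:
            self.cluster = "A7"
            self.switches += 1
        return self.cluster


gov = ClusterGovernor()
# One entry per sampling period; the values are made up for illustration.
loads = [0.10, 0.50, 0.90, 0.60, 0.85, 0.20, 0.10]
trace = [gov.sample(load) for load in loads]
print(trace)
print("switches:", gov.switches)
```

Changing the thresholds or the number of samples per second in a model like this changes how often a switch occurs, which is the tuning problem the kernel-only approach leaves open.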

The best strategy to trigger switching between the clusters is typically a mix of kernel-driven switching and user-space governors in the Android power manager. For example, if the phone is idle, the screen is locked and an RSS feed is updated, the power manager will make sure the processing is performed on the Cortex-A7, since the user is not waiting for the result. In the case of video playback, the power manager will ensure that the task doesn’t switch to the Cortex-A7, to avoid potential glitches in the audio or video.
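The two examples above amount to a small policy layer that sits on top of the kernel's load-based decision. The Python sketch below illustrates that idea under stated assumptions: DeviceState, choose_cluster and their fields are hypothetical names, not the Android power manager API, and the rules encode only the RSS and video playback cases from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceState:
    """Hypothetical snapshot of device state seen by a power-manager policy."""
    screen_locked: bool
    user_waiting: bool
    media_playing: bool


def choose_cluster(state: DeviceState, kernel_choice: str) -> str:
    """Combine policy hints with the kernel's load-based choice ("A7" or "A15")."""
    # Playback must not drop to the A7, to avoid audio or video glitches.
    if state.media_playing:
        return "A15"
    # Background work with nobody waiting stays on the efficient cluster.
    if state.screen_locked and not state.user_waiting:
        return "A7"
    # In every other case, defer to the kernel governor.
    return kernel_choice


rss_update = DeviceState(screen_locked=True, user_waiting=False, media_playing=False)
video = DeviceState(screen_locked=False, user_waiting=True, media_playing=True)
browsing = DeviceState(screen_locked=False, user_waiting=True, media_playing=False)

print(choose_cluster(rss_update, kernel_choice="A15"))
print(choose_cluster(video, kernel_choice="A7"))
print(choose_cluster(browsing, kernel_choice="A15"))
```

Keeping the overrides in one function makes it straightforward to add further use cases and to compare their effect on switching behavior in simulation.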

The “Active CPU status” window in the VDK’s hardware-software debug tool shows, at any given moment, whether the software tasks are running on the Cortex-A15 or the Cortex-A7 processor (Figure 3).

Figure 3: Multi-cluster, multi-CPU system debug

The VDK’s multicore trace analysis tool gives an in-depth view into the software execution on the Cortex-A15 and Cortex-A7 processors, enabling developers to debug integration defects.

In addition, the VDK works with third-party debuggers, such as the software debuggers from ARM and Lauterbach, to allow developers to view source code and zoom into bugs. The VDK supports the latest ARM Development Studio 5 (DS-5™) Debugger, a software development tool suite which simplifies the development of Linux and Android native applications for ARM processor-based systems.

This use case illustrates how, by combining software models, a debugger and the task migration software stack, the VDK enables software developers to make an early start on analyzing and optimizing tasks, and on fine-tuning their allocation between the big and little processor clusters, to get the best possible performance and energy efficiency from the subsystem.

Replaying Scenarios
One of the challenges that design teams encounter when using real hardware for debug is that it can be difficult to reproduce bugs that arise from complex scenarios. Subtle changes in hardware timing can cause a situation that occurred on one run to be eliminated during a subsequent test.

The VDK gives users the ability to replay scenarios to exactly reproduce different modes of operation or to explore the effects on power and performance of various architectural options.

The replay capability also helps developers explore how different modes of operation interact — for example, when the user is playing a game and receives a phone call. These kinds of user scenarios are difficult to investigate without having a deterministic and repeatable environment, a significant benefit of virtual prototypes.

Because everything is scriptable in the VDK, it’s easy to record and replay sequences of events, which allows developers to debug events to the same point, and create scenarios for regression testing.
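The article does not show the VDK's scripting interface, so the following Python sketch uses a generic stand-in to illustrate the record-and-replay idea. ToySystem, record and replay are hypothetical names, and the event format is an assumption. The point is that a deterministic target driven by a stored script produces the same result on every run and can be stopped at the same point each time.

```python
import json
from typing import Optional


class ToySystem:
    """Stand-in for a deterministic simulated platform (not the VDK)."""

    def __init__(self) -> None:
        self.time = 0
        self.log = []

    def apply(self, event: dict) -> None:
        # Advance simulated time to the event's timestamp and note the event.
        self.time = event["t"]
        self.log.append(f'{event["t"]}:{event["name"]}')


def record(events: list) -> str:
    """Serialize a scenario so it can be stored with a regression test."""
    return json.dumps(events)


def replay(script: str, stop_at: Optional[int] = None) -> ToySystem:
    """Re-run a recorded scenario, optionally stopping at a simulated time."""
    system = ToySystem()
    for event in json.loads(script):
        if stop_at is not None and event["t"] > stop_at:
            break
        system.apply(event)
    return system


# Made-up scenario: the user is playing a game and receives a phone call.
scenario = [
    {"t": 0, "name": "start_game"},
    {"t": 120, "name": "incoming_call"},
    {"t": 150, "name": "answer_call"},
]
script = record(scenario)

first = replay(script)
second = replay(script)
print(first.log == second.log)         # identical on every run
print(replay(script, stop_at=120).log)  # stop at the incoming call
```

A stored script of this kind can double as a regression test: a change in the resulting log between software builds indicates a change in behavior.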

“As companies are adopting our big.LITTLE processing to enable both next generation performance and energy-efficiency for smart, connected devices, ARM and its partners have paved the way by encouraging a strong ecosystem. By offering the VDK Family for ARM Cortex processors, supporting the Cortex-A15 processor and big.LITTLE processing, Synopsys is able to offer this partner ecosystem a highly effective solution for early software development and help facilitate innovation.”
Jim Nicholas, vice president of marketing, processor division, ARM.

Summary
ARM big.LITTLE processing offers design teams unique capabilities to balance performance and energy-efficiency for demanding applications with highly variable computational loads.

The combination of ARM’s processors and Synopsys’ VDKs gives software developers the right control, visibility and speed to bring up and debug software quickly and begin developing software up to 12 months before hardware availability.




About the Author
Robert Kaye is a technical specialist at ARM focusing on system modeling. Robert has been with ARM for five years and before joining the Fast Models team was responsible for a broad portfolio of IP products in the Fabric IP team. Robert has over 30 years of experience in the semiconductor industry. Before joining ARM he worked at Mentor Graphics on the development of hardware-software co-verification solutions and at Texas Instruments in EDA development and ASIC applications engineering.

Tom De Schutter is senior product marketing manager for System-Level Solutions at Synopsys. He joined Synopsys through the acquisition of CoWare where he was the product marketing manager for transaction-level models. Tom has over 10 years of experience in system-level design. Before joining the marketing team, he led the transaction-level modeling team at CoWare.

