Visualizing OpenVX to Optimize Vision Applications

Johan Kraft

Jan 19, 2019 / 9 min read

Table of Contents
OpenVX for Embedded Vision
Tracealyzer for OpenVX
Trace View
CPU Load Graph
Actor Instance Graph
Actor Statistics Report
User Events
DesignWare EV6x Embedded Vision Processors and Development Tools

Today’s powerful vision processors allow for excellent performance, but making sure your solution makes efficient use of the hardware is another matter. The ability to visualize the runtime behavior of your system can help accelerate development, debugging, and validation. Percepio Tracealyzer for OpenVX™ allows you to visualize the execution of OpenVX applications and identify bottlenecks where optimization can make a big difference. Tracealyzer for OpenVX is available for Synopsys’ DesignWare® ARC® EV6x embedded vision processors, and leverages the built-in trace support in the ARC MetaWare EV Development Toolkit.

OpenVX for Embedded Vision

Embedded vision applications are typically written as OpenVX graphs. OpenVX is an open standard for the acceleration of computer vision applications and has many embedded and real-time use cases. This includes applications such as face, body, and gesture tracking, smart video surveillance, advanced driver assistance systems (ADAS), object and scene reconstruction, augmented reality, visual inspection, and robotics.

An OpenVX graph is constructed from one or more kernels. Each kernel performs a vision function and may be one of the standard OpenVX kernels, a custom supplied kernel, or a user-defined kernel (Figure 1).

OpenVX Graph Illustration for Embedded Vision Profiling

Figure 1: An OpenVX Graph

For example, with a vision processor like the ARC EV6x processor, a kernel may run on one or more of the vision CPUs or on the CNN Engine (Figure 2). The processor’s OpenVX-based runtime software manages the execution of the kernels and handles memory allocation and use.

OpenVX Runtime Software Stack Diagram for Embedded Vision

Figure 2: OpenVX-based runtime on EV6x processor

Tracealyzer for OpenVX

It can be a challenge to make sure the processor hardware is being used as efficiently as possible. For example, an OpenVX graph node may require more processing time than expected and overload one core, while the other cores remain mostly idle. Or, perhaps the application is spending a lot of time waiting for DMA transfers to complete. You may also have tried to improve performance by adding more compute resources, but the performance gain is less than expected. To address these types of issues, Percepio developed a version of their visualization solution, Tracealyzer, for OpenVX.

With this solution, you can identify bottlenecks where optimization can make a big difference. Tracealyzer for OpenVX provides a variety of graphical views showing different perspectives of the recorded behavior, ranging from a detailed trace view to high-level overviews and statistics. This article describes the different views available when using the Tracealyzer tool.

Trace View

The trace view displays a timeline of the OpenVX graph execution so that you can study the scheduling, pipelining, and timing in detail. The trace view can be adapted in many ways and supports both horizontal and vertical display.

As an example, we use the demo trace provided with Tracealyzer for OpenVX (“demo_openvx.xml”). The trace was recorded from an OpenVX demo application, which is shown together with a screenshot of the trace view in Figure 3.

OpenVX Embedded Vision Profiling Trace View Diagram

Figure 3: OpenVX Demo Application

You can see how the runtime software schedules the graph using two cores. Core 0 reads the input frames and feeds them to the sobel3x3 node. The result is then processed further on Core 1, using the magnitude and convert_depth filters. The processing is run-to-completion, so each rectangle (fragment) in the trace is a separate job that runs without preemption. The short fragments of the filter functions are precondition checks, while the long fragments show the actual filter processing.

The magnitude node starts before the sobel3x3 node is completed, which is possible because the OpenVX implementation in this example divides each frame into tiles. One node may output multiple tiles that are written to the output buffer one by one, as soon as completed. Thus, the following node (e.g., magnitude) does not need to wait for a full frame, but can start as soon as the first tile is available, assuming the nodes run on different cores. This allows for a pipelined processing that utilizes the cores efficiently.
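The effect of tiling on end-to-end latency can be illustrated with a small timing model. This is a simplified sketch, not the EV runtime's scheduler: it assumes one producer node and one consumer node on separate cores, fixed per-tile processing times, and no scheduling or DMA overhead. The function name and parameters are hypothetical.

```python
def finish_times(num_tiles, produce_time, consume_time, pipelined):
    """Return (producer_done, consumer_done) in arbitrary time units.

    Simplified model: the producer emits tiles back to back, and the
    consumer handles one tile at a time on its own core.
    """
    producer_done = num_tiles * produce_time
    if not pipelined:
        # Frame-level hand-off: the consumer waits for the whole frame.
        return producer_done, producer_done + num_tiles * consume_time

    consumer_free = 0  # time at which the consumer core becomes available
    for tile in range(1, num_tiles + 1):
        tile_ready = tile * produce_time        # producer finished this tile
        start = max(tile_ready, consumer_free)  # need both the tile and a free core
        consumer_free = start + consume_time
    return producer_done, consumer_free


# Four tiles, producer needs 10 units per tile, consumer needs 5.
print(finish_times(4, 10, 5, pipelined=False))
print(finish_times(4, 10, 5, pipelined=True))
```

In this model the pipelined consumer finishes shortly after the producer, because most of its work overlaps with the production of later tiles. When the consumer is the slower node, the total time is instead bounded by the consumer's own processing time plus the delay until the first tile arrives.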

OpenVX Embedded Vision Profiling Visualization Chart

Figure 4: Trace View

The trace view is composed of fields, labeled “CPU 0” and “CPU 1” in Figure 4. Each type of field displays a different type of information. OpenVX nodes are shown in a “scheduling” field, either one field per CPU core (left example) or a single field for all nodes (right example).

CPU Load Graph

To get an overview of how your OpenVX application utilizes your CPU cores, look at the CPU load graph, shown here once for each core together with the trace view in vertical mode (Figure 5). The CPU load graph allows you to see the overall load on your CPU cores, as well as how the load varies over time and the contribution of each node.

The CPU load graph also works as an overview where you can spot anomalies, for instance, the two spikes in the “sobel3x3” node (shown in red) where the utilization is around 80-90%. You can see what causes these spikes by double-clicking in the CPU load graph to show the corresponding section in the trace view.

OpenVX Visualization Profiling in Embedded Vision Design

Figure 5: CPU Load Graph

All views in Tracealyzer are interconnected in similar ways, which makes it easy to drill down from high-level overviews into the detailed trace. The colors make it easier to identify the OpenVX graph nodes. The same color coding is used across all Tracealyzer views. It is also possible to open multiple instances of the CPU load graph or view all cores combined in a single CPU load graph. Note that the CPU loads are accumulated in this mode, so with two cores the scale goes up to 200%.

The CPU load graph works by dividing the displayed time window into a number of fixed-size time intervals, by default 50, and then calculating the amount of processing time used by each node within each time interval. The result is displayed as a stacked histogram, where the Y-axis shows the relative utilization within each time interval. Since the concept of CPU load is always relative to a certain time window (independent of what tool you use), zooming in or out may change the levels of the CPU load graph as the reference time window changes. When zoomed in a lot, most time intervals will only contain a single node, so the graph will look similar to the trace view.
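The binning described above can be sketched in a few lines of Python. This is an illustration of the general technique, not Tracealyzer's implementation: the function name, the fragment tuple layout, and the sample timestamps are all assumptions made for the example.

```python
from collections import defaultdict


def cpu_load(fragments, window_start, window_end, num_intervals=50):
    """Compute per-node load for each fixed-size interval of a time window.

    fragments: iterable of (node_name, start, end) execution fragments
               recorded on one core (hypothetical layout).
    Returns {node_name: [load per interval, each in the range 0..1]}.
    """
    width = (window_end - window_start) / num_intervals
    load = defaultdict(lambda: [0.0] * num_intervals)
    for node, start, end in fragments:
        # Clip the fragment to the displayed window.
        start = max(start, window_start)
        end = min(end, window_end)
        if end <= start:
            continue
        first = int((start - window_start) // width)
        last = min(int((end - window_start) // width), num_intervals - 1)
        for i in range(first, last + 1):
            lo = window_start + i * width
            hi = lo + width
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                load[node][i] += overlap / width
    return dict(load)


# Made-up fragments: one long sobel3x3 job followed by a magnitude job.
frags = [("sobel3x3", 0, 30), ("magnitude", 30, 40)]
print(cpu_load(frags, 0, 100, num_intervals=10))  # zoomed out
print(cpu_load(frags, 0, 40, num_intervals=2))    # zoomed in, wider bins
```

Stacking the per-node lists for each interval gives the histogram. Running the same fragments over a different window or interval count yields different load levels, which is the zoom dependence noted above.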

Actor Instance Graph

In Figure 6, we added two instances of the Actor Instance Graph, where the Y-axis shows the execution times of the graph nodes. This way, you can see where nodes execute longer than normal and inspect the trace view to see the details. Note that “Actor” is a Tracealyzer term meaning “execution context”, corresponding to nodes in OpenVX.

OpenVX Visualization Profiling for Embedded Vision Example

Figure 6: Actor Instance Graph

In addition to execution time, the Actor Instance Graph can show various timing properties, including separation and periodicity. You can change the property that is displayed in the “Execution Time” dropdown menu.
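As a rough illustration of these timing properties, the sketch below derives them from the start and end times of consecutive instances of one node. It assumes that periodicity is the time between the starts of consecutive instances and that separation is the time from the end of one instance to the start of the next. These definitions, the function name, and the sample data are assumptions for the example and should be checked against the Tracealyzer documentation.

```python
def timing_properties(instances):
    """Derive per-instance timing properties for one node.

    instances: list of (start, end) tuples sorted by start time.
    Assumes run-to-completion jobs, so execution time is end - start.
    """
    rows = []
    prev_start = prev_end = None
    for start, end in instances:
        rows.append({
            "execution_time": end - start,
            # Start-to-start distance (assumed definition of periodicity).
            "periodicity": None if prev_start is None else start - prev_start,
            # End-to-start gap (assumed definition of separation).
            "separation": None if prev_end is None else start - prev_end,
        })
        prev_start, prev_end = start, end
    return rows


# Made-up instances; the third one runs noticeably longer than the others.
for row in timing_properties([(0, 4), (10, 13), (20, 29)]):
    print(row)
```

Plotting one of these values per instance over time gives the kind of view that makes an unusually long instance stand out.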

Actor Statistics Report

The Actor Statistics Report gives a statistical summary of the trace, including the highest, lowest, and average values observed for timing properties such as execution time. All extreme values in this report are links and can be clicked on to find the corresponding location in the trace view (Figure 7).

Figure 7: Actor Statistics Report

With the Actor Statistics Report, you can find the extreme values and see what was going on in the system at that time. Although not all details are recorded, you can still get valuable clues about what caused these values. For instance, using the Actor Instance Graph you can find other cases with similarly high execution time and check for correlations in the trace. Perhaps the high execution time only occurs under particular circumstances, e.g., due to intense activity on other CPU cores saturating the bus.

Note that you can export and save the statistics reports, either as formatted HTML files (as in Figure 7) or as tabbed text files (Figure 8). The latter allows for easy data import into other tools and is done by checking the option “Data Export” in the Actor Statistics Report dialog. This allows you to run measurements on alternative designs and compare the resulting performance metrics systematically. For instance, the statistics report shows that IDLE0 is running 40.7% of the time, meaning that Core 0 is only 59.3% loaded, while IDLE1 only runs for 17% of the time, meaning that Core 1 is 83% loaded.

OpenVX Performance Metrics Table in Embedded Vision Article

Figure 8: Actor Statistics Report in tabbed text file format
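An exported text report can be post-processed with a short script, for example to derive the load of a core from the share of time spent in its idle actor. The sketch below assumes a tab-separated file with a header row. The column names “Actor” and “CPU Usage (%)” and the sample rows are hypothetical stand-ins, not the actual export layout, so adjust them to match the real file.

```python
import csv
import io


def core_load_from_export(text, idle_actor):
    """Return the core load in percent, computed as 100 minus the idle share.

    text: contents of a tab-separated export (hypothetical column names).
    idle_actor: name of the idle actor for the core of interest.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        if row["Actor"] == idle_actor:
            return round(100.0 - float(row["CPU Usage (%)"]), 1)
    raise KeyError(f"actor {idle_actor!r} not found in export")


# Hypothetical export using the idle shares quoted in the text.
sample = "Actor\tCPU Usage (%)\nIDLE0\t40.7\nIDLE1\t17.0\n"
for idle in ("IDLE0", "IDLE1"):
    print(idle, core_load_from_export(sample, idle))
```

Running the same script on exports from two alternative designs gives directly comparable load figures for each core.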

User Events

Tracealyzer allows you to add your own user events, i.e., custom events logged from the application code. User events allow you to visualize just about anything in your application, such as diagnostic messages, variable values, and states.

Figure 9 shows an example where user events have been logged on two user event channels, “MyVariable” showing values of an integer variable and “MyState” showing state names. Tracealyzer can display such user events in several ways, e.g., as event labels in the trace view (1) and as entries in the Event Log (2). The User Event Signal Plot (3) allows for plotting numerical data from user events.

Moreover, if you have important state variables in your system, you can log the state changes as user events and define a State Machine in Tracealyzer to see the states in the trace view timeline (4). You can also see a summary of the state changes as a state machine graph (5). You can even get statistics on the time spent in each state, or the time between any two events by defining a “custom interval”.

OpenVX Visualization Profiling Interface Screenshot

Figure 9: User Events

DesignWare EV6x Embedded Vision Processors and Development Tools

The example use case described is based on an OpenVX application developed using Synopsys’ DesignWare ARC MetaWare EV Development Toolkit for ARC EV6x processors.

The EV6x Embedded Vision Processors integrate one, two, or four high-performance vision CPUs, each consisting of a 32-bit scalar core with a 512-bit vector DSP. They can include an optimized convolutional neural network (CNN) engine for fast and accurate object detection, classification, and scene segmentation. The processors are fully programmable and configurable and combine the flexibility of software solutions with the high performance and low power consumption of dedicated hardware.

The ARC MetaWare EV Development Toolkit provides a complete set of tools, runtime software and libraries that enable the development of embedded vision applications and machine learning applications with the EV6x Processor family. The toolkit consists of the MetaWare Compiler and Debugger, ARC nSIM Instruction Set Simulator (ISS), EV Runtime and libraries, CNN Software Development Kit (CNN SDK), and the EV Virtualizer Development Kit (EV VDK).

Efficient development of advanced embedded vision and AI applications requires the ability to rapidly debug, validate, and optimize software. Percepio’s Tracealyzer for OpenVX visualization tool enables designers using Synopsys’ ARC EV6x processors to observe the runtime behavior of their software and optimize their applications for maximum performance while accelerating development cycles for real-time vision applications such as ADAS and self-driving vehicles.

Watch the video to learn more about Percepio Tracealyzer for OpenVX.

Learn more about Percepio Tracealyzer.