Hardware Overlay Management for Data Intensive, Ultra-Low Power Edge Devices

Rich Collins

Apr 19, 2021 / 5 min read

Table of Contents
Introduction
ARC EM Hardware Overlay Manager (OLM) Option
Summary

Introduction

Memory management techniques have been critical to processor architectures for many years. The processor’s physical address space defines the range of addresses to memory (RAM) that physically exists within the system. Memory management dynamically allocates portions of the physical memory to a process and frees it for reuse by other processes when not needed.

Virtual addressing separates memory addresses required by these processes from the physical memory addresses, allowing the virtual address space to be larger than the physical memory space. A memory management unit (MMU) effectively “pages” or “swaps” the memory space required by a specific process to secondary storage by mapping virtual page numbers to physical page numbers in main memory (Figure 1).

Figure 1: Virtual to physical address translation
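The page-number mapping described above can be sketched in a few lines. This is only an illustrative model, not the ARC EM implementation: the 4 KB page size, the `translate` function, and the example `page_table` contents are all assumptions made for the sketch.

```python
PAGE_SIZE = 4096                          # assumed 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 offset bits for 4 KB


def translate(vaddr, page_table):
    """Map a virtual address to a physical address.

    page_table is a hypothetical dict: virtual page number -> physical page number.
    """
    vpn = vaddr >> OFFSET_BITS            # upper bits select the virtual page
    offset = vaddr & (PAGE_SIZE - 1)      # lower bits pass through unchanged
    if vpn not in page_table:
        raise LookupError(f"page fault: virtual page {vpn:#x} is not mapped")
    return (page_table[vpn] << OFFSET_BITS) | offset


# Hypothetical mapping: virtual pages 0x10 and 0x11 live in physical pages 0x2 and 0x7.
page_table = {0x10: 0x2, 0x11: 0x7}
print(hex(translate(0x10ABC, page_table)))  # prints 0x2abc
```

Only the page number changes during translation; the offset within the page is carried over as-is, which is why the page size must be a power of two.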

Most low-power embedded (and deeply embedded) applications do not need to leverage a rich operating system such as Linux. These applications typically run on “bare-metal” (no operating system) or under a real-time operating system (RTOS). These options do not require the virtual to physical translation provided by an architected MMU. Synopsys’ DesignWare® ARC® EM Processor IP is typically used in deeply embedded applications running an RTOS.

However, there are cases where virtual-to-physical address translation can help increase performance, such as when a large code base resides in slow secondary memory. Processes can then be paged into faster, smaller on-chip memory called page RAM (PRAM). Consider a system that runs all code as a single process (one Process ID, or PID) and uses a large virtual address space that corresponds one-to-one with a large selected area of secondary memory (such as flash memory or DRAM). In such a system, address translation can detect when a section of code (one or more pages) is resident in the PRAM and provide the physical address of the page in the PRAM.
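A small behavioral model may make the residency check more concrete. It is a sketch under stated assumptions: the `OverlayModel` class, the 4 KB page size, and the first-free-frame placement are hypothetical, and victim selection for a full PRAM is deliberately left out.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


class OverlayModel:
    """Hypothetical model: a virtual address equals its secondary-memory address."""

    def __init__(self, secondary, pram_frames):
        self.secondary = secondary                 # slow backing store (flash/DRAM)
        self.pram = bytearray(pram_frames * PAGE_SIZE)
        self.resident = {}                         # virtual page number -> PRAM frame
        self.free_frames = list(range(pram_frames))
        self.page_ins = 0                          # number of pages copied into PRAM

    def read(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        frame = self.resident.get(vpn)             # residency check
        if frame is None:
            frame = self._page_in(vpn)
        return self.pram[frame * PAGE_SIZE + offset]

    def _page_in(self, vpn):
        if not self.free_frames:
            raise MemoryError("PRAM full: a victim page must be evicted first")
        frame = self.free_frames.pop(0)
        src = vpn * PAGE_SIZE                      # one-to-one virtual/secondary mapping
        page = self.secondary[src:src + PAGE_SIZE]
        start = frame * PAGE_SIZE
        self.pram[start:start + len(page)] = page  # copy the page into fast memory
        self.resident[vpn] = frame
        self.page_ins += 1
        return frame


# Four pages of backing store, two PRAM frames.
model = OverlayModel(bytes(range(256)) * 64, pram_frames=2)
first = model.read(0x1005)   # miss: page 1 is copied into PRAM
second = model.read(0x1006)  # hit: same page, no further copy
```

After the first access to a page, later accesses to the same page are served from the PRAM without touching secondary memory, which is where the performance gain comes from.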

Synopsys has recently added support for this concept, referred to as “hardware overlay management,” as an add-on option for the ARC EM processor.

ARC EM Hardware Overlay Manager (OLM) Option

The ARC EM processor supports virtual memory addressing when the Overlay Manager (OLM) is present (Figure 2). If the OLM option is not present or if it is present but is disabled, all virtual addresses are mapped directly to physical addresses.

As shown in Figure 1, the Overlay Manager features a Translation Lookaside Buffer (TLB) for address translation and protection of 4KB, 8KB or 16KB memory pages, and fixed mappings of untranslated memory. The upper half of the untranslated memory section is uncached (for IO use) and the lower half of the untranslated memory section is cached (for the operating system kernel).

With the OLM option enabled, the ARC EM core defines a common address space for both instruction and data accesses. The memory translation and protection systems can be arranged to provide separate, non-overlapping protected regions of memory for instruction and data access within a common address space.

Diagram of ARC EM Processor Architecture for Edge Devices

Figure 2: OLM components within the ARC EM pipeline

The TLB architecture of the OLM option can be thought of as a two-level cache for page descriptors: “micro-TLBs” for instruction and data (μI-TLB and μD-TLB) as level one, and the “Joint” TLB (J-TLB) as level two.

The μI-TLB and μD-TLB are physically located alongside the instruction cache and data cache respectively, where they perform single-cycle virtual-to-physical address translation and permission checking. The μI-TLB and μD-TLB are hardware managed. On a μI-TLB (or μD-TLB) page miss, the hardware fetches the missing page mapping from the J-TLB.

The J-TLB consists of a RAM-based 256- or 512-entry buffer and is software managed. On a J-TLB page miss, special kernel-mode TLB miss handlers fetch the missing page descriptor from memory and store it in the J-TLB through an auxiliary register interface.
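The lookup order described above (micro-TLB first, then J-TLB, then the software miss handler) can be modeled as follows. This is a rough sketch, not the hardware behavior: the `TwoLevelTlb` class, the micro-TLB size of four entries, and the oldest-entry replacement policy are assumptions; only the 256-entry J-TLB size comes from the text.

```python
from collections import OrderedDict


class TwoLevelTlb:
    """Hypothetical model of a micro-TLB backed by a software-managed J-TLB."""

    def __init__(self, page_table, micro_entries=4, joint_entries=256):
        self.page_table = page_table       # main page table: vpn -> ppn
        self.micro = OrderedDict()         # level one, refilled by "hardware"
        self.joint = OrderedDict()         # level two, refilled by the miss handler
        self.micro_entries = micro_entries
        self.joint_entries = joint_entries
        self.stats = {"micro_hit": 0, "joint_hit": 0, "miss_handler": 0}

    @staticmethod
    def _install(tlb, capacity, vpn, ppn):
        if len(tlb) >= capacity:
            tlb.popitem(last=False)        # drop the oldest entry (assumed policy)
        tlb[vpn] = ppn

    def _miss_handler(self, vpn):
        # Stands in for the kernel-mode handler that walks the main page table.
        if vpn not in self.page_table:
            raise LookupError(f"virtual page {vpn:#x} is not mapped")
        self._install(self.joint, self.joint_entries, vpn, self.page_table[vpn])

    def lookup(self, vpn):
        if vpn in self.micro:              # level-one hit
            self.stats["micro_hit"] += 1
            return self.micro[vpn]
        if vpn in self.joint:              # level-two hit
            self.stats["joint_hit"] += 1
        else:                              # miss in both levels
            self.stats["miss_handler"] += 1
            self._miss_handler(vpn)
        ppn = self.joint[vpn]
        self._install(self.micro, self.micro_entries, vpn, ppn)
        return ppn


tlb = TwoLevelTlb({1: 10, 2: 20}, micro_entries=1)
for vpn in (1, 1, 2, 1):
    tlb.lookup(vpn)
print(tlb.stats)
```

With a one-entry micro-TLB, the final lookup of page 1 misses at level one but hits in the J-TLB, so the software handler runs only once per page.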

The main page table contains the complete details of each page mapped for use by kernel or user tasks (Figure 3). The μTLBs, J-TLB, and miss handlers combine to provide cached access into the OS page table. It is up to the OS (or micro-kernel) to keep the page table entries loaded into the OLM in sync (coherent) with the main page table in memory.

Bringing a new page from the secondary storage may involve evicting an existing page from the PRAM (in case the PRAM is full). The eviction is performed using the Least Recently Used (LRU) algorithm.

Figure 3: OLM page table structure

To facilitate efficient operation, a customer-defined external module is required to track page usage and indicate to the software which victim page to replace when the physical memory is fully allocated and a new page must be loaded. The OLM provides an LRU interface that gives this external module access to the internal signals needed to track used pages.
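The role of the external tracking module can be illustrated with a simple software model. The `LruTracker` class and its method names are hypothetical; the actual module is customer defined and driven by hardware signals rather than function calls.

```python
from collections import OrderedDict


class LruTracker:
    """Hypothetical page-usage tracker that reports the least recently used frame."""

    def __init__(self):
        self._order = OrderedDict()        # oldest access first, newest last

    def touch(self, frame):
        """Record an access to a PRAM frame."""
        self._order[frame] = True
        self._order.move_to_end(frame)     # mark as most recently used

    def victim(self):
        """Return the least recently used frame without removing it."""
        if not self._order:
            raise LookupError("no pages are being tracked")
        return next(iter(self._order))

    def forget(self, frame):
        """Stop tracking a frame, for example after it has been evicted."""
        self._order.pop(frame, None)


tracker = LruTracker()
for frame in (0, 1, 2, 0):
    tracker.touch(frame)
print(tracker.victim())  # prints 1
```

Frame 0 was accessed again after frames 1 and 2, so frame 1 holds the page that has gone unused the longest and is the one the software would replace.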

Summary

Typically, low-power embedded (and deeply embedded) applications rely on an RTOS and do not require the address translation that an MMU provides for rich operating systems such as Linux. However, the rapid growth of applications running on edge devices has pushed desktop-class system requirements onto ultra-low-power embedded processors.

Because power and area are always at a premium, an architected MMU-based system is often too costly, and many software-based solutions (such as automated overlay management) have been implemented to address programmability within the limited memory resources of these embedded devices.

To complement these software solutions, Synopsys has provided a lightweight hardware-based overlay management solution for the ARC EM Processor IP, enabling address translation and access permission validation with minimal power and area overhead. This option makes it easier to run larger and more data-intensive applications on an ARC EM core, such as those increasingly prevalent in the AIoT “always-on” and wireless baseband application spaces.