Addressing the Changing DSP and Controller Needs of 5G New Radio Modems

Graham Wilson, Sr. Product Marketing Manager, Synopsys

The first 5G standard was defined in 3GPP Release 15. It greatly broadens the scope of cellular wireless communications to cover not only high data rate mobile devices, but also billions of low data rate connected IoT edge devices and latency-critical edge nodes such as autonomous vehicles, drones, and industrial automation. Cellular modems are very complex to architect and develop, and must pass full certification before release. Modem developers therefore want to reuse existing silicon across the different configurations and scaling factors of 5G modem usage. Figure 1 shows the range of configurations and scaling factors available with the 5G specification.

Figure 1: 5G modem configurations and scaling factors

To achieve very high data rates with lower latency, 5G modem architects look to optimize computation across multiple task-specific programmable cores. These cores differ in ISA and architecture, ranging from Very Long Instruction Word (VLIW) / wide Single Instruction Multiple Data (SIMD) DSP cores to algorithm-optimized processors. The processors are connected in a heterogeneous multicore subsystem, and each can be enabled for its pre-assigned algorithm computation. Using dedicated cores for pre-defined algorithms reduces the delay before execution begins, which in turn reduces latency. Another technique for achieving higher performance and lower latency is offloading work to hardware accelerator blocks. Coupling these hardware blocks to the cores or to a controller hub system gives the lowest latency and optimal data movement to and from the blocks.

With the drive towards many heterogeneous cores and hardware blocks all running in parallel, there is a significant increase in the requirements and complexity of the 5G modem control system, including:

  • Efficient control processing with
    • Fast context switching
    • Fast interrupt response and support for a large number of interrupt sources
  • DSP capability to
    • Efficiently run short vector looping DSP code that should not load the main VLIW/wide SIMD DSP core
    • Avoid the overhead of switching to a different core to execute DSP code
  • Flexibility in memory architecture to
    • Support fast local memories
    • Provide multi-layer caching for large code bases
  • Predictability
    • Critical functions for event handling have hard deadlines (TTI of 1ms and below)
    • Closely Coupled Memories (CCMs) for predictable memory access times
    • Cache control (e.g., prefetching, locking) for predictable worst-case cache behavior
  • Low power consumption with effective sleep modes
  • Low-latency operation
    • With efficient interfacing to hardware blocks
    • For programming hardware accelerators and to efficiently handle data transfers (e.g., via DMA)
  • Multicore support
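
To make the hard-deadline requirement concrete, the following minimal sketch estimates how many core clock cycles fit into one scheduling interval. It assumes the interval equals one 5G NR slot, whose duration is 1 ms divided by 2^mu for numerology mu. The function names, the 1000 MHz clock used in the usage note, and the 20% safety margin are illustrative assumptions, not figures from any modem design.

```python
# Sketch only: cycle budget per scheduling interval for a controller core.
# Assumption: the interval is one NR slot of 1 ms / 2**mu.
# All names and the default margin are hypothetical, for illustration.

def slot_duration_us(mu: int) -> float:
    """Slot duration in microseconds for NR numerology mu (0..4)."""
    if not 0 <= mu <= 4:
        raise ValueError("numerology mu must be in 0..4")
    return 1000.0 / (2 ** mu)


def cycle_budget(mu: int, clock_mhz: float) -> int:
    """Core clock cycles available within one slot (us * MHz = cycles)."""
    return int(slot_duration_us(mu) * clock_mhz)


def meets_deadline(handler_cycles: int, mu: int, clock_mhz: float,
                   margin: float = 0.2) -> bool:
    """True if a handler fits in the slot with `margin` of it held in reserve."""
    usable = cycle_budget(mu, clock_mhz) * (1.0 - margin)
    return handler_cycles <= usable
```

For example, `cycle_budget(0, 1000.0)` gives the budget for a 1 ms interval on a core assumed to run at 1000 MHz, and `meets_deadline` can be used to check a worst-case handler cycle count against that budget. Shorter slots shrink the budget proportionally, which is why predictable memory access and interrupt response matter for the control system.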

On top of synchronizing the modem system and guaranteeing operation across multiple cores and hardware blocks, the system controller must handle the non-sequential code execution involved in the initial synchronization and handshake used by user equipment devices. In addition, the Layer 1 interface schedule needs to run to manage the LTE timelines.

This type of processing requires a different type of processor, one that combines both DSP and controller capabilities. Traditional combined solutions can run control and DSP code together efficiently; however, they do not meet the requirements for tightly coupled hardware block connection schemes, efficient context switching, and multicore support. Synopsys’ DSP-enabled DesignWare® ARC® HS47D processor is a core that meets all the requirements for 5G modem control systems. As a result, it has been used as the system controller by many Tier 1 5G modem developers worldwide.

 

Leader in DSP and Control Performance

The ARC HS47D core is based on an advanced 10-stage pipeline with dual-issue superscalar instruction execution. This pipeline and superscalar architecture implement late arithmetic logic unit (ALU) execution. Depending upon the context of the core resources and conditional instructions, the late ALU enables more cycles for conditions to resolve before allowing instruction commit. There is also early resolution of mispredicted branches, which greatly reduces pipeline stalls and improves control operation performance.

The DSP extensions include more than 150 DSP instructions for fixed-point and complex data types, as well as floating-point (single-precision and double-precision) computation native to the core. The cores can perform sustained dual MAC (16-bit x 16-bit) with quad MAC (16-bit x 16-bit) operations for key digital filter functions. The parallel instruction execution of the superscalar architecture, combined with the core's advanced load/store unit, enables sustained high-performance DSP computation comparable to that of DSP-only cores.
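
As an illustration of the kind of digital filter arithmetic described above, the following sketch models a fixed-point FIR filter built from 16-bit x 16-bit multiply-accumulate (MAC) steps, grouped four at a time to mirror a quad-MAC issue. It is a plain arithmetic model under stated assumptions (Q15 data, a wide accumulator, round-and-saturate on output); it does not use ARC instructions or intrinsics, and the names `to_q15` and `fir_q15` are hypothetical.

```python
# Sketch only: arithmetic model of a Q15 FIR filter using 16x16 MACs.
# Assumptions: Q15 inputs and coefficients, wide accumulator,
# rounding and saturation when converting the Q30 sum back to Q15.

Q15_ONE = 1 << 15


def saturate_q15(v: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(-Q15_ONE, min(Q15_ONE - 1, v))


def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to a saturated Q15 integer."""
    return saturate_q15(int(round(x * Q15_ONE)))


def fir_q15(samples, coeffs):
    """Return FIR outputs for every position with a full tap window."""
    n_taps = len(coeffs)
    out = []
    for n in range(n_taps - 1, len(samples)):
        acc = 0  # wide accumulator holding Q30 products
        k = 0
        # Four 16x16 MACs per step, modelling one quad-MAC operation.
        while k + 4 <= n_taps:
            acc += (coeffs[k] * samples[n - k]
                    + coeffs[k + 1] * samples[n - k - 1]
                    + coeffs[k + 2] * samples[n - k - 2]
                    + coeffs[k + 3] * samples[n - k - 3])
            k += 4
        # Remaining taps, one MAC at a time.
        while k < n_taps:
            acc += coeffs[k] * samples[n - k]
            k += 1
        # Q30 -> Q15 with rounding, then saturate to 16 bits.
        out.append(saturate_q15((acc + (1 << 14)) >> 15))
    return out
```

A four-tap averaging filter with coefficients `[to_q15(0.25)] * 4` applied to a constant input of `to_q15(0.5)` returns that same constant for each output, which is a quick sanity check on the scaling.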

Because of these architectural and ISA features, the HS47D processor can achieve a typical clock frequency of 2.5 GHz (16nFF), leaving ample performance headroom for future growth in computation requirements. The core also delivers an industry-leading 5.2 CoreMarks/MHz benchmark number as well as 3.0 Dhrystone MIPS/MHz.

Low Latency

The HS47D core offers one of the highest numbers of connection scheme options within an SoC. In addition to the modular bus interface (e.g., AMBA), the HS4xD core has a separate zero-latency peripheral bus, which is accessible only by the processor and has a dedicated region of the memory map. SoC developers can connect their own AMBA peripherals to this peripheral bus, isolating performance-critical peripherals from the latency and delays of the main AMBA bus.

The HS47D cores support ARC Processor EXtension (APEX) technology, which allows instruction set extensions, register banks, custom registers, and custom interfaces. Developers can add custom instructions and registers to accelerate key algorithms, delivering very high performance where it is needed. Custom instructions can greatly reduce the number of cycles needed for key algorithm computation, and therefore the latency.

Figure 2: ARC HS processor configuration options

Using APEX register extensions, hardware blocks can be connected to APEX registers, which can be directly accessed. These registers can be of any width or definition in order to fit with the hardware block operation. This allows user hardware blocks to be directly connected to the core and controlled with core instructions. A cluster DMA engine is available with the HS47D core, allowing offloading of data movement between peripherals and memory. The cluster DMA is under the control of the HS47D core.

Another critical part of implementing a low-latency, high-performance system control is to use an efficient multicore connection scheme. Typically, two, four or more ARC HS47D processors have been used in 5G modem control systems. Using the ARConnect technology with multicore coherency and data movement blocks, system developers can quickly build a multicore system. Figure 3 shows a simplified diagram of the connection blocks using an ARC multicore HS47D processor.

Figure 3: Multicore system using ARConnect

Multi-Standard Functionality

With a dedicated system for implementing the high-performance, low-latency 5G modem, there may be less flexibility to implement the required backward-compatible cellular standards, especially the 2G and 3G standards. Their much lower processing requirements sometimes do not fit well onto the available task-optimized heterogeneous cores; moreover, the vector stride length of the VLIW/SIMD DSP cores is too wide for these algorithms to map onto efficiently.

As many of the Tier 1 developers using the HS47D in their 5G modems have found, the HS47D multicore system offers the right level of DSP and control performance to run 2G and 3G, as well as satellite communication standards (GNSS), allowing the main computation engines of the 5G system to remain in an idle low-power state.

Also, partitioning the 5G computation algorithms across the heterogeneous system allows part of the 5G computation to run on the HS47D multicore system. For example, Hybrid Automatic Repeat request (HARQ) control can run there, requesting retransmission when a transmission error cannot be corrected. This runs in synchronization with the HARQ hardware unit and forward error correction (FEC). In addition, 5G modem developers often run Discontinuous Reception (DRX) control, a power-saving technique that periodically alternates between sleep and wake-up modes, on the HS47D.
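
The HARQ control flow described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions: each call represents one decode result (CRC pass or fail) reported by the FEC/HARQ hardware, soft combining is left to that hardware, and the retransmission limit of 3 is an arbitrary example value. The class and function names are hypothetical and do not correspond to any ARC or 3GPP API.

```python
# Sketch only: control-side view of a single HARQ process.
# Assumptions: the hardware reports a CRC result per (re)transmission;
# max_retx is an illustrative limit, not a value from any specification.
from dataclasses import dataclass
from enum import Enum


class HarqState(Enum):
    IDLE = "idle"
    WAITING_RETX = "waiting_retx"  # NACK sent, retransmission expected
    DONE = "done"                  # block decoded correctly
    FAILED = "failed"              # retransmission limit exhausted


@dataclass
class HarqProcess:
    max_retx: int = 3
    retx_count: int = 0
    state: HarqState = HarqState.IDLE

    def on_decode_result(self, crc_ok: bool) -> str:
        """Update the process for one decode result and return the feedback."""
        if self.state in (HarqState.DONE, HarqState.FAILED):
            raise RuntimeError("HARQ process already finished")
        if crc_ok:
            self.state = HarqState.DONE
            return "ACK"
        if self.retx_count < self.max_retx:
            # Error could not be corrected: request another transmission.
            self.retx_count += 1
            self.state = HarqState.WAITING_RETX
        else:
            # Limit reached: give up and leave recovery to higher layers.
            self.state = HarqState.FAILED
        return "NACK"


def run_harq(decode_results, max_retx=3):
    """Feed a sequence of CRC results to one process until it finishes."""
    proc = HarqProcess(max_retx=max_retx)
    for crc_ok in decode_results:
        proc.on_decode_result(crc_ok)
        if proc.state in (HarqState.DONE, HarqState.FAILED):
            break
    return proc
```

Calling `run_harq([False, False, True])` models a block that decodes on its second retransmission. In a real modem this logic would be driven by interrupts from the hardware unit, which is where fast interrupt response and context switching come into play.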

Summary

5G modem systems require very high data rates and low latency. Meeting these needs has pushed modem system requirements toward more heterogeneous and higher-performing control systems. The HS47D has proved ideal for system control because it offers very high-performance control operation with DSP functionality, delivering an industry-leading 5.2 CoreMarks/MHz benchmark performance.

Along with this performance, the HS47D offers great flexibility in how hardware accelerator blocks are connected, either directly to the core or via dedicated bus interfaces, which lowers latency. The HS47D processor can be configured as a multicore system with supporting multicore coherency blocks, allowing system developers to scale the level of computation to meet the needs of the most demanding 5G system control applications.

 

Web: DesignWare ARC HS4x web page

Datasheet: DesignWare ARC HS4x datasheet