Graham Wilson, Sr. Product Marketing Manager, Synopsys
To achieve very high data rates with lower latency, 5G modem architects look to optimize computation across multiple task-specific programmable cores. These cores differ in ISA and architecture, ranging from Very Long Instruction Word (VLIW) / wide Single Instruction Multiple Data (SIMD) DSP cores to algorithm-optimized processors. They are connected in a heterogeneous multicore subsystem, and each can be assigned in advance to a specific algorithm. Dedicating cores to pre-defined algorithms reduces the delay at the start of each operation, which in turn reduces latency. Another way to achieve higher performance and lower latency is to offload work to hardware accelerator blocks. Coupling these hardware blocks to the cores or to a controller hub system gives the lowest latency and the most efficient data movement to and from the blocks.
With the drive towards many heterogeneous cores and hardware blocks all running in parallel, the requirements and complexity of the 5G modem control system increase significantly.
On top of synchronizing the modem systems and guaranteeing operation between the multiple cores and hardware blocks, the system controller should be able to handle the non-sequential code execution that is part of the initial synchronization and handshake used by user equipment devices. In addition, the Layer 1 interface schedule needs to run to manage the LTE timelines.
This type of processing requires a different type of processor, one that combines both DSP and controller capabilities. Traditional combined solutions can run control and DSP code together efficiently; however, they do not meet the requirements for tightly coupled hardware block connection schemes, efficient context switching, and multicore support. Synopsys’ DSP-enabled DesignWare® ARC® HS47D processor is a core that meets all the requirements for 5G modem control systems. As a result, it has been used as the system controller by many Tier 1 5G modem developers worldwide.
The ARC HS47D core is based on an advanced 10-stage pipeline with dual-issue superscalar instruction execution. This pipeline and superscalar architecture implement late arithmetic logic unit (ALU) execution. Depending upon the context of the core resources and conditional instructions, the late ALU enables more cycles for conditions to resolve before allowing instruction commit. There is also early resolution of mispredicted branches, which greatly reduces pipeline stalls and improves control operation performance.
The DSP extensions include more than 150 DSP instructions for fixed-point, complex data type, and floating-point (single-precision and double-precision) computation, all native to the core. The cores can perform sustained dual MAC (16-bit x 16-bit) with quad MAC (16-bit x 16-bit) operations for key digital filter functions. The parallel instruction execution of the superscalar architecture, combined with the advanced load/store unit of the core, enables sustained DSP performance comparable to that of DSP-only cores.
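To make the role of the multiply-accumulate (MAC) operation in a digital filter concrete, the following is a minimal sketch of a fixed-point FIR filter in plain Python. It is a behavioural model only and does not use ARC instructions or intrinsics; the Q15 number format, the function names `sat16` and `fir_q15`, and the rounding choice are assumptions made for illustration.

```python
def sat16(x):
    """Saturate an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def fir_q15(samples, taps):
    """Filter Q15 samples with Q15 taps.

    Each inner-loop step is one 16-bit x 16-bit multiply-accumulate,
    which is the operation a hardware MAC unit executes.
    """
    out = []
    for n in range(len(samples)):
        acc = 0  # wide accumulator, so intermediate sums do not overflow
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += samples[n - k] * tap  # 16x16 product added to acc
        # Round to nearest, rescale from Q30 back to Q15, then saturate.
        out.append(sat16((acc + (1 << 14)) >> 15))
    return out


# Two-tap moving average: a coefficient of 0.5 is 16384 in Q15.
print(fir_q15([1000, 2000, 3000, 4000], [16384, 16384]))
# prints [500, 1500, 2500, 3500]
```

In this model, a core that issues two or four MACs per cycle would process two or four iterations of the inner loop at once, which is why MAC throughput largely determines filter performance.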
Because of these architectural and ISA features, the HS47D processor can achieve a typical clock frequency of 2.5 GHz (16nFF), leaving ample performance headroom for future growth in computation requirements. The core also delivers an industry-leading 5.2 CoreMarks/MHz benchmark number as well as 3.0 Dhrystone MIPS/MHz.
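As a rough worked example, the per-MHz figures can be combined with the typical clock frequency to estimate single-core totals. The sketch below assumes that the scores scale linearly with clock frequency, which ignores memory-system effects, so the results are ballpark estimates and not measured numbers. The function name `aggregate_score` is hypothetical.

```python
def aggregate_score(per_mhz, clock_ghz):
    """Estimate a total benchmark score from a per-MHz figure.

    Assumes linear scaling with clock frequency (an idealization).
    """
    return per_mhz * clock_ghz * 1000.0  # GHz -> MHz


coremark = aggregate_score(5.2, 2.5)  # CoreMark per core
dmips = aggregate_score(3.0, 2.5)     # Dhrystone MIPS per core
print(f"{coremark:.0f} CoreMark, {dmips:.0f} DMIPS")
# prints 13000 CoreMark, 7500 DMIPS
```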
Using APEX register extensions, hardware blocks can be connected to APEX registers, which the core can access directly. These registers can be of any width or definition to fit the operation of the hardware block. This allows user hardware blocks to be connected directly to the core and controlled with core instructions. A cluster DMA engine is also available with the HS47D core, allowing data movement between peripherals and memory to be offloaded. The cluster DMA is under the control of the HS47D core.
Another critical part of implementing a low-latency, high-performance system control is to use an efficient multicore connection scheme. Typically, two, four or more ARC HS47D processors have been used in 5G modem control systems. Using the ARConnect technology with multicore coherency and data movement blocks, system developers can quickly build a multicore system. Figure 3 shows a simplified diagram of the connection blocks using an ARC multicore HS47D processor.
With a dedicated system for implementing the high-performance, low-latency 5G modem, there may be less flexibility to implement the required backward-compatible cellular standards, especially the 2G and 3G standards. Their much lower processing requirements sometimes do not fit well onto the available task-optimized heterogeneous cores; moreover, the vector width of the VLIW/SIMD DSP cores is too wide for these algorithms to map onto efficiently.
As many of the Tier 1 developers using the HS47D in their 5G modems have found, the HS47D multicore system offers the right level of DSP and control performance to run 2G and 3G, as well as satellite communication standards (GNSS), allowing the main computation engines of the 5G system to remain in an idle low-power state.
Partitioning the 5G computation algorithms across the heterogeneous system also allows part of the 5G computation to run on the HS47D multicore system. For example, the Hybrid Automatic Repeat request (HARQ) control can run there, requesting retransmission when a transmission error cannot be corrected. It runs in synchronization with the HARQ hardware unit and forward error correction (FEC). 5G modem developers also often run Discontinuous Reception (DRX) control on the HS47D, a power-saving technique that periodically alternates between sleep and wake-up modes.
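The two control tasks described above can be illustrated with a minimal sketch. It is a simplified model and not modem firmware: the callback `decode_attempt` stands in for the HARQ hardware unit and FEC decoder, and the names `harq_send`, `drx_awake`, `max_tx`, `cycle_len` and `on_duration` are hypothetical. Real HARQ and DRX procedures involve further timers and parameters that are omitted here.

```python
def harq_send(decode_attempt, max_tx=4):
    """Control loop for one HARQ process.

    decode_attempt(tx) returns True if FEC decoding succeeded after
    transmission number tx. Returns the number of transmissions used,
    or None if every attempt up to max_tx failed.
    """
    for tx in range(1, max_tx + 1):
        if decode_attempt(tx):
            return tx  # decoded: acknowledge and stop
        # not decoded: the loop continues, i.e. a retransmission is requested
    return None


def drx_awake(slot, cycle_len, on_duration):
    """Return True if the receiver is awake in this slot of a periodic DRX cycle."""
    return slot % cycle_len < on_duration


# Decoding succeeds on the third transmission.
print(harq_send(lambda tx: tx >= 3))
# prints 3

# A 10-slot DRX cycle with the receiver awake for the first 2 slots.
print([s for s in range(12) if drx_awake(s, 10, 2)])
# prints [0, 1, 10, 11]
```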