Cloud native EDA tools & pre-optimized hardware platforms
Manuel Mota, Sr. Product Marketing Manager, Staff, Synopsys
The semiconductor industry continues to evolve with new products that pack more functionality into a single system-on-chip (SoC) at a similar cost and power budget. At the core of it all is the ingenuity of the engineers who design these new SoCs and the advanced process technologies in which they are implemented. Moore’s law has accurately described this technology evolution since it was formulated in 1965. MOSFET miniaturization has progressed from a 10-um minimum pitch to 5 nm and below today, making it economically and technologically feasible to integrate all the required functionality of ever larger and more powerful SoCs into the same die. However, such SoC designs are becoming more cost-intensive due to the higher mask fabrication cost of advanced FinFET processes at 7 nm and below. In addition, SoCs for hyperscale data center, AI, and networking applications have grown so large that fabrication yield drops to very low levels, further undermining their feasibility.
The industry is responding to these challenges by splitting SoCs into multiple dies and assembling them in the same Multi-Chip Module (MCM). Figure 1 shows several advantages of this approach:
Figure 1: Two converging trends for die-to-die connectivity in MCMs
For this approach to work, the die-to-die connectivity must have the following key characteristics:
The article, Choosing the Right IP for Die-to-Die Connectivity, in the previous DesignWare® Technical Bulletin explained the SerDes-based die-to-die PHY architecture. This article describes the parallel-based die-to-die PHY architecture.
Packaging technology to support MCM implementations is evolving in many ways:
The data rate that can be reliably sustained across the links depends on the materials in which the die-to-die traces are routed (substrate, RDL, silicon) and their pitch. Silicon interposers can only sustain low data rates of up to 6-8 gigabits per second (Gbps) per lane, making high-speed SerDes-based die-to-die links unsuitable.
The parallel die-to-die PHY architecture addresses the challenges of die-to-die links routed over silicon interposers. It leverages high-density routing to implement a very large number of simple, low-speed I/Os that together deliver the required aggregate bandwidth efficiently. Similar to high-bandwidth memory (HBM) interfaces, parallel die-to-die links aggregate thousands of pins, each transmitting data at a few Gbps. For example, if each pin sustains a data rate of 4 Gbps unidirectionally, the PHY needs 500 transmit pins and 500 receive pins to achieve an aggregate bandwidth of 2 terabits per second (2 Tbps) in each direction.
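The pin-count arithmetic above can be sketched as a small back-of-the-envelope calculation. This is purely illustrative; the function name and interface are not from any Synopsys tool.

```python
import math

# Illustrative sizing sketch: how many single-ended data pins a
# parallel die-to-die PHY needs per direction, given a target
# aggregate bandwidth and a per-pin data rate.

def pins_per_direction(target_gbps: float, gbps_per_pin: float) -> int:
    """Minimum number of unidirectional data pins, rounded up."""
    return math.ceil(target_gbps / gbps_per_pin)

# Example from the text: 2 Tbps in each direction at 4 Gbps per pin.
tx_pins = pins_per_direction(2000, 4)
rx_pins = pins_per_direction(2000, 4)
print(tx_pins, rx_pins)  # 500 500
```

At these low per-pin rates, the bandwidth scales almost linearly with pin count, which is why fine-pitch interposer routing is the enabling technology for this architecture.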
For the parallel-based die-to-die PHY to be effective, it needs to implement the following key principles:
Simplicity and Scalability
Given the large number of signal pins required for a parallel link, each driver and receiver relies on a simple architecture to be very energy- and area-efficient. Clock-forwarding techniques reduce the complexity of the data recovery architecture on the receive (RX) side by replacing complex clock and data recovery (CDR) circuits with phase aligners supported by delay-locked loops (DLLs). On the transmit (TX) side, equalization and training can also be simplified, leveraging the short channels and the low data rates being transmitted.
Further architectural simplification comes from grouping the TX and RX data pins into small groups, each sharing common circuitry (for power and area efficiency) and containing all the circuitry required for its operation. These groups are called channels.
The PHY can be scaled to efficiently support links of different bandwidths (BW) simply by assembling the number of channels needed to reach the required BW.
These techniques enable energy efficiency below 1 pJ/bit.
Beachfront efficiency is maximized with single-ended signaling, which halves the number of pins and traces on the substrate.
Single-ended signaling is inherently more susceptible to crosstalk than differential signaling; however, the signals' relatively low data rate and high voltage swing mitigate noise and crosstalk concerns. Nonetheless, the complete interconnect bus design, including the TX drivers, RX receivers, and interposer traces, should be thoroughly validated for crosstalk to ensure a robust connection.
Parallel die-to-die interfaces have thousands of fine-pitched traces, making them susceptible to silicon fabrication defects with potentially catastrophic impact on the yield of the link and of the MCM.
To maximize yield, the parallel die-to-die PHY includes redundant lanes distributed per channel, lane testing capabilities, and circuitry to re-route signals from lanes that are identified as defective to the redundant lanes, as shown in Figure 2. This makes it possible to repair the link and maximize yield.
Figure 2: Redundant links maximize yield and allow re-routing of broken links
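The repair scheme described above amounts to a remapping of logical lanes onto working physical lanes. A minimal sketch, with purely illustrative data structures and lane counts, could look like this:

```python
# Simplified lane-repair sketch: after per-lane testing flags
# defective lanes, traffic is re-routed to the channel's redundant
# lanes. The channel is repairable only while the number of defects
# does not exceed the available redundancy.

def repair_channel(data_lanes, redundant_lanes, defective):
    """Map each logical lane to a working physical lane, or return None."""
    spares = [r for r in redundant_lanes if r not in defective]
    mapping = {}
    for lane in data_lanes:
        if lane not in defective:
            mapping[lane] = lane            # healthy lane, keep as-is
        elif spares:
            mapping[lane] = spares.pop(0)   # re-route to a spare lane
        else:
            return None                     # unrepairable channel
    return mapping

# Hypothetical channel: data lanes 0-7, spares 8-9; lanes 2 and 5 fail.
m = repair_channel(range(8), [8, 9], defective={2, 5})
print(m[2], m[5])  # 8 9
```

In silicon the remapping is done by multiplexers in the datapath rather than a lookup table, but the yield argument is the same: a defect only kills the link once a channel runs out of spares.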
The die-to-die PHY should also include its own testability features. By self-testing at speed, the PHY enables production testing of the isolated dies and links without requiring external test equipment. Built-in Self-Test (BIST) functionality includes:
Figure 3: Loopback modes allow PHY, die, and cross-die testing
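A common way to implement such at-speed self-test is a PRBS generator on the TX side and a matching checker on the RX side of a loopback. The sketch below is illustrative, assuming a PRBS7 pattern (polynomial x^7 + x^6 + 1); the actual pattern and BIST architecture of a given PHY may differ.

```python
# BIST-style loopback sketch: a PRBS7 LFSR drives the transmitter,
# the stream is looped back, and a checker compares it against a
# locally regenerated reference, counting bit errors -- no external
# test equipment required.

def prbs7(seed=0x7F, nbits=256):
    """Generate nbits of a PRBS7 sequence from a 7-bit LFSR."""
    state = seed
    out = []
    for _ in range(nbits):
        bit = ((state >> 6) ^ (state >> 5)) & 1   # taps for x^7 + x^6 + 1
        state = ((state << 1) | bit) & 0x7F
        out.append(bit)
    return out

def bist_check(received, seed=0x7F):
    """Count mismatches between looped-back bits and the expected PRBS."""
    expected = prbs7(seed, len(received))
    return sum(r != e for r, e in zip(received, expected))

tx = prbs7()
rx = tx.copy()
rx[10] ^= 1            # inject a single bit error on the loopback path
print(bist_check(tx))  # 0 errors on a clean loopback
print(bist_check(rx))  # 1 error detected
```

A hardware checker would additionally synchronize its LFSR to the incoming stream before counting errors; that step is omitted here for brevity.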
Design for Success
Performance of die-to-die links over silicon interposers is determined by the tight interaction between the PHY and the interposer itself. To ensure a successful design, the die-to-die link must be verified with a comprehensive testbench that includes interposer channel and power-distribution models as well as the PHY, capturing the complete link with all its interactions.
The increasing functionality and size of SoCs for hyperscale data center, AI, and networking applications are forcing designers to split them into smaller dies, creating the need for reliable die-to-die PHY IP solutions. However, depending on the packaging technology, designers have multiple PHY options to choose from, each with its own characteristics and advantages. Synopsys offers a portfolio of die-to-die PHY IP including High-Bandwidth Interconnect (HBI+) and SerDes-based USR/XSR. The HBI PHY implements a parallel architecture and targets applications leveraging silicon interposer-based MCM packaging technology; it is also compatible with the AIB standard. Synopsys’ SerDes-based PHY supports USR/XSR die-to-die connectivity at 112G per lane for packaging technologies using organic substrates or InFO (Integrated Fan-Out).