Parallel-Based PHY IP for Die-to-Die Connectivity

Manuel Mota, Sr. Product Marketing Manager, Staff, Synopsys

Introduction

The semiconductor industry continues to evolve with new products that support more functionality in a single system-on-chip (SoC) at a similar cost and power budget. At the core of it all are the ingenuity of the engineers who design these new SoCs and the advanced process technologies in which these SoCs are implemented. Moore’s law has accurately predicted and described this technology evolution since it was formulated in 1965. MOSFET transistor miniaturization evolved from a 10-um minimum pitch to 5 nm and below today, making it economically and technologically feasible for larger and more powerful SoCs to integrate all the required functionality into the same die. However, such SoC designs are becoming more cost-intensive due to the higher mask fabrication cost of advanced FinFET processes at 7 nm and below. In addition, SoCs for hyperscale data center, AI, and networking applications are becoming so large that fabrication yield drops to very low levels, further impacting the feasibility of the SoC.

The industry is responding to these challenges by splitting SoCs into multiple dies and assembling them in the same Multi-Chip Module (MCM). As Figure 1 shows, this approach offers several advantages:

  • Splitting the die into multiple homogeneous dies (same functionality in each die) reduces the size of individual dies, improving the fabrication yield and providing greater product flexibility.
  • Integrating heterogeneous dies enables the use of process technologies that are cost-optimized for the implemented function. For example, analog and RF functions do not take advantage of process scaling and are more efficiently implemented in older nodes.

Figure 1: Two converging trends for die-to-die connectivity in MCMs

For this approach to work, the die-to-die connectivity must have the following key characteristics:

  • Very high energy efficiency per bit transmitted
  • Very low latency to mitigate the performance impact of splitting functionality between the dies
  • High link reliability (low bit error rate)
  • High bandwidth efficiency, i.e., the data rate transmitted per unit of die beachfront

The article, Choosing the Right IP for Die-to-Die Connectivity, in the previous DesignWare® Technical Bulletin explained the SerDes-based die-to-die PHY architecture. This article describes the parallel-based die-to-die PHY architecture.

Packaging Technology

Packaging technology to support MCM implementations is evolving in many ways:

  • Organic substrates enable low-cost, low-density routing between dies, with limited number of I/Os per die
  • Silicon interposers and silicon bridges, due to their fine pitch, allow for very high-density routing between dies and a very high number of I/Os per die, however, they are more complex, and their cost can be significantly higher
  • Fan-out packaging, either based on redistribution layers (RDLs) or wafer-on-wafer (WoW) technologies, promises optimal tradeoffs between low cost and complexity and high-density routing between dies with high number of I/Os per die

The data rate that can be reliably sustained across the links depends on the materials in which the die-to-die traces are implemented (substrate, RDL, silicon) and their pitch. Silicon interposer traces can only sustain data rates of up to 6-8 gigabits per second (Gbps) per lane, making high-speed SerDes-based die-to-die links unsuitable for this medium.

Parallel Die-to-Die PHY Architecture

The parallel die-to-die PHY architecture addresses the challenges of die-to-die links routed over silicon interposers. It leverages high-density routing to implement a very high number of simple, low-speed I/Os that achieve the required high aggregate bandwidth in an efficient way. Similar to high-bandwidth memory (HBM) interfaces, parallel die-to-die links aggregate up to thousands of pins, each transmitting data at a few Gbps. For example, if each pin can reach a data rate of 4 Gbps unidirectionally, then the PHY needs 500 transmit pins and 500 receive pins to achieve a total aggregate bandwidth of two terabits per second (2 Tbps bidirectional).
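The pin-budget arithmetic above can be sketched in a few lines. The values (4 Gbps per pin, 2 Tbps per direction) are the example figures from the text:

```python
import math

def pins_required(target_gbps_per_direction, gbps_per_pin):
    """Pins needed in one direction to reach the target aggregate bandwidth."""
    return math.ceil(target_gbps_per_direction / gbps_per_pin)

# 2 Tbps = 2000 Gbps in each direction at 4 Gbps per pin
tx_pins = pins_required(2000, 4)
rx_pins = pins_required(2000, 4)
print(tx_pins, rx_pins)  # 500 500
```

The same calculation scales directly: halving the per-pin rate doubles the pin count for the same aggregate bandwidth, which is why high-density interposer routing is a prerequisite for this architecture.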

For the parallel-based die-to-die PHY to be effective, it needs to implement the following key principles:

Simplicity and Scalability

Given the large number of signal pins required for a parallel link, each driver and receiver relies on a simple architecture to be very energy- and area-efficient. Clock forwarding techniques reduce the complexity of data recovery on the receive (RX) side, replacing complex clock and data recovery (CDR) circuits with phase aligners supported by delay-locked loops (DLLs). On the transmit (TX) side, equalization and training can also be simplified, leveraging the short channels and the low data rate being transmitted.

Additional architectural simplification is achieved by grouping the TX and RX data pins into small groups, each sharing common circuitry (for power and area efficiency) and including all the circuitry required for its operation. These groups are called channels.

The PHY can be scaled to efficiently support links with different bandwidth (BW) requirements simply by assembling the number of channels needed to achieve the required BW.
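As a back-of-the-envelope sketch, the channel count for a target bandwidth follows directly from the per-channel lane count and lane rate. The parameters below (40 data lanes per channel, 4 Gbps per lane) are illustrative assumptions, not the specification of any particular PHY:

```python
import math

# Hypothetical channel parameters, for illustration only
LANES_PER_CHANNEL = 40   # assumed data lanes grouped into one channel
GBPS_PER_LANE = 4        # assumed per-lane data rate

def channels_needed(target_gbps):
    """Channels to assemble for a target unidirectional bandwidth."""
    per_channel_gbps = LANES_PER_CHANNEL * GBPS_PER_LANE  # 160 Gbps/channel
    return math.ceil(target_gbps / per_channel_gbps)

print(channels_needed(2000))  # 13 channels for a 2 Tbps link
print(channels_needed(500))   # 4 channels for a 0.5 Tbps link
```

Because each channel is self-contained, scaling the link up or down is a matter of instantiating more or fewer identical channel tiles rather than redesigning the PHY.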

With these techniques, energy efficiency below 1 pJ/bit can be achieved.

Beachfront Efficiency

Maximizing beachfront efficiency is achieved with single-ended signaling, which reduces the number of pins and traces on the substrate by half.

Single-ended signaling is inherently more susceptible to crosstalk than differential signaling; however, the signals' relatively low data rate and high voltage swing mitigate noise and crosstalk concerns. Nonetheless, the complete interconnect bus design, including the TX drivers, RX receivers, and interposer traces, should be thoroughly validated for crosstalk to ensure a robust connection.

Robustness

Parallel die-to-die interfaces have thousands of fine-pitched traces, making them susceptible to silicon fabrication defects with potentially catastrophic impact on the yield of the link and of the MCM.

To maximize yield, the parallel die-to-die PHY includes redundant lanes distributed per channel, lane testing capabilities, and circuitry to re-route signals from lanes that are identified as defective to the redundant lanes, as shown in Figure 2. This makes it possible to repair the link and maximize yield.
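One common way to implement this re-routing is a "shift" repair scheme: signals at and above a defective lane each shift onto the next physical lane, so the redundant lane absorbs the last signal. The sketch below models that idea in Python; it is an illustration of the repair concept, not the actual mux network of any specific PHY:

```python
def repair_map(num_data_lanes, defective=None):
    """Return the physical lane index carrying each logical signal.

    The channel has num_data_lanes data lanes plus one redundant lane
    at index num_data_lanes. With no defect, the redundant lane is idle.
    With a defect, signals at or above the defective lane shift up by one,
    bypassing the broken lane.
    """
    if defective is None:
        return list(range(num_data_lanes))  # redundant lane unused
    return [i if i < defective else i + 1 for i in range(num_data_lanes)]

print(repair_map(8))     # [0, 1, 2, 3, 4, 5, 6, 7] -- no repair needed
print(repair_map(8, 3))  # [0, 1, 2, 4, 5, 6, 7, 8] -- lane 3 bypassed
```

In hardware, the same mapping is realized with per-lane 2:1 multiplexers, so a single defect anywhere in the channel costs only one redundant lane and a mux delay.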

Figure 2: Redundant links maximize yield and allow re-routing of broken links


Testability

The die-to-die PHY should also include its own testability features. By self-testing at speed, the PHY enables production testing of the isolated dies and links without requiring external test equipment. Built-in Self-Test (BIST) functionality includes:

  • Loopback modes for die and cross-die testing (Figure 3)
  • BIST with pattern generation and matching
  • Eye diagram capabilities
  • DLL BIST, boundary scan, MUX-scan, automatic test pattern generation (ATPG), and on-chip clocking (OCC)
  • Standard interfaces to the SoC test fabric (e.g., IEEE 1500, IEEE 1838, JTAG)
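The pattern-generation-and-matching idea behind BIST can be illustrated with a PRBS-7 sequence (polynomial x^7 + x^6 + 1): TX and RX run identical generators seeded the same way, and the RX counts mismatches between received bits and its local copy. This is a behavioral model of the concept, not the PHY's actual BIST circuitry:

```python
def prbs7(seed=0x7F):
    """Generate a PRBS-7 bit stream (LFSR with taps at bits 7 and 6)."""
    state = seed
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit

def bist_check(received_bits, seed=0x7F):
    """Count bit errors against the locally generated expected stream."""
    expected = prbs7(seed)
    return sum(rx != next(expected) for rx in received_bits)

# Error-free link: the captured TX stream matches the RX expectation
tx = prbs7()
stream = [next(tx) for _ in range(1000)]
print(bist_check(stream))  # 0

stream[100] ^= 1           # inject a single bit error
print(bist_check(stream))  # 1
```

Running such a check at speed on every lane, combined with the lane-repair capability described earlier, lets the dies be production-tested and repaired without external test equipment.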

Figure 3: Loopback modes allow PHY, die, and cross-die testing


Design for Success

The performance of die-to-die links over silicon interposers is determined by the tight interaction between the PHY and the interposer itself. To ensure a successful design, the die-to-die link must be verified with a comprehensive testbench that includes interposer channel and power distribution models as well as the PHY, capturing the complete link with all its interactions.

Summary

Increasing functionality and die size are forcing designers to split SoCs for hyperscale data center, AI, and networking applications into smaller dies, creating the need for reliable die-to-die PHY IP solutions. Depending on the packaging technology, designers have multiple PHY options to choose from, each with its own characteristics and advantages. Synopsys offers a portfolio of die-to-die PHY IP including High-Bandwidth Interconnect (HBI+) and SerDes-based USR/XSR. The HBI PHY implements a parallel architecture and targets applications leveraging silicon interposer-based MCM packaging technology; it is also compatible with the AIB standard. Synopsys’ SerDes-based PHY supports USR/XSR die-to-die connectivity at 112G per lane for packaging technologies using organic substrates or InFO (Integrated Fan-Out).