Parallel die-to-die PHY architectures address the challenges of die-to-die links routed over silicon interposers. They leverage the interposer's high-density routing to implement a very large number of simple, low-speed I/Os that together deliver the required aggregate bandwidth efficiently. Similar to high-bandwidth memory (HBM) interfaces, parallel die-to-die links aggregate up to thousands of pins, each transmitting data at a few Gbps. For example, if each pin reaches a unidirectional data rate of 4 Gbps, the PHY needs 500 transmit pins and 500 receive pins to deliver an aggregate bandwidth of 2 Tbps in each direction (500 pins x 4 Gbps).
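The pin-count arithmetic in this example can be reproduced with a short Python sketch. The function name is hypothetical, and the calculation counts data pins only; forwarded clocks, redundant lanes, and power or ground pins are not included.

```python
import math


def pins_per_direction(target_gbps: float, per_pin_gbps: float) -> int:
    """Return the number of data pins needed in one direction.

    Rounds up, because a fractional pin cannot be built.
    """
    if target_gbps <= 0 or per_pin_gbps <= 0:
        raise ValueError("bandwidths must be positive")
    return math.ceil(target_gbps / per_pin_gbps)


# Figures from the text: 2 Tbps (2000 Gbps) per direction at 4 Gbps per pin.
tx_pins = pins_per_direction(2000, 4)
rx_pins = pins_per_direction(2000, 4)
total_data_pins = tx_pins + rx_pins
```

With the figures from the text, this gives 500 pins per direction and 1000 data pins in total.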
For the parallel-based die-to-die PHY to be effective, it needs to implement the following key principles:
Simplicity and Scalability
Given the large number of signal pins required for a parallel link, each driver and receiver relies on a simple architecture that is very energy- and area-efficient. The PHY uses clock forwarding to reduce the complexity of data recovery on the receive (RX) side, replacing complex clock and data recovery (CDR) circuits with phase aligners supported by delay-locked loops (DLLs). On the transmit (TX) side, equalization and training can also be simplified because the channels are short and the data rate is low.
Further architectural simplification is achieved by organizing the TX and RX data pins into small groups. Each group shares common circuitry (for power and area efficiency) and includes everything required for its own operation. These groups are called channels.
The PHY can be scaled to efficiently support links with different bandwidths simply by assembling the number of channels needed to reach the required bandwidth.
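As a rough illustration of scaling by channel count, the sketch below computes how many channels a link needs for a target bandwidth. The channel width of 40 lanes per direction is a hypothetical value chosen for the example and does not come from the text; the 4 Gbps per-lane rate reuses the earlier example.

```python
import math


def channels_needed(target_gbps: float, lanes_per_channel: int,
                    per_lane_gbps: float) -> int:
    """Return the number of channels required in one direction.

    Each channel contributes lanes_per_channel * per_lane_gbps of bandwidth.
    """
    channel_gbps = lanes_per_channel * per_lane_gbps
    if channel_gbps <= 0 or target_gbps <= 0:
        raise ValueError("bandwidths must be positive")
    return math.ceil(target_gbps / channel_gbps)


# Assumption: 40 data lanes per channel (hypothetical), 4 Gbps per lane.
n_channels = channels_needed(2000, 40, 4)
provided_gbps = n_channels * 40 * 4
```

Under these assumptions, each channel carries 160 Gbps, so a 2 Tbps link needs 13 channels and provides 2080 Gbps, slightly more than the target.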
With these techniques, energy efficiency below 1 pJ/bit can be achieved.
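To put the energy figure in context, link power is the energy per bit multiplied by the bit rate, and 1 pJ/bit at 1 Tbps works out to exactly 1 W. The sketch below applies that identity; the function name is hypothetical, and the 2 Tbps figure is the per-direction bandwidth from the earlier example.

```python
def link_power_watts(pj_per_bit: float, tbps: float) -> float:
    """Return link power in watts.

    pJ/bit * Tbit/s = (1e-12 J/bit) * (1e12 bit/s) = W, so the
    scale factors cancel and the product is already in watts.
    """
    if pj_per_bit < 0 or tbps < 0:
        raise ValueError("inputs must be non-negative")
    return pj_per_bit * tbps


# Upper bound for 2 Tbps of traffic at the 1 pJ/bit ceiling from the text.
power_at_ceiling = link_power_watts(1.0, 2.0)
```

At the 1 pJ/bit ceiling, moving 2 Tbps costs at most 2 W; a design at 0.5 pJ/bit would halve that.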
Beachfront efficiency, the bandwidth delivered per unit of die edge, is maximized with single-ended signaling, which halves the number of pins and traces on the substrate compared with differential signaling.
Single-ended signaling is inherently more susceptible to crosstalk than differential signaling. However, the signals' relatively low data rate and high voltage swing mitigate noise and crosstalk concerns. Nonetheless, the complete interconnect design, including the TX drivers, the RX receivers, and the interposer traces, should be thoroughly validated for crosstalk to ensure the connection is robust.
Parallel die-to-die interfaces have thousands of fine-pitched traces, making them susceptible to impurities in the silicon fabrication process, with potentially catastrophic impact on the yield of the link and of the multi-chip module (MCM).
To maximize yield, the parallel die-to-die PHY includes redundant lanes distributed per channel, lane testing capabilities, and circuitry to re-route signals from lanes that are identified as defective to the redundant lanes, as shown in Figure 2. This makes it possible to repair the link and maximize yield.
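The repair step can be sketched as a remapping from logical data lanes to working physical lanes. The shift-style scheme below, in which each logical lane takes the next non-defective physical lane, is an assumption made for illustration; the text does not specify the actual multiplexing, and the function name and lane counts are hypothetical.

```python
def build_lane_map(num_data_lanes: int, num_physical_lanes: int,
                   defective: set[int]) -> list[int]:
    """Map logical data lanes onto working physical lanes.

    Returns a list where index i is the logical lane and the value is the
    physical lane that carries it. Defective physical lanes are skipped.
    Raises ValueError when the channel has too few working lanes to repair.
    """
    working = [p for p in range(num_physical_lanes) if p not in defective]
    if len(working) < num_data_lanes:
        raise ValueError("not enough working lanes to repair the channel")
    return working[:num_data_lanes]


# Hypothetical channel: 8 data lanes plus 2 redundant lanes (10 physical).
# Lane testing has flagged physical lane 3 as defective.
lane_map = build_lane_map(8, 10, {3})
```

In this example, logical lanes 0 to 2 keep their physical lanes, and lanes 3 to 7 shift up by one to physical lanes 4 to 8, leaving one redundant lane unused. A channel with more defects than redundant lanes cannot be repaired, which the sketch reports with an exception.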