Insight into the OpenHBI Die-to-Die Standard

Manuel Mota, Senior Staff Product Marketing Manager, Synopsys

Introduction

When designing multi-die SoCs, system architects are faced with multiple design choices and tradeoffs. Perhaps the most fundamental is choosing the best SoC packaging technology option:

  • 2D packaging, where dies are assembled side by side on an organic substrate (laminate)
  • 2.5D packaging, where an interposer using silicon or redistribution layer (RDL) fanout is used to route the signal between dies in the SoC
  • 3D packaging, where dies are stacked vertically using hybrid bonding techniques
  • Combination of the above

Figure 1 illustrates the various packaging options that are available to designers. 

Figure 1: Packaging options

2.5D packaging with RDL fanout is emerging as an attractive option due to its ability to bridge the low cost of 2D technologies and the density of silicon interposers. RDL fanout is available from foundries as well as traditional Outsourced Semiconductor Assembly and Test (OSAT) providers, easing access to the technology and potentially further reducing cost.

High-end multi-die SoCs for high-performance computing (HPC) applications, such as data centers, artificial intelligence (AI) training or inference, servers, and networking, take advantage of the density of 2.5D packaging technologies such as RDL fanout in order to use parallel die-to-die interfaces.

Why Parallel Die-to-Die Interfaces?

Parallel die-to-die interfaces fundamentally assemble a large number (up to thousands) of IO pins that drive single-ended signals across dies. Because the data rate per pin is only a few gigabits per second (8 to 16 Gbps) and the distance between dies is only a few millimeters (3 to 5 mm), the drivers and receivers can be simplified while achieving system bit error rates (BER) well below 1e-22 to 1e-24. The system BER can be met without additional error correction mechanisms such as forward error correction (FEC) and retry, which add link complexity and latency.
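
As a back-of-the-envelope sanity check (a sketch, not from the standard), the following computes the mean time between bit errors on a single lane at these rates and BER levels:

    # Mean time between errors on one lane: MTBE = 1 / (data_rate * BER).
    SECONDS_PER_YEAR = 365 * 24 * 3600

    for rate_gbps in (8, 16):
        for ber in (1e-22, 1e-24):
            mtbe_s = 1.0 / (rate_gbps * 1e9 * ber)
            print(f"{rate_gbps} Gbps at BER {ber:.0e}: one error every "
                  f"~{mtbe_s / SECONDS_PER_YEAR:.1e} years")

Even at 16 Gbps and a BER of 1e-24, a single lane would statistically see one error in roughly two million years, which is why FEC and retry can be omitted.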

By simplifying the IO, eliminating the serialization and deserialization (SerDes) steps, and avoiding very-high-speed signaling, parallel die-to-die interfaces can achieve remarkably high power efficiency and low latency while supporting very high throughput across the link. For these reasons, parallel die-to-die interfaces are very appealing for high-performance computing SoCs that are not tightly constrained by packaging cost and assembly complexity.

Die-to-Die Interface Standards

The industry has deployed many proprietary architectures for parallel die-to-die interfaces. However, the multi-die SoC market aims to develop a robust ecosystem in which dies implementing different functionality (a.k.a. chiplets) developed by different vendors can interoperate.

For this reason, the industry is developing standards with different characteristics for parallel die-to-die interfaces in advanced packaging (silicon interposers, silicon bridges, or RDL fanouts). Table 1 compares their main characteristics.

Standard | Data Rate [Gbps] | Bump Space [um] | Power Efficiency [pJ/bit] | Edge Density [Tbps/mm] | Area Density [Tbps/mm2] | FOM-1 [Tbps/mm / pJ/bit] (larger is better) | FOM-2 [pJ/bit / mm] (smaller is better)
AIB 2.0 | 6.4 | 55 | 0.5 | 1.64 | - | 3.28 | 0.1
OpenHBI 1.0 | 8 | 40 | 0.4 | 2.29 | 2.04 | 5.71 | 0.1
OpenHBI 2.0 | 12~16 | 40 | 0.5 | 3.34 | 3.06 | 6.86 | 0.06
BoW – Basic | 8 | 40 | 0.5 | 1.78 | 1.07 | 3.56 | 0.1

Table 1: Parallel die-to-die interface standards for advanced packaging (courtesy of OCP Tech Week, Nov 2020)
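
The published FOM-1 values are consistent with edge density divided by power efficiency (e.g., OpenHBI 1.0: 2.29 / 0.4 ≈ 5.71). A minimal sketch recomputing the column under that assumption:

    # Recompute FOM-1 = edge density [Tbps/mm] / power efficiency [pJ/bit];
    # the relationship is inferred from Table 1, not stated normatively.
    table_1 = {
        "AIB 2.0":     (1.64, 0.5),
        "OpenHBI 1.0": (2.29, 0.4),
        "OpenHBI 2.0": (3.34, 0.5),
        "BoW - Basic": (1.78, 0.5),
    }
    for name, (edge_tbps_mm, pj_per_bit) in table_1.items():
        print(f"{name}: FOM-1 = {edge_tbps_mm / pj_per_bit:.2f}")

OpenHBI 2.0 comes out at 6.68 rather than the published 6.86, which suggests the published inputs are rounded.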

Similar versions of these standards support organic substrates, abstracting the system design from the physical layer and package type, as shown in Table 2. However, advanced packaging offers much finer bump pitch and higher package routing density, leading to better form factor and edge efficiency at similar or better power efficiency. These are all critical metrics for HPC and networking applications requiring very high die-to-die data throughput.

Standard | Version with Support for Organic Substrate
BoW | BoW-Fast, BoW-Basic (C4)
OpenHBI | OpenHBI-L
AIB | AIB 2.1 (Laminate)

Table 2: Versions of parallel die-to-die PHY standards that support organic substrate packaging

OpenHBI has emerged as the standard that provides the highest edge density, making it ideal for applications that must transmit very high bandwidth between two dies.

What is OpenHBI?

OpenHBI reduces risk by leveraging JEDEC's HBM3 electrical characteristics and IO types. It uses low-voltage, unterminated, single-ended DDR signaling to convey data between dies.

The OpenHBI standard has many key characteristics:

  • Allows interoperability between OpenHBI-compliant die-to-die interfaces
  • Leverages JEDEC HBM3 IO types and electricals
  • Supports silicon interposer and wafer-level integrated fanout or equivalent technologies
  • Enables symmetrical die-to-die interfaces
  • Achieves target speed of 8 Gbps per pin with a roadmap to 12-16 Gbps
  • Provides up to 3 mm trace reach at maximum data rates
  • Achieves power target of less than or equal to 0.5 pJ/bit (see the budget sketch after this list)
  • Provides linear (beachfront) bandwidth density greater than 1.5 Tbps per millimeter (aggregated transmitter and receiver)
  • Defines PHY and logical PHY abstraction layers to ease adaptation to upper layers
  • Supports normal and rotated die orientations
  • Scales bandwidth and beachfront (number of DWORDs) to match various use cases
  • Supports the Chiplet Configuration and Test (CCT) interface
  • Supports lane repair to enhance manufacturing yield
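
The power and density targets above combine into a simple budget: at 0.5 pJ/bit, every 1 Tbps of throughput costs 0.5 W. A minimal sketch, using a hypothetical 4 Tbps link as the example:

    # Link power budget: W = Tbps x pJ/bit (the 1e12 and 1e-12 cancel).
    def link_power_w(throughput_tbps: float, pj_per_bit: float) -> float:
        """Power in watts for a given raw throughput and energy per bit."""
        return throughput_tbps * 1e12 * pj_per_bit * 1e-12

    # Hypothetical 4 Tbps aggregate die-to-die link at the 0.5 pJ/bit target:
    print(link_power_w(4.0, 0.5))  # -> 2.0 W of link power
    # Die edge needed at the >1.5 Tbps/mm aggregate beachfront density:
    print(4.0 / 1.5)               # -> ~2.67 mm of beachfront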

The OpenHBI standard is focused on the lower layers (PHY and logical PHY layers) as shown in Figure 2. An adapter layer is then used to interface with the upper layers (protocol layers). As a result, the system implementation becomes independent of the protocol used for each application.

Figure 2: OpenHBI interface logical partition

The PHY layer executes the following functions:

  • Clocking
  • Gearbox (N:1 data rate conversion; see the sketch after this list)
  • Calibration and training
  • Lane repair
  • Data transmission and recovery
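
A minimal behavioral sketch of the N:1 gearbox function (the 8:1 ratio and LSB-first bit ordering are illustrative assumptions, not mandated by the standard):

    def gearbox_serialize(words, width):
        """N:1 gearbox (TX direction): flatten N-bit parallel words,
        LSB first, into a single bit stream at N times the word rate."""
        for word in words:
            for i in range(width):
                yield (word >> i) & 1

    def gearbox_deserialize(bits, width):
        """1:N gearbox (RX direction): regroup the bit stream into words."""
        word, count = 0, 0
        for bit in bits:
            word |= bit << count
            count += 1
            if count == width:
                yield word
                word, count = 0, 0

    # Round-trip check with an 8:1 ratio.
    data = [0xA5, 0x3C, 0xFF]
    assert list(gearbox_deserialize(gearbox_serialize(data, 8), 8)) == data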

If required, the logical PHY layer executes the following functions:

  • Parity generation and check
  • Data framing and alignment
  • Data bus inversion (see the sketch after this list)
  • Bit reordering
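
As a rough illustration of two of these services, the sketch below implements a DC-style data bus inversion (one of several DBI flavors: invert a group when more than half of its bits are 1) and an even-parity check. The 8-bit group size and the polarity conventions are assumptions for illustration; the standard defines the actual encoding:

    def dbi_encode(byte: int) -> tuple[int, int]:
        """DC data bus inversion: invert the byte when more than half
        of its 8 bits are 1, and signal the inversion on a DBI bit."""
        ones = bin(byte & 0xFF).count("1")
        if ones > 4:
            return (~byte) & 0xFF, 1   # inverted payload, DBI flag set
        return byte & 0xFF, 0

    def dbi_decode(byte: int, dbi: int) -> int:
        return ((~byte) & 0xFF) if dbi else byte

    def parity_bit(bits: int, width: int) -> int:
        """Even parity over a word: 1 if the number of 1s is odd."""
        return bin(bits & ((1 << width) - 1)).count("1") & 1

    raw = 0xFE                       # seven 1s -> worth inverting
    enc, flag = dbi_encode(raw)
    assert dbi_decode(enc, flag) == raw and flag == 1
    assert parity_bit(0b1011, 4) == 1  # three 1s -> odd parity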

Figure 3 illustrates a possible OpenHBI PHY implementation and how the different functions can be partitioned.

Figure 3: OpenHBI PHY IP block diagram

The OpenHBI PHY uses a DWORD-based data path organization similar to HBM3's. Each DWORD is made up of 42 data signals, plus 2 redundant lanes (used for lane test and repair) and a differential forwarded clock in each direction, as shown in Table 3.

Symbol | Description | TX side | RX side
D<41:0> | Data | Out | In
WDQS p/m | TX forward clock (diff) | Out | -
RDQS p/m | RX forward clock (diff) | - | In
RD<1:0> | Redundant lanes | Out | In

Table 3: DWORD signal description

The data signals are configured as inputs or outputs depending on whether the DWORD operates as a transmitter or a receiver. Some of the data pins are dedicated to OpenHBI PHY services such as data bus inversion (DBI, a power-saving and noise-reduction function), parity (a simple error detection function), and framing (a data alignment function). OpenHBI supports enabling each service independently, as shown in Table 4. When a service is not used, the freed-up pins can be reused by the upper layers for data transfer.

Bits | All | DBI | Fram + Par | Framing | Bypass
Mode | 0 | 1 | 2 | 3 | 4
Payload | 36 | 38 | 40 | 41 | 42
DBI | 4 | 4 | 0 | 0 | 0
Parity | 1 | 0 | 1 | 0 | 0
Framing | 1 | 0 | 1 | 1 | 0

Table 4: OpenHBI Payload vs services enabled
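
A property worth noting in Table 4 is that the per-mode pin budget always sums to the DWORD's 42 data lanes; the payload simply absorbs whatever the disabled services free up. A small consistency check over the table values:

    # Pin allocation per service mode, transcribed from Table 4.
    modes = {
        # mode name: (payload, DBI, parity, framing) pins
        "All (0)":        (36, 4, 1, 1),
        "DBI (1)":        (38, 4, 0, 0),
        "Fram + Par (2)": (40, 0, 1, 1),
        "Framing (3)":    (41, 0, 0, 1),
        "Bypass (4)":     (42, 0, 0, 0),
    }
    for name, alloc in modes.items():
        assert sum(alloc) == 42, name  # every mode fills all 42 data lanes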

The DWORD also manages training, test and repair procedures for its own pins.
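
A common way to implement lane repair (an illustrative assumption here; the standard defines the actual mechanism) is a shift-style remap: traffic on lanes beyond a defective bump shifts over by one so that the redundant lanes absorb the defect. A minimal sketch:

    def repair_map(bad_lanes: list[int], data_lanes: int = 42, spares: int = 2):
        """Map each logical lane to a physical lane, skipping bad ones.
        With 42 data lanes and 2 redundant lanes (RD<1:0>), up to two
        defects per DWORD can be repaired."""
        assert len(bad_lanes) <= spares, "more defects than spare lanes"
        mapping, phys = {}, 0
        for logical in range(data_lanes):
            while phys in bad_lanes:
                phys += 1          # skip defective bumps
            mapping[logical] = phys
            phys += 1
        return mapping             # physical indices 42, 43 are the spares

    m = repair_map([7])
    assert m[6] == 6 and m[7] == 8 and m[41] == 42  # lanes above 7 shift by one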

The PHY uses a clock forwarding technique, where the transmit clock is transmitted between dies along with the data. This allows a simple DLL-based data recovery circuit on the receiver side, saving power and area.
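
A typical training approach for such a receiver (an illustrative sketch, not a normative procedure) is to sweep the DLL delay across the unit interval, record which settings capture a known pattern correctly, and park the sampling point at the center of the passing window:

    def center_sampling_delay(pass_fail: list[bool]) -> int:
        """Given per-delay-step pass/fail results from a training pattern,
        return the delay step at the center of the widest passing window."""
        best_start, best_len, start = 0, 0, None
        for i, ok in enumerate(pass_fail + [False]):  # sentinel ends last run
            if ok and start is None:
                start = i
            elif not ok and start is not None:
                if i - start > best_len:
                    best_start, best_len = start, i - start
                start = None
        if best_len == 0:
            raise RuntimeError("no passing window found")
        return best_start + best_len // 2

    # Example scan over 16 delay steps (F = fail at eye edges, P = pass).
    scan = [c == "P" for c in "FFFPPPPPPPPPFFFF"]
    print(center_sampling_delay(scan))  # -> 7, the middle of the open eye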

In addition to the payload data path, the PHY implements a low-speed CCT interface, which the anchor and chiplet use to communicate configuration and status parameters and to control DWORD initialization, calibration, and test procedures. The OpenHBI PHY implements the CCT's I3C, JTAG, and vendor-specific signals.

In addition, CCT propagates the reference clock from the anchor to the chiplet die, so that both share the same clock reference.

Other key features of the OpenHBI PHY are:

  • Configuration port with APB/TDR interface for access to internal control and status registers (CSR)
  • Configurable PHY that supports a variable number of DWORDs to adapt to the specific use case
  • Comprehensive testability for bare die testing (Known Good Die) and post-assembly testing, including BIST for critical blocks, a variety of loopback modes, pattern generation and matching capabilities, and the ability to generate reconstructed eyes for pass/fail testing, as shown in Figure 4

Figure 4: Eye diagram for a die-to-die link using Synopsys HBI+ PHY

Conclusion

Designers have options to choose the die-to-die interface that best suits their design needs. SerDes-based and parallel die-to-die interfaces each provide their own advantages in terms of data rate, number of pins, and cost. Designers must also choose among the different packaging technologies, such as 2D, 2.5D, 3D, or a combination thereof, in which to integrate their multi-die SoCs. Parallel die-to-die interfaces have emerged as the technology of choice for high-performance computing SoCs that are not constrained by packaging cost and complexity. To build a successful ecosystem, the industry is developing standards that allow chiplets in multi-die SoCs from different vendors to interoperate. One of those standards is OpenHBI, which achieves speeds of 8 Gbps per pin (with a roadmap to 12-16 Gbps), provides up to 3 mm trace reach at maximum data rates, and targets power efficiency of 0.5 pJ/bit or less.

Synopsys offers a portfolio of die-to-die IP, including 112G SerDes-based PHY and controller as well as High-Bandwidth Interconnect (HBI). The DesignWare® HBI PHY IP supports multiple standards, including AIB, BoW, and OpenHBI.