Key Considerations for Addressing Multi-Die System Verification Challenges

Arturo Salz, Dr. Johannes Stahl

Aug 01, 2023 / 5 min read

Multi-die systems are quickly becoming the architecture of choice for hyperscalers, developers of automated vehicles, and mobile designers. As a single package with heterogeneous dies, or chiplets, these systems are providing an avenue for lower power and increased performance for compute-intensive applications in an era when the gains due to Moore’s law are slowing down.

While multi-die system development can follow verification processes similar to those used for monolithic SoCs, every step must be considered from both a single-die and a system perspective. Does this mean that verifying multi-die systems is harder? There are unique challenges, to be sure, but with the right framework, flow, and technologies, these challenges can be overcome.

Read on to learn what you need to know about multi-die system verification. You can also gain additional insights by watching our on-demand, six-part webinar series, “Requirements for Multi-Die System Success.” The series covers multi-die system trends and challenges, early architecture design, co-design and system analysis, die-to-die connectivity, verification, and system health.


Three Key Challenges for Multi-Die System Verification

Multi-die systems may consist of:

  • A 2.5D package containing interposer-mounted chiplets
  • A 3D stack with regular structures, such as memory and FPGAs
  • Heterogeneous stacks mounted on interposers/bridges
  • A recursive composition, essentially stacks of stacks, in which the system is partitioned to balance throughput and energy

Each of these configurations represents a combination of independently manufactured dies interconnected through communication fabrics, which enables designs of massive size. There’s also an array of new components to contend with, including bumps, micro-bumps, through-silicon vias (TSVs), interposers, and interconnect bridges.

Given the increased scale and complexity, the incremental refinement design flow typical for monolithic SoCs will not work here. After architecture design of monolithic SoCs, teams typically write the RTL and tests, find issues, and possibly change the architecture, going back and forth between these steps as needed. Then it’s time for synthesis, timing analysis, more changes, power estimation, more changes, and so on. These incremental refinement steps continue as the design moves closer to becoming a physical chip. Such a flow isn’t possible with a multi-die system architecture because the dies are already manufactured and all components must be verified from a system-level perspective. Instead, the concept of system-level aggregation needs to be integrated into the flow.

Let’s highlight three key challenges in verifying multi-die systems, along with what’s needed to address them:

  1. System verification must validate assumptions made during the architecture design, considering parameters including die-to-die communication, delay, jitter, coherency, power, guaranteed delivery, and errors. By contrast, delay is the only consideration for monolithic SoCs. This challenge with multi-die systems can be eased by adopting standard die-to-die interfaces such as Universal Chiplet Interconnect Express (UCIe) IP along with verification IP.
  2. Design size and complexity make verification harder. Scalable simulation and emulation models along with a system integration methodology can provide the capacity and performance required.
  3. Knowing when verification is complete can be tough. Die-level bugs cannot be fixed at the system level, so exhaustive verification of individual dies with comprehensive functional coverage is necessary. This allows the system-level verification to focus on scenarios using an explicit coverage model that, for example, ensures that data arrives at the right place and with the expected throughput and latency.
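
To make the third point concrete, here is a minimal Python sketch of what an explicit system-level coverage model might check: that each die-to-die transfer arrived at the intended die, within a latency budget, and with enough aggregate throughput. The `Transfer` record, the function names, and the thresholds are all hypothetical and chosen for illustration; a real flow would express this in the testbench's own coverage constructs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    src_die: str       # die that issued the transfer
    dst_die: str       # die the transfer was addressed to
    arrived_at: str    # die where the data was actually observed
    num_bytes: int
    start_ns: float
    end_ns: float


def check_transfers(transfers, max_latency_ns, min_gbps):
    """Return (correctly delivered (src, dst) pairs, list of failure messages)."""
    covered, failures, windows = set(), [], {}
    for t in transfers:
        pair = (t.src_die, t.dst_die)
        if t.arrived_at != t.dst_die:
            failures.append(f"{pair}: data arrived at {t.arrived_at}")
            continue
        latency = t.end_ns - t.start_ns
        if latency > max_latency_ns:
            failures.append(f"{pair}: latency {latency} ns over budget")
        covered.add(pair)
        # Track total bytes and the time window spanned by this pair's traffic.
        total, first, last = windows.get(pair, (0, t.start_ns, t.end_ns))
        windows[pair] = (total + t.num_bytes,
                         min(first, t.start_ns),
                         max(last, t.end_ns))
    for pair, (total, first, last) in windows.items():
        if last > first:
            gbps = total * 8 / (last - first)  # bits per ns equals Gbit/s
            if gbps < min_gbps:
                failures.append(f"{pair}: throughput {gbps:.1f} Gbit/s under target")
    return covered, failures


def coverage_holes(expected_pairs, covered):
    """Die pairs the model requires but no correctly delivered transfer exercised."""
    return set(expected_pairs) - covered


# Hypothetical traffic log: two good transfers and one misrouted on purpose.
log = [
    Transfer("cpu", "mem", "mem", 64, 0, 10),
    Transfer("cpu", "mem", "mem", 64, 10, 20),
    Transfer("cpu", "io", "mem", 64, 0, 5),
]
covered, failures = check_transfers(log, max_latency_ns=15, min_gbps=50)
print(failures)
print(coverage_holes({("cpu", "mem"), ("cpu", "io")}, covered))
```

The `coverage_holes` helper is what answers the "are we done?" question in this sketch: verification of the system-level scenarios is complete only when the set of holes is empty and no failures remain.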

Getting a Head Start with a System-Level Verification Approach

As a best practice, design teams must model, lay out, and verify their multi-die designs in the context of the system. With this approach, many design aspects and optimizations—from horizontal and vertical partitioning and placement to die-to-die communication, power, and thermal considerations—become architectural decisions. Much of the work must be performed early on, when changes are still feasible in terms of optimizing the design. A framework for end-to-end co-exploration and co-optimization of technologies, architectures, and algorithms can provide a pathway for architectural exploration that allows for quick estimates of PPA for a range of workloads.

Verification, however, does not end with the individual dies. Once each block is designed and verified and the system is assembled, the system must then be verified as a whole. The flow for this could apply a modularized approach, much like board-level verification.

Multi-die system verification should thus focus on:

  • Complex functions spanning multiple dies
  • Performance that is a function of multi-die functionality
  • Functional scenarios

A basic functional test is to assemble and simulate the RTL of all the dies in the system. How do you approach this when there could be compile issues for simulation (which require you to avoid name clashes) and capacity implications (in which the compute server may not have enough memory to build and execute the simulation)? Can die-level testbenches be reused and/or synchronized? Can the simulation be distributed over multiple servers?

When assembling the multi-die system for simulation, having a single executable to simulate system aggregation can be an efficient and effective method. However, simply compiling all the dies together will likely trigger name clashes. What if you could analyze each die in a separate library? With this approach, the same module name could be used in multiple dies without name clashes. The system assembly should require only the top-level assembly and configuration files, with no changes to per-die code.
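
The per-die library idea can be illustrated with a small conceptual model in Python. This is not any simulator's actual command flow: `DieLibrary`, `flat_compile`, `assemble`, and the die and module names are hypothetical and exist only to show why a shared namespace clashes while per-die namespaces do not.

```python
class DieLibrary:
    """A per-die namespace: module names only need to be unique inside it."""

    def __init__(self, name):
        self.name = name
        self.modules = {}

    def analyze(self, module_name, source):
        if module_name in self.modules:
            raise ValueError(f"{module_name} already defined in {self.name}")
        self.modules[module_name] = source


def flat_compile(dies):
    """Naive approach: every module from every die shares one namespace."""
    flat = {}
    for die_name, modules in dies.items():
        for module_name, source in modules.items():
            if module_name in flat:
                raise ValueError(f"name clash on '{module_name}' (die {die_name})")
            flat[module_name] = source
    return flat


def assemble(top_config, libraries):
    """Bind each top-level instance to a library-qualified module."""
    bound = {}
    for instance, (lib_name, module_name) in top_config.items():
        if module_name not in libraries[lib_name].modules:
            raise KeyError(f"{lib_name}.{module_name} not found")
        bound[instance] = f"{lib_name}.{module_name}"
    return bound


# Two hypothetical dies that both define 'top' and 'fifo'.
dies = {
    "cpu_die": {"top": "<cpu rtl>", "fifo": "<cpu fifo rtl>"},
    "io_die": {"top": "<io rtl>", "fifo": "<io fifo rtl>"},
}

# Analyze each die into its own library; per-die code is untouched.
libraries = {}
for die_name, modules in dies.items():
    lib = DieLibrary(die_name)
    for module_name, source in modules.items():
        lib.analyze(module_name, source)
    libraries[die_name] = lib

# Only the top-level configuration says which library each instance uses.
top_config = {"u_cpu": ("cpu_die", "top"), "u_io": ("io_die", "top")}
print(assemble(top_config, libraries))
```

Calling `flat_compile(dies)` on the same input raises a name-clash error, which is the failure mode the separate-library approach avoids.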

Addressing Capacity and Performance Concerns

To accelerate heterogeneous integration of dies, the comprehensive Synopsys Multi-Die System Solution includes electronic design automation (EDA) and IP products for early architecture exploration, rapid software development and system validation, efficient die/package co-design, robust and secure die-to-die connectivity, and enhanced manufacturing and reliability.

For multi-die system verification, two key components of the solution are Synopsys Platform Architect™ virtual prototyping solution and Synopsys VCS® functional verification solution. The Platform Architect solution supports virtual prototyping for early architecture exploration and early software development and hardware performance verification. The VCS solution, which features the industry’s highest performance simulation and constraint solver engines, enables a shift left in verification flows early in the design cycle. A new capability in the VCS solution enables distributed simulation of multi-die systems, addressing verification capacity and scalability concerns by enabling a large simulation run to be broken up into smaller parts via multiple participating executables. One Synopsys customer reported that simulation of a multi-chip GPU was completed 2x faster via distributed simulation compared to their legacy approach.

Capacity of simulation and emulation systems does become a concern in the multi-die system world, as all the dies and memory in a system won’t fit on a single compute server. Distributed simulation addresses this. The modular, billion-gate Synopsys ZeBu® emulation system provides the capacity needed to verify the entire system in a scalable and extensible manner, while hybrid and cloud-based emulation offer additional methods to reduce capacity constraints and provide higher throughput.
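
As a rough illustration of the capacity problem, the sketch below assigns dies to compute servers so that each server's simulation fits in memory. It uses a first-fit decreasing heuristic, which is an assumption made for this example and not a description of how any particular distributed simulator partitions its work. The die names and memory figures are hypothetical.

```python
def partition_dies(die_mem_gb, server_mem_gb):
    """Group dies onto servers with first-fit decreasing bin packing.

    die_mem_gb: mapping of die name -> estimated simulation memory in GB.
    server_mem_gb: memory available on each (identical) server in GB.
    Returns a list of die-name lists, one per server used.
    """
    servers = []  # each entry: {"free": remaining GB, "dies": [names]}
    # Place the largest dies first; ties keep their original order.
    ordered = sorted(die_mem_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, mem in ordered:
        if mem > server_mem_gb:
            raise ValueError(f"{name} needs {mem} GB, more than one server has")
        for server in servers:
            if server["free"] >= mem:
                server["dies"].append(name)
                server["free"] -= mem
                break
        else:
            # No existing server has room, so open a new one.
            servers.append({"free": server_mem_gb - mem, "dies": [name]})
    return [server["dies"] for server in servers]


# Hypothetical footprints: 850 GB in total, too much for one 512 GB server.
die_mem_gb = {"gpu0": 300, "gpu1": 300, "hbm": 150, "io": 100}
print(partition_dies(die_mem_gb, server_mem_gb=512))
```

In practice, memory is only one constraint. A real partition would also need to account for how much traffic crosses each die-to-die boundary, since communication between participating executables affects simulation speed.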

As a final step in the verification flow, the entire multi-die system needs to be connected to real-world testers, or at least to virtual testers that represent real-world operation. Only then can the system be fully validated. Synopsys provides models, transactors (including virtual testers), and speed adapters that can be used with our emulators to accelerate system validation. 

Summary

Multi-die systems have emerged to accelerate scaling of system functionality, reduce risk and time to market, and facilitate the creation of new product variants. Because these systems bring new levels of interdependency among their components, verifying them takes on new complexities. Each die must be thoroughly verified on its own, and so must the entire system as a whole. Technologies facilitating virtual prototyping, distributed simulation, high-capacity emulation, and accelerated system validation are all important tools in the toolbox to help ensure that your multi-die system delivers the functional correctness that you’ve intended.
