Five Key Techniques to Accelerate Software Bring-Up for Multi-Die Systems

Filip Thoen, Leonard Drucker, Vivek Prasad

Sep 12, 2023 / 6 min read

In today’s hyper-competitive chip development environment, chipmakers aspire to ship silicon as soon as it arrives in the lab. More precisely, they measure the span of time between the arrival of silicon in their labs and shipping a product to their customers in hours, not days or months. This means the entire system—silicon plus all the software that runs on it—must be ready to perform as intended.

Software has become an increasingly integral component of today’s electronic systems. From virtual reality (VR) headsets to highly automated vehicles, software-driven systems depend on sophisticated algorithms that make capabilities from immersive experiences in the metaverse to advanced driver assistance systems a reality.

Software content in advanced systems-on-chips (SoCs) and multi-die systems is exploding, making its development complex and time-consuming. Software developers also can no longer wait until the hardware is available before starting software development; these tasks must be done in parallel to meet time-to-market goals. Software bring-up, which ensures the software is indeed fully functional, correctly integrated with and optimized on the targeted silicon, and good to go, is a critical step in terms of overall system quality and performance.

What is the fastest and most effective way to bring up software given these challenges? And how do massive multi-die systems affect the process, given that each heterogeneous die can be a system on its own, complete with an entire software stack, and all must work together to deliver the function of the multi-die system? In this blog post, we’ll highlight how multi-die systems, with all their complexities and interdependencies, create new challenges for software bring-up, and how technologies including virtual prototyping and electronics digital twins can help you overcome these challenges.


What’s Needed for Software Bring-Up in Multi-Die Systems?

Multi-die systems have found their way into the semiconductor mainstream, providing an avenue for faster scaling of system functionality with reduced risk and time to market. For compute-intensive applications such as AI, ADAS, hyperscale data centers, and high-performance computing, multi-die systems have become a system architecture of choice.

Comprised of multiple dies, also called chiplets, in a single package, multi-die systems meet demands for bandwidth and power driven by applications requiring growing functionality. By providing a modular architecture, they also enable chip designers to address the needs of multiple end market opportunities through faster development of product variants.

Compared to their monolithic counterparts, multi-die systems add layers of software complexity, starting with the engineering resources involved. A monolithic SoC is typically developed by a single team within one organizational structure, following a pre-determined schedule. Since a multi-die system is composed of individual dies that might be developed by different teams and built on different foundry process nodes, you can imagine why the traditional flow won't work here.

From a technical standpoint, a system that a decade ago would have spanned multiple PCBs can now be implemented as a single multi-die system, and the software complexity grows accordingly. Beyond bringing up the software stack on each die's hardware, the entire system needs to be integrated and tested. Data moves between dies over complex die-to-die connectivity interfaces such as Universal Chiplet Interconnect Express (UCIe), which often carry PCIe or CXL protocols, and these interfaces require correct software setup and programming. The picture gets particularly complicated with the inclusion of software hypervisors, where hardware interfaces are shared through complex virtualized functions and para-virtualized software drivers, such as the single root input/output virtualization (SR-IOV) standard. These setups are hard to develop and debug in a single PCIe host/device configuration, let alone in a complex multi-die system containing many such interfaces. Debugging is challenging both technically and organizationally: debug visibility is limited because the number of debug interfaces has not kept pace with growing chip functionality, and resolving an issue might require teams from different companies to come together and determine the best way to approach the task.
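To make this concrete, the sketch below shows one common way such virtualized interfaces are exercised from software: enabling SR-IOV virtual functions through the Linux sysfs interface. This is a minimal illustration, not a bring-up recipe; the PCI address is a placeholder, and it assumes a Linux host (or a virtual or hybrid prototype booting Linux) with an SR-IOV-capable device whose physical-function driver is already loaded.

```cpp
// Minimal sketch: enable SR-IOV virtual functions (VFs) via Linux sysfs.
// The PCI address below is hypothetical; substitute the real device's
// domain:bus:device.function as reported by lspci.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>

int main() {
  const std::string pf = "/sys/bus/pci/devices/0000:3b:00.0";  // placeholder PF

  // Query how many VFs the physical function advertises.
  std::ifstream total(pf + "/sriov_totalvfs");
  int total_vfs = 0;
  if (!(total >> total_vfs)) {
    std::cerr << "device is not SR-IOV capable or PF driver not loaded\n";
    return 1;
  }

  // Writing a count to sriov_numvfs asks the PF driver to create that many VFs.
  std::ofstream numvfs(pf + "/sriov_numvfs");
  numvfs << std::min(total_vfs, 4) << std::flush;
  if (!numvfs) {
    std::cerr << "failed to enable VFs\n";
    return 1;
  }
  std::cout << "enabled VFs (of " << total_vfs << " available)\n";
  return 0;
}
```

The same sequence can be run against a virtual or hybrid prototype long before silicon, which is exactly where debugging this kind of host/device interaction is cheapest.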

Ensuring that the end application will perform correctly involves testing the software running on the chip for tens of billions of cycles or more on an emulation system prior to production. Any decision about the hardware components in a multi-die system should be made with the software in mind. And because software continues to evolve, testing is a continuous effort.

Key Techniques for Faster Software Bring-Up

When bringing up software on new hardware that is still being designed, the goal is to match the impedance of the software to that of the targeted device. In other words, the software must be in tune with the hardware for correct function and optimal system performance. This process involves low-level initialization of the chip by a dedicated controller, followed by bring-up of the main operating system. The software is really an entire stack with multiple layers that can include middleware that provides services, device drivers, and elements that communicate with the outside world (like a USB or Ethernet controller).
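As a rough illustration of that low-level initialization phase, here is a sketch of the shape of a first-stage boot sequence for a hypothetical boot controller. Every register address, bit field, and step is invented for illustration; the real sequence comes from the chip's reference manual and boot firmware.

```cpp
// First-stage bring-up sketch for a hypothetical boot controller.
// All addresses and bit fields below are illustrative assumptions.
#include <cstdint>

static inline void write32(uintptr_t addr, uint32_t val) {
  *reinterpret_cast<volatile uint32_t*>(addr) = val;
}
static inline uint32_t read32(uintptr_t addr) {
  return *reinterpret_cast<volatile uint32_t*>(addr);
}

extern "C" void boot_main() {
  // 1. Bring up clocks: program the PLL and wait for lock.
  write32(0x40000000, 0x00001107);              // PLL_CFG (hypothetical)
  while ((read32(0x40000004) & 0x1) == 0) {}    // PLL_STATUS: lock bit

  // 2. Initialize external memory before the OS image is loaded.
  write32(0x40010000, 0x00000001);              // DDR_CTRL: start training
  while ((read32(0x40010004) & 0x1) == 0) {}    // DDR_STATUS: calibration done

  // 3. Configure the die-to-die link so the other dies become reachable.
  write32(0x40020000, 0x00000001);              // D2D_LINK_EN (hypothetical)

  // 4. Release the application cores; they jump to the OS bootloader.
  write32(0x40030000, 0x80000000);              // CPU_RELEASE: entry point
}
```

On a virtual prototype, exactly this code runs against transaction-level models of the PLL, memory controller, and die-to-die link, so it can be debugged with full visibility before RTL or silicon exists.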

There are a variety of pre-silicon techniques that can be deployed at various abstraction levels to achieve faster time to market and higher quality software. Here are five examples:

  • Virtual prototyping, where engineers create a transaction-level model: essentially a C++ equivalent of the chip that would otherwise be expressed in RTL (see the sketch after this list). This shifts software development left, supports bootstrapping, and scales easily across a software development team. To the operating system on the host machine, virtual prototype models appear like real-world devices. They can also connect to real physical interfaces, such as PCIe, USB, and Ethernet, and to RTL models on an emulator or prototype, enabling virtual prototypes to interact with a real-world environment and provide real-world stimulus for certain types of tests.
  • Emulation, used in sequence with the RTL, which provides accuracy but at lower speeds than virtual prototyping. This approach requires mature RTL and IP descriptions.
  • Physical prototyping, applied downstream of emulation at higher speed, which lets developers interface their software with physical devices to validate how the chip will operate in the real world.
  • Hybrid prototyping, which brings together the strengths of virtual and physical (i.e. FPGA-based) prototyping for earlier software development and system integration. Hardware and software engineers can improve debug visibility and control of their software under development, maximize prototyping performance, and accelerate system bring-up.
  • Electronics digital twins, which provide a virtual representation of a system under development to enable engineers to analyze and optimize their design before it is finalized and accelerate software bring-up, power analysis, and software/hardware validation.
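
To give a flavor of what a transaction-level model looks like, here is a minimal sketch of a SystemC TLM-2.0 register block of the kind a virtual prototype is assembled from. The module name, register layout, and ID value are illustrative assumptions, not tied to any particular product.

```cpp
// Minimal sketch of a loosely-timed TLM-2.0 register model for virtual
// prototyping. Module and register names are illustrative only.
#include <cstdint>
#include <cstring>
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_target_socket.h>

struct CtrlBlock : sc_core::sc_module {
  tlm_utils::simple_target_socket<CtrlBlock> socket;
  uint32_t regs[4] = {0};  // e.g., ID, CTRL, STATUS, DATA (hypothetical map)

  SC_CTOR(CtrlBlock) : socket("socket") {
    socket.register_b_transport(this, &CtrlBlock::b_transport);
    regs[0] = 0xC0DE0001;  // hypothetical ID value the firmware probes for
  }

  // Blocking transport: services the loads and stores issued by the software
  // stack (via an instruction-set simulator) at transaction level.
  void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
    const unsigned idx = trans.get_address() / 4;
    if (idx >= 4 || trans.get_data_length() != 4) {
      trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
      return;
    }
    if (trans.is_read())
      std::memcpy(trans.get_data_ptr(), &regs[idx], 4);
    else
      std::memcpy(&regs[idx], trans.get_data_ptr(), 4);
    delay += sc_core::sc_time(10, sc_core::SC_NS);  // nominal access latency
    trans.set_response_status(tlm::TLM_OK_RESPONSE);
  }
};
```

Device drivers and firmware see the same register behavior they would see on silicon, which is what lets software bring-up start while the RTL is still in flux.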

In addition to accelerating software bring-up, these pre-silicon techniques give designers a way to measure the key performance indicators (KPIs) of their chips. Being able to validate performance and power early on, and therefore confirm what is being promised to customers, reduces project risk and rework and helps secure the release-to-production date and subsequent chip sales.

Multi-die system design teams may find that they need a mix of verification engines at different levels of abstraction, based on the demands of the software tasks at hand. Because software is diverse, it can be worthwhile to start with a virtual prototyping engine and then move to a combination of virtual and physical prototyping or emulation (called hybrid prototyping and hybrid emulation, respectively) to support the software developers' efforts and to provide the capacity these large multi-die systems require. For example, some of the system's pre-silicon software could run on an emulator while other software components are abstracted via virtual prototyping or modeling. Toward the end of the design cycle, the software running on the hardware—the entire system—will need to be verified to ensure that it meets performance targets.

Indeed, a mixed approach may be most effective for multi-die system software bring-up. Once all of the individual components are verified, the entire multi-die system can be run on a high-capacity emulator such as the Synopsys ZeBu® emulation system, which supports up to 30 billion gates. Running emulation in the cloud via a solution like Synopsys Cloud provides additional capacity for multi-die systems, along with the flexibility to tap into emulation resources during peak demand periods. The ZeBu emulation system is part of the Synopsys verification family of products, which includes the Synopsys HAPS® prototyping system and the Synopsys Virtualizer™ virtual prototyping solution. Together, the family delivers the capabilities needed for early software bring-up and system validation. As AI is increasingly integrated into electronic design automation (EDA) flows, machine-learning algorithms may eventually play a role in software bring-up and in root-cause analysis during software debug.

Summary

Thanks to software-defined systems, we have VR headsets that take us on journeys through far-flung lands, cars that can park and drive themselves, AI-driven smart speakers that control our homes' lighting and HVAC systems, and so much more. The pre-silicon techniques discussed above help ensure that the software and hardware in a system are co-optimized to deliver on the function and performance targets of the end application, and they accelerate software bring-up for multi-die systems. Verification solutions such as emulators, virtual and physical prototyping systems, and their hybrid combinations support these techniques, helping design teams deliver the smart, connected electronics that are enhancing our lives.
