Multi-die systems use tiny microbumps placed so close together that testing by physical probing is not possible. For UCIe, for example, microbump pitch ranges from 25 to 55 micrometers, while probe pitch is typically 90 micrometers. A better solution is to probe electrically through built-in self-test (BIST), which can detect soft or hard errors that require corrective action. Alternatively, dedicated wafer-level test pads, integrated in the pre-assembly phase, can be used.
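To make the BIST idea concrete, here is a minimal sketch of a loopback check: a pattern is sent across a lane, read back, and compared bit by bit. It assumes a PRBS-7 pattern (polynomial x^7 + x^6 + 1), which is a common choice for link tests but is not named in the text above. The two lane models, `healthy_lane` and `stuck_at_zero_lane`, are hypothetical stand-ins for real die-to-die interconnect, not part of any specific BIST IP.

```python
def prbs7(nbits: int, seed: int = 0x7F) -> list[int]:
    """Generate nbits of a PRBS-7 sequence (polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    bits = []
    for _ in range(nbits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two tap bits
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F      # shift the new bit in
    return bits


def count_bit_errors(sent: list[int], received: list[int]) -> int:
    """Compare the looped-back bits against the transmitted pattern."""
    return sum(s != r for s, r in zip(sent, received))


def healthy_lane(bits: list[int]) -> list[int]:
    """Hypothetical lane model: passes data through unchanged."""
    return list(bits)


def stuck_at_zero_lane(bits: list[int]) -> list[int]:
    """Hypothetical hard fault: the lane always reads back 0."""
    return [0] * len(bits)


pattern = prbs7(127)  # one full PRBS-7 period
healthy_errors = count_bit_errors(pattern, healthy_lane(pattern))
faulty_errors = count_bit_errors(pattern, stuck_at_zero_lane(pattern))
print(healthy_errors, faulty_errors)
```

A healthy lane yields zero mismatches, while the stuck lane produces an error for every 1 in the pattern. A real BIST engine would run in hardware at speed and might also decide whether a failing lane can be repaired or remapped; that logic is outside this sketch.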
Both during development and in the field, a silicon lifecycle management (SLM) methodology is useful. SLM integrates sensors and monitors on the dies to assess parameters such as temperature, voltage, aging, and degradation. SLM IP combined with analytics intelligence can turn the high volumes of data collected from these sensors and monitors into actionable insights for system optimization.
Consider how SLM technology can identify thermal issues, which are a concern for individual dies and multi-die systems alike. Without real workloads, these issues are difficult to evaluate during the design phase. Add the complexity of a 2.5D or 3D architecture, and it becomes really hard to know the thermal profile of the final design. Here’s a situation where SLM can help. On-chip monitors placed strategically on the die feed analytics that give deeper insight into the thermal characteristics of the dies and can signal the need to adjust placement to improve heat dissipation. Similarly, knowing more about thermal effects might lead to a decision to slow down the data rates in the system’s High-Bandwidth Memory (HBM) component. There may also be ways to reduce heat generation through software. With monitors providing data, designers can analyze the problem and determine the best corrective action.
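As a rough illustration of how monitor data could feed such a decision, the sketch below takes temperature readings from on-die monitors, finds the peak per die, and flags dies that exceed a limit. Everything here is assumed for illustration: the `MonitorReading` record, the die names, the readings, and the 85 °C limit are hypothetical and do not come from any particular SLM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorReading:
    die: str         # hypothetical die name
    monitor_id: str  # hypothetical monitor identifier
    temp_c: float    # temperature in degrees Celsius


def hotspots(readings, limit_c):
    """Return {die: peak temperature} for dies whose hottest monitor exceeds limit_c."""
    peak = {}
    for r in readings:
        peak[r.die] = max(peak.get(r.die, float("-inf")), r.temp_c)
    return {die: temp for die, temp in peak.items() if temp > limit_c}


# Illustrative readings only; not measured data.
readings = [
    MonitorReading("compute", "m0", 78.0),
    MonitorReading("compute", "m1", 96.5),
    MonitorReading("hbm", "m0", 88.0),
    MonitorReading("io", "m0", 61.0),
]

hot = hotspots(readings, limit_c=85.0)
throttle_hbm = "hbm" in hot  # one possible response: lower the HBM data rate
print(hot, throttle_hbm)
```

With these sample values, the compute and HBM dies are flagged and the I/O die is not. In practice the limit, the response (placement change, data-rate reduction, or software mitigation), and the sampling scheme would all depend on the design.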
SLM technology also provides traceability: the ability to trace an issue back to its root cause regardless of when in the lifecycle the end product exhibits it. For example, if a yield excursion is detected at any point during manufacturing test, it can be vitally important to determine whether the problem stems from a particular wafer or die, affects every wafer or die manufactured during a certain time period, or originates in the fab. This matters especially in multi-die systems, where packaging can be very expensive. The faster you find the problem, the sooner you can get to market and the lower your costs. A good SLM solution should be able to identify the root cause within minutes, whereas manual methods can take days or weeks.
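The kind of question traceability answers (is the excursion confined to one wafer, or spread across lots?) can be sketched as a simple aggregation over test records. This is only a toy model under stated assumptions: the `DieResult` record, the lot and wafer names, the sample data, and the 50% threshold are all hypothetical, and a production SLM flow would work from a full manufacturing database rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DieResult:
    lot: str      # hypothetical lot identifier
    wafer: str    # hypothetical wafer identifier
    die: int      # die index on the wafer
    passed: bool  # outcome of manufacturing test


def fail_rate_by_wafer(results):
    """Return {(lot, wafer): fraction of tested dies that failed}."""
    totals = defaultdict(int)
    fails = defaultdict(int)
    for r in results:
        key = (r.lot, r.wafer)
        totals[key] += 1
        if not r.passed:
            fails[key] += 1
    return {key: fails[key] / totals[key] for key in totals}


def suspect_wafers(results, max_fail_rate):
    """List the (lot, wafer) pairs whose fail rate exceeds max_fail_rate."""
    rates = fail_rate_by_wafer(results)
    return sorted(key for key, rate in rates.items() if rate > max_fail_rate)


# Illustrative records only; not real yield data.
results = (
    [DieResult("LOT_A", "W1", d, True) for d in range(4)]       # all pass
    + [DieResult("LOT_A", "W2", d, d == 0) for d in range(4)]   # 3 of 4 fail
    + [DieResult("LOT_B", "W3", d, d != 0) for d in range(4)]   # 1 of 4 fails
)

rates = fail_rate_by_wafer(results)
suspects = suspect_wafers(results, max_fail_rate=0.5)
print(rates, suspects)
```

Here only wafer W2 of LOT_A crosses the threshold, which points to a wafer-level problem rather than a lot-wide or fab-wide one. Grouping by lot or by manufacturing date instead of by wafer would follow the same pattern.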
Traceability also covers the case where the end product is already deployed in the field but starts to exhibit unexpected and potentially catastrophic failures that may require a recall. In this return merchandise authorization (RMA) case, SLM and the entire chain of test data reaching back through manufacturing can be used to identify the root cause, as well as “like” devices in the field that may go on to exhibit the same behavior. The product owner can then proactively recall devices before they fail, or adjust their operating voltage or frequency to prolong their lives.
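Finding “like” devices can be pictured as a lookup on manufacturing genealogy. The sketch below assumes, purely for illustration, that two devices count as alike when they share a lot and wafer; a real flow might match on many more attributes (test signatures, monitor readings, assembly site). The `FieldDevice` record and serial numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDevice:
    serial: str  # hypothetical device serial number
    lot: str     # manufacturing lot the die came from
    wafer: str   # wafer within that lot


def like_devices(returned, fleet):
    """Deployed devices built from the same lot and wafer as the returned unit."""
    return [
        d for d in fleet
        if (d.lot, d.wafer) == (returned.lot, returned.wafer)
        and d.serial != returned.serial
    ]


# Illustrative fleet only.
fleet = [
    FieldDevice("SN001", "LOT_A", "W2"),
    FieldDevice("SN002", "LOT_A", "W2"),
    FieldDevice("SN003", "LOT_A", "W1"),
    FieldDevice("SN004", "LOT_B", "W3"),
]

returned_unit = fleet[0]  # the device that came back through RMA
siblings = like_devices(returned_unit, fleet)
print([d.serial for d in siblings])
```

The resulting list is the set of candidates for a proactive recall or for a voltage or frequency adjustment.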
The last phase of testing is on the stack itself. Here, “known good system” is the operative phrase, as testing teams aim to determine whether their multi-die system will work well—and find ways to monitor, analyze, and fix issues when needed. IEEE Std 1838-2019 provides a modular test access architecture, enabling testing of dies and interconnect layers between adjacent stacked dies.
For stacked architectures, some testing needs to be pushed downstream, while more intelligent testing remains upstream in the process. For example, assessing high-temperature behavior at the die level is generally not practical; temperature tests on multi-die systems are most effective when performed after stacking. Whether failures uncovered at this point can be fixed depends on their location. Temperature tests at the wafer level are possible, but they can be rather expensive, so it is mainly designers of high-end systems who may opt to perform them. The ability to monitor and gather this data lets design, manufacturing, and test teams make informed decisions about how to achieve the best quality of results.