Chip design complexity is growing by leaps and bounds, and the semiconductor industry faces a slew of high-profile challenges. From the march to angstroms to multi-die integration and rapid node migration, there has never been a greater need to find innovative solutions while raising engineering productivity. Most SoCs, however, require a costly respin, largely due to logic and functional issues. Because of this, there can never be enough SoC verification. Yet cost and time-to-market pressures rule out an endless verification and debug loop.
The verification process kicks off once the RTL for a chip design is in place and the design state space has been configured. Verification engineers need to exercise each part of that state space to ensure that the final SoC design will work. The goal of coverage closure is to confirm that the entire design functions as intended.
There are three main challenges for coverage closure:
- Planning for coverage, as it is challenging to know what to write in the coverage definition for the testbench (what types of coverage groups are needed, where the gaps are, what still needs to be written, etc.). Getting this right is essential if 100% coverage is to mean that you have indeed found all the bugs.
- Closing coverage, as it is difficult to know which tests contribute the most to coverage. You might run the same test 1,000 times only to achieve 50% coverage. As you get closer to 100% coverage, closing those last few percentage points can take a few weeks. Targeted tests are key here, but they are very labor-intensive to develop.
- Stimulus development and root-cause analysis, as you may encounter scenarios where the stimulus cannot exercise a particular configuration, or where the stimulus itself contains a bug. Perhaps the stimulus was written in a way that won’t hit the coverage target at all.
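To make the idea of coverage gaps concrete, here is a minimal Python sketch of a coverage report. It is an illustration only, not how any particular simulator or coverage database works: the bin names, the test names, and the `coverage_report` helper are all hypothetical, and the data is invented for the example.

```python
from collections import Counter

# Hypothetical coverage plan: every bin the testbench is expected to hit.
planned_bins = {
    "pkt_len.short", "pkt_len.medium", "pkt_len.long",
    "mode.normal", "mode.low_power", "fifo.full", "fifo.empty",
}

# Hypothetical regression results: the bins each test run actually hit.
hits_per_test = {
    "test_basic":     {"pkt_len.short", "mode.normal", "fifo.empty"},
    "test_basic_s2":  {"pkt_len.short", "mode.normal", "fifo.empty"},
    "test_long_pkts": {"pkt_len.long", "pkt_len.medium", "mode.normal"},
}

def coverage_report(planned, hits):
    """Return (percent covered, sorted list of holes, per-bin hit counts)."""
    # Count how many tests hit each planned bin; ignore bins outside the plan.
    hit_counts = Counter(
        b for bins in hits.values() for b in bins if b in planned
    )
    covered = set(hit_counts)
    holes = sorted(planned - covered)
    percent = 100.0 * len(covered) / len(planned)
    return percent, holes, hit_counts

percent, holes, hit_counts = coverage_report(planned_bins, hits_per_test)
print(f"coverage: {percent:.1f}%")
print("holes:", holes)
```

In this toy data set, two of the three tests hit exactly the same bins, and no test reaches `fifo.full` or `mode.low_power`. Those two holes could point to missing targeted tests or to stimulus that cannot reach them, which is the distinction the third challenge is about.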
In a traditional chip verification cycle, verification engineers set a target and run their regression environment. As part of the process, the engineers set up testbenches that generate random stimulus to see how the design responds. It’s not uncommon to have 10,000 to 15,000 tests for a given design, and the verification team usually doesn’t have a sense of the ROI of each test. Regressions can run for days, taking up valuable compute resources.
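One simple way to reason about the ROI of each test is to rank tests by the new coverage they add. The sketch below uses a greedy set-cover heuristic in Python. It is an assumption made for illustration, not a method described in this article or used by any specific tool, and the function name, test names, and bin labels are hypothetical.

```python
def rank_tests_by_new_coverage(hits_per_test):
    """Greedy ordering: repeatedly pick the test that adds the most
    not-yet-covered bins. Returns (ranking, redundant), where ranking is a
    list of (test name, bins newly covered) and redundant lists the tests
    that add no new coverage on top of the ranked ones."""
    remaining = dict(hits_per_test)
    covered = set()
    ranking = []
    while remaining:
        # Iterate names in sorted order so ties break alphabetically.
        best = max(sorted(remaining),
                   key=lambda name: len(remaining[name] - covered))
        gain = len(remaining[best] - covered)
        if gain == 0:
            break  # no remaining test adds anything new
        covered |= remaining.pop(best)
        ranking.append((best, gain))
    return ranking, sorted(remaining)

# Hypothetical regression data: coverage bins hit by each test.
regression = {
    "t_smoke":        {"a", "b"},
    "t_random_seed1": {"a", "b", "c"},
    "t_random_seed2": {"a", "c"},
    "t_corner":       {"d"},
}

ranking, redundant = rank_tests_by_new_coverage(regression)
print(ranking)    # tests in order of incremental coverage
print(redundant)  # tests that add no new bins
```

A greedy ordering is a heuristic and is not guaranteed to find the smallest possible set of tests. A test that is redundant for coverage may also still be worth running for other reasons, so a ranking like this is best treated as input to a decision about what to run first rather than a verdict on which tests to delete.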
Two iterative loops take the bulk of the time in the SoC verification cycle: (1) debugging failures and fixing bugs after running regressions, and (2) coverage closure (Figure 1). Both are time-consuming. Coverage closure involves analyzing coverage, making adjustments after discovering holes, and doing it all again…and again…and again. Then, when teams discover failures, they need to analyze them, make changes in the RTL or the testbench, and re-run the regressions to confirm that the bugs were actually fixed. This part, too, is an iterative loop.
Also, it’s not uncommon for the last bit of the coverage closure process to be the most laborious. A thorough manual analysis of the huge amount of data that this whole process generates is not really feasible, so teams are generally left needing more insight into the root causes of chip design bugs.