Industry Insight
Functional Verification from a Manager's Perspective
Ira Chayut, Verification Architect with nVIDIA, considers the most challenging question confronting functional verification: When is 'good enough' really good enough?
To answer the question "When is 'good enough' really good enough?" requires that we first answer: "Good enough for what?" The process of designing complex integrated circuits has many "good enough" thresholds, including when the design is good enough to:
- start running simple directed tests,
- run all directed tests,
- be synthesized for size estimates,
- be synthesized for FPGA prototyping or emulation,
- be taped out, and
- have silicon shipped to customers.
This paper will focus on the last two "good enough" points.
As far as time-to-market is concerned, a chip designer's priority is not to have integrated circuits tape out as soon as possible; rather, it is to have working integrated circuits shipped to customers as soon as possible. Early tapeouts can cause many problems, including respins, if critical bugs are missed during pre-silicon verification. On the other hand, problems also arise if time is spent on 'complete' verification: more resources must be used and the tapeout may be delayed. Verification follows Parkinson's Law, which states that the effort will expand to fill the time allotted to it.
Regardless of whether early tapeout or complete verification is the goal, there is a significant risk of missing a market window. So a solution must be found that balances the efforts expended and their associated coverage with the risks of performing a less thorough verification to accelerate the tapeout schedule.
A patchwork of diverse techniques must be applied to cover many of the voids that are in each of the tools. By applying a range of techniques, we may find that the whole is greater than the sum of the parts and increase the probability of shipping silicon - even in the presence of architectural, design, and verification errors.
Functional Verification Overview
Functional verification is the task of checking that a design implements a specified architecture, such as in Figure 1. The reference model represents the architecture, and the same stimulus is applied to both the Design Under Test (DUT) and the reference model. The two outputs are then compared, either online or as a post-process. If the outputs match for a thorough set of stimuli, then we can conclude that the DUT matches the architecture.
Figure 1: What is Functional Verification?
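The comparison flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's environment: the 8-bit adder, the function names, and the stimulus mix are all assumptions chosen to keep the example small. The "DUT" is a bit-serial ripple-carry implementation, and the reference model is the one-line architectural definition.

```python
import random

def reference_adder(a: int, b: int) -> int:
    """Reference model: the architectural definition of an 8-bit wrapping add."""
    return (a + b) & 0xFF

def dut_adder(a: int, b: int) -> int:
    """Stand-in for the DUT: a bit-serial ripple-carry implementation."""
    result, carry = 0, 0
    for bit in range(8):
        x = (a >> bit) & 1
        y = (b >> bit) & 1
        s = x ^ y ^ carry                    # sum bit uses the incoming carry
        carry = (x & y) | (carry & (x ^ y))  # carry out to the next bit
        result |= s << bit
    return result

def make_stimulus(count: int, seed: int = 1):
    """A few directed corner cases followed by pseudo-random operand pairs."""
    rng = random.Random(seed)
    directed = [(0, 0), (0xFF, 0xFF), (0xFF, 1), (0x80, 0x80)]
    rand = [(rng.randrange(256), rng.randrange(256)) for _ in range(count)]
    return directed + rand

def compare(dut, ref, stimulus):
    """Apply the same stimulus to both models and collect any mismatches."""
    mismatches = []
    for a, b in stimulus:
        got, expected = dut(a, b), ref(a, b)
        if got != expected:
            mismatches.append((a, b, got, expected))
    return mismatches

if __name__ == "__main__":
    print(compare(dut_adder, reference_adder, make_stimulus(1000)))
```

An empty mismatch list only says that the two models agree on the stimulus that was applied; it says nothing about inputs that were never tried.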
An alternative is to use self-checking tests - these follow the same outline as in Figure 1, although the reference model is built into the test itself. Thus, a separate reference model is not needed.
Figure 2 shows a more detailed example of functional verification. The DUT and the reference model implement a CPU and each can be a C model, an RTL model, an FPGA prototype, an emulation implementation, or silicon. The stimulus is a set of executable binary code. If the test is self-checking, then no separate reference model is needed. A self-checking test might load two registers with constants, execute an ADD instruction, and then compare the destination register with the expected sum of the constants. A conditional branch could then jump to one of two known locations, one to signify success and another to flag an error.
Figure 2: What is Functional Verification?
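The self-checking ADD test described above can be sketched against a toy CPU model. Everything here is hypothetical: the three-instruction set, the register count, and the pass/fail addresses are invented for illustration and do not correspond to any real processor or to the author's testbench.

```python
PASS_ADDR = 100  # hypothetical location that signifies success
FAIL_ADDR = 200  # hypothetical location that flags an error

def run(program, max_steps=1000):
    """Interpret a tiny made-up instruction set and return the final PC."""
    regs = [0] * 4
    pc = 0
    for _ in range(max_steps):
        if pc in (PASS_ADDR, FAIL_ADDR):
            return pc
        op, *args = program[pc]
        if op == "LOADI":    # LOADI rd, constant
            regs[args[0]] = args[1] & 0xFFFFFFFF
            pc += 1
        elif op == "ADD":    # ADD rd, rs1, rs2
            regs[args[0]] = (regs[args[1]] + regs[args[2]]) & 0xFFFFFFFF
            pc += 1
        elif op == "BEQ":    # BEQ rs1, rs2, target_if_equal, target_if_not_equal
            pc = args[2] if regs[args[0]] == regs[args[1]] else args[3]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("test did not terminate")

# The test carries its own expected value, so no separate reference model is needed.
ADD_TEST = [
    ("LOADI", 0, 7),                      # r0 = 7
    ("LOADI", 1, 35),                     # r1 = 35
    ("ADD", 2, 0, 1),                     # r2 = r0 + r1
    ("LOADI", 3, 42),                     # r3 = expected sum
    ("BEQ", 2, 3, PASS_ADDR, FAIL_ADDR),  # branch to the pass or fail location
]

if __name__ == "__main__":
    print("PASS" if run(ADD_TEST) == PASS_ADDR else "FAIL")
```

Note that the test relies on LOADI and BEQ working correctly in order to judge ADD, which is one reason self-checking tests are usually built up from the simplest instructions first.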
The concept of functional verification must be broadened in order to produce successful products. Poor performance may be considered a functional failure - take for instance an MPEG decoder that does not operate at frame rates, or a chip that performs redundant writes to memory despite passing all functional tests. In a highly competitive industry, such as CPUs or graphics, having significantly lower performance than other available products can eliminate your product's market, leave your company with a lot of scrap silicon, and remove the possibility of recouping the chip's development cost.
In functional testing, the test stimulus can be manually generated, captured from a live system, generated by a directed test program, or created by a pseudo-random generator. It is often useful to employ a combination of all these stimulus types.
In addition to comparing outputs to reference designs, check assertions can be used, as can formal and hybrid-formal techniques. Check assertions are ways of capturing a designer's assumptions and intent into the design code. They describe what are valid inputs, legal outputs, and proper design behavior. For example, an assertion can declare that a bus is one-hot, or that a sequence of packets is to arrive in order. Check assertions are run during simulation and will "fire" when the assumptions or intents are violated. For simple state machines and interfaces, it is possible to use formal verification techniques to prove check assertions are true, or to find violations, without running a single simulation cycle. Unfortunately, a thorough set of check assertions can be as costly to create as the RTL itself.
Note that the effort to write check assertions can be repaid many times over if the RTL blocks are reused. Since these assertions are included in the RTL code, they will assure that future designs incorporating the IP do not violate the designer's original intent.
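The two assertion examples mentioned above, a one-hot bus and in-order packet arrival, might look like the following when modeled in Python. In practice such checks are written in an assertion language inside the RTL; the function and class names here are assumptions made for this sketch.

```python
def is_one_hot(value: int) -> bool:
    """True when exactly one bit of a non-negative bus value is set."""
    return value > 0 and (value & (value - 1)) == 0

def check_grant_bus(samples):
    """Return the cycle numbers at which the one-hot assertion would fire."""
    return [cycle for cycle, value in enumerate(samples) if not is_one_hot(value)]

class InOrderChecker:
    """Fires if packet sequence numbers do not arrive in consecutive order."""

    def __init__(self):
        self.last_seq = None

    def observe(self, seq: int) -> None:
        if self.last_seq is not None and seq != self.last_seq + 1:
            raise AssertionError(
                f"packet {seq} arrived after {self.last_seq}; expected {self.last_seq + 1}"
            )
        self.last_seq = seq

if __name__ == "__main__":
    # Cycle 2 has two grants asserted and cycle 3 has none, so both are reported.
    print(check_grant_bus([0b0001, 0b0010, 0b0011, 0b0000, 0b1000]))
```

Each check states an assumption once and then monitors it on every cycle of every simulation, which is where the reuse benefit comes from.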
To the extent possible, it is beneficial to run real applications in a testbench that mimics a full system. Usually, the scope of the application will need to be cut down to make this feasible. Even when emulation runs one million times faster than simulation, the application will still be running about one thousand times slower than the final silicon. Thus, small snippets of the real application and abbreviations of the full system need to be used for verification.
Why Is Functional Verification Necessary?
The goal of functional verification is to improve the odds that the manufactured silicon is good enough to ship, and the reason for this is financial: one critical bug can require a respin. While the cost of a new mask set exceeds $1M, that sum can be dwarfed by the costs of a lost market opportunity. There are other significant costs of incomplete verification, such as damaged reputation and customers' goodwill. Incomplete verification can also impact the schedule of the next generation of projects, as engineers must be brought back to fix problems that are found post-silicon. Not only is the next generation product impacted by the loss of staffing, but also the morale of the engineers is affected. Direct financial costs can also include an impact on a company's stock price. In 2005, a fabless semiconductor company scrapped $67 million of inventory "because it was slow getting to market". Its stock dropped 15 percent.
While functional verification is clearly important, it is possible to have too much of a good thing. Verification efforts currently account for 60-70 percent of a chip's total design effort. Increasing the amount of verification can mean delayed time-to-market and missing the market window. Spending more time on verifying a chip risks losing an edge that a company would otherwise have over its competition. 'Complete' verification also increases other direct costs: staffing, computer time, software licenses, and emulation resources.
Optimizing Functional Verification
Functional verification efforts should be optimized to minimize the associated costs and risks. Affordable functional verification should be maximized, while ensuring that multiple techniques are used in order to yield the broadest possible coverage. It is important not to depend on pseudo-random testing alone, because random stimuli do not guarantee that all the interesting corner cases are covered. For example, a long time ago, before formal verification, the author used a pseudo-random test harness to compare an RTL block with its handcrafted gate-level equivalent. After a weekend of running millions of test vectors, the two versions appeared identical. Unfortunately, the test stimulus never included an input case of all-zeros. This situation turned out to be a critical corner case that would have exposed a missing wire on the gate-level version.
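The all-zeros anecdote is easy to reproduce in miniature. The sketch below is not the author's block: the 64-bit width, the zero-detect function, and the "stuck at 0" fault are assumptions picked so that the two versions differ only on the all-zeros input. Purely random vectors are then compared with a stimulus set that adds a handful of directed corner cases.

```python
import random

WIDTH = 64  # hypothetical input width; the original block's width is not given

def rtl_block(x: int) -> int:
    """Reference behavior: flag the all-zeros input (a simple zero detector)."""
    return 1 if x == 0 else 0

def gate_block(x: int) -> int:
    """Gate-level stand-in with a 'missing wire': the flag is stuck at 0."""
    return 0

def count_mismatches(vectors) -> int:
    """Number of vectors on which the two versions disagree."""
    return sum(rtl_block(v) != gate_block(v) for v in vectors)

def random_vectors(count: int, seed: int = 0):
    rng = random.Random(seed)
    return [rng.getrandbits(WIDTH) for _ in range(count)]

# Directed corner cases: all zeros, all ones, lowest bit only, highest bit only.
DIRECTED_CORNERS = [0, (1 << WIDTH) - 1, 1, 1 << (WIDTH - 1)]

if __name__ == "__main__":
    print("random only:      ", count_mismatches(random_vectors(100_000)))
    print("directed + random:", count_mismatches(DIRECTED_CORNERS + random_vectors(1_000)))
```

With a uniformly random 64-bit input, the chance that any single vector is all zeros is 1 in 2^64, so even millions of vectors are very unlikely to expose the fault, while a single directed vector exposes it immediately.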
Thorough unit-level verification can be used to optimize the verification effort, because full-chip simulations run much slower than unit-level simulations. Also, a unit's inputs are much easier to control and its outputs are much easier to observe in a unit-level environment, as compared to a full-chip environment.
Loose coupling of units can also lessen the required verification effort. The use of well-defined interfaces between blocks allows the interface protocol to be verified with check assertions and possibly with formal verification techniques. By limiting interactions between blocks to high-level and well-defined transactions, verification time can be reduced - as the combinatorial explosion of block-to-block interactions is curtailed. The well-defined interfaces must be stable - changing the interfaces in the middle of the design effort greatly increases the possibility of introducing bugs. The use of standard interfaces, when possible, allows existing in-house or commercial monitors and test suites to be reused.
Functional Verification Metrics
If we are going to determine when it is "good enough" to tape out or ship silicon, we need a way to measure something and then use that measurement as a "goodness indicator". There are a number of different functional verification metrics that can be employed.
Ideally, we would conduct exhaustive testing, where all possible inputs are presented when the DUT is in each of its possible states. Unfortunately, this is not feasible. For example, a two-input 32-bit adder would need 2^64 vectors to be fully tested, even though there is no internal state to consider. At 10 billion tests per second, testing would take 58 years. A modern complex integrated circuit would take much longer to test than the age of the universe. This would definitely cause us to miss our market window.
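The 58-year figure follows directly from the numbers in the paragraph above, as this short calculation shows. The helper name and the 365.25-day year are assumptions of this sketch.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumes an average 365.25-day year

def exhaustive_test_years(input_bits: int, tests_per_second: float) -> float:
    """Years needed to apply every input pattern once to a stateless block."""
    return 2 ** input_bits / tests_per_second / SECONDS_PER_YEAR

if __name__ == "__main__":
    # Two 32-bit operands give 64 input bits; 10 billion tests per second.
    print(round(exhaustive_test_years(64, 10e9), 1))
```

Each additional input bit doubles the result, and any internal state multiplies it again, which is why exhaustive testing stops being an option almost immediately.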
Error injection is also impractical. It is theoretically possible to inject random errors into the design code and see what percentage are caught by the regression test suite, but for the size of today's designs, it takes a very long time to run a regression test suite. Even with expensive emulation platforms, the number of runs that are needed to get statistically meaningful results with error injection techniques is not feasible.
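The idea behind error injection can still be illustrated on a toy design, even though it does not scale to real chips. In the sketch below, a hypothetical three-operation ALU is deliberately mutated one operator at a time, and a test suite is scored by the fraction of injected errors it detects. The ALU, the mutant list, and the vectors are all invented for this example.

```python
import operator

def make_alu(and_op=operator.and_, or_op=operator.or_, xor_op=operator.xor):
    """Build a toy ALU; passing a different operator injects a design error."""
    def alu(sel, a, b):
        if sel == 0:
            return and_op(a, b)
        if sel == 1:
            return or_op(a, b)
        return xor_op(a, b)
    return alu

# Each entry replaces exactly one operator with a wrong one.
MUTANTS = {
    "and->or": {"and_op": operator.or_},
    "or->xor": {"or_op": operator.xor},
    "xor->or": {"xor_op": operator.or_},
}

def suite_passes(alu, vectors) -> bool:
    """True if the ALU matches the unmutated ALU on every test vector."""
    good = make_alu()
    return all(alu(s, a, b) == good(s, a, b) for s, a, b in vectors)

def detection_rate(vectors) -> float:
    """Fraction of injected errors that the test vectors catch."""
    caught = sum(not suite_passes(make_alu(**m), vectors) for m in MUTANTS.values())
    return caught / len(MUTANTS)

# Operands that happen to give the same answer for the right and wrong operator.
WEAK_SUITE = [(0, 0b1100, 0b1100), (1, 0b1010, 0b0101), (2, 0b1010, 0b0101)]
# Operands chosen so that AND, OR, and XOR all produce different results.
STRONG_SUITE = [(0, 0b1100, 0b1010), (1, 0b1100, 0b1010), (2, 0b1100, 0b1010)]

if __name__ == "__main__":
    print(detection_rate(WEAK_SUITE), detection_rate(STRONG_SUITE))
```

Both suites exercise every operation, yet they differ completely in how many injected errors they detect. Obtaining the same kind of score for a full chip would require one complete regression run per injected error, which is the cost the paragraph above describes.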
Another common metric used as a gate to tapeout is the new bug rate. It is easy to measure and graph how many new bugs are filed each week. However, this metric can inadvertently pressure the design staff to accelerate the tapeout date and to stop looking for bugs. Unfortunately, this metric cannot predict the number of bugs that have not yet been found - as with your financial investments, past performance is no guarantee of future results. Figure 3 shows what might happen if a new class of tests is developed late in the verification process.
Figure 3: New Bug Rate - Past Performance is No Guarantee of Future Results
If check assertions are used in the design and verification flow, then the number of these assertions that are triggered can be measured and used as a tapeout metric. Since check assertions are expressions of the designer's intent and assumptions, no check assertions should fire if a chip is to be considered ready for tapeout. Remember, however, that check assertions only fire when an error condition is seen - they do not provide any measure of test coverage.
Line and block coverage are also very common metrics. Third-party or built-in coverage tools are used to monitor a full regression to find lines or blocks of design code that are being run at least once. This has a slight impact on runtime. Visiting a line is necessary, but not sufficient, for complete testing. These metrics do not reveal which values of registers are exercised, how much of "short-circuit and/or" (&&/||) or ?: lines are exercised, or which combinations of conditions and DUT state are exercised. Nor do they say anything about missing lines of code.
Expression coverage is better than line coverage at overcoming some of the above limitations. However, it has a significant impact on simulation runtime and, like line coverage, it does not reveal which values of registers are exercised, which combinations of conditions and DUT state are exercised, or anything about missing expressions or code.
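The gap between line coverage and expression coverage can be seen on a single short-circuit condition. The sketch below instruments a hypothetical grant function by hand, recording which lines run and which values each term of the condition takes. Real coverage tools do this automatically; the function and the bookkeeping are assumptions made for illustration.

```python
ALL_LINES = {"if", "grant", "deny"}
ALL_TERMS = {
    ("not busy", True), ("not busy", False),
    ("override", True), ("override", False),
}

def grant(busy: bool, override: bool, lines: set, terms: set) -> bool:
    """Grant when idle, or when busy but overridden; records coverage as it runs."""
    lines.add("if")
    terms.add(("not busy", not busy))
    if busy:
        # With short-circuit evaluation, `override` is only examined when busy.
        terms.add(("override", override))
    if (not busy) or override:
        lines.add("grant")
        return True
    lines.add("deny")
    return False

def coverage(tests):
    """Return (line coverage, expression-term coverage) as fractions."""
    lines, terms = set(), set()
    for busy, override in tests:
        grant(busy, override, lines, terms)
    return len(lines) / len(ALL_LINES), len(terms) / len(ALL_TERMS)

if __name__ == "__main__":
    print(coverage([(False, False), (True, False)]))
    print(coverage([(False, False), (True, False), (True, True)]))
```

The first test set visits every line, yet it never sees `override` evaluate true, so a design that ignored the override input entirely would pass it. Adding the third test closes that hole, but neither metric would notice a condition that was never written in the first place.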
State machine coverage measures whether all states and state transitions have been exercised. Unfortunately, for most designs, this can be labor-intensive because illegal (or legal) states and state transitions must be declared. Also, this metric does not tell us which states, transitions, or entire state machines are missing.
Lastly, coverage assertions - otherwise known as functional coverage - are used by designers and verification engineers to declare interesting events to be covered by test cases. Coverage assertions, along with input constraints, can be used by hybrid-formal methods to generate stimulus. The quality of the metric depends upon the size of the effort to declare interesting events and the quality of those declarations.
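A coverage assertion is essentially a named predicate plus a hit counter. The sketch below models that idea for a hypothetical FIFO interface. The class, the bin names, and the event fields (`op`, `level`, `depth`) are assumptions of this example rather than any tool's API.

```python
class CoverGroup:
    """A set of named 'interesting events', each counted when it is observed."""

    def __init__(self, bins):
        self.bins = dict(bins)
        self.hits = {name: 0 for name in self.bins}

    def sample(self, event) -> None:
        """Check one observed event against every declared bin."""
        for name, predicate in self.bins.items():
            if predicate(event):
                self.hits[name] += 1

    def holes(self):
        """Names of declared events that no test has produced yet."""
        return [name for name, count in self.hits.items() if count == 0]

    def percent(self) -> float:
        """Percentage of declared events that have been seen at least once."""
        return 100.0 * (len(self.hits) - len(self.holes())) / len(self.hits)

def fifo_covergroup() -> CoverGroup:
    """Hypothetical events a designer might declare for a FIFO."""
    return CoverGroup({
        "write_when_full": lambda e: e["op"] == "write" and e["level"] == e["depth"],
        "read_when_empty": lambda e: e["op"] == "read" and e["level"] == 0,
        "simultaneous_read_write": lambda e: e["op"] == "read_write",
    })

if __name__ == "__main__":
    cg = fifo_covergroup()
    cg.sample({"op": "write", "level": 4, "depth": 4})
    cg.sample({"op": "read", "level": 2, "depth": 4})
    print(cg.percent(), cg.holes())
```

The reported percentage is only as meaningful as the list of declared events: an event that nobody thought to declare never shows up as a hole.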
When using these metrics, the goal of a full regression suite run should be at least 98 percent line (or block) coverage, 95 percent expression coverage, zero check assertions firing, and 100 percent of the coverage assertions firing.
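The goals above can be expressed as a simple gate on a regression summary. This is a sketch under assumptions: the record layout and field names are invented here, and only the four thresholds come from the text.

```python
from dataclasses import dataclass

@dataclass
class RegressionResult:
    """Hypothetical summary of one full regression run."""
    line_coverage: float          # percent
    expression_coverage: float    # percent
    check_assertions_fired: int
    coverage_assertions_hit: int
    coverage_assertions_total: int

def tapeout_blockers(r: RegressionResult):
    """Return the list of goals from the text that this run fails to meet."""
    blockers = []
    if r.line_coverage < 98.0:
        blockers.append("line coverage below 98 percent")
    if r.expression_coverage < 95.0:
        blockers.append("expression coverage below 95 percent")
    if r.check_assertions_fired != 0:
        blockers.append("check assertions fired")
    if r.coverage_assertions_hit < r.coverage_assertions_total:
        blockers.append("coverage assertions not all hit")
    return blockers

if __name__ == "__main__":
    run = RegressionResult(98.4, 93.0, 0, 410, 412)
    print(tapeout_blockers(run))
```

An empty list means the stated metric goals are met; it does not by itself mean the design is free of bugs.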
Living with Functional and Architectural Errors
Functional verification does not attempt to find architectural errors, but the end user - listening to his MP3 player or watching a movie on her set-top box - does not care if a problem is due to a functional or an architectural error. It is necessary to learn how to ship functional silicon with errors of either type. Successful companies have learned how to ship chips with functional and architectural errors - time-to-market pressures and chip complexity force the delivery of chips that are not perfect (even if that were possible).
For a long while, DRAMs have been made with extra components to allow a chip with manufacturing defects to provide full device function and to ship. How can the same be done with architectural redundancy? How can a less-than-perfect architecture or hardware implementation provide full device function?
There are a number of examples where problems in the native hardware can be worked around, even after the designs are shipped and running:
- Graphics chips have compilers in their device drivers that map the application's graphics commands into the native language of the graphics device.
- Transmeta maps the x86 instruction set to its own native language.
- Decades ago, Pyramid Technology produced minicomputers and would not release the assembly language details to its customers, who were limited to programming in high-level languages, such as C.
- Before that, the IBM 360 was microcoded and the user's view of the assembly language was different from the underlying native machine language.
These examples show how a programmable abstraction layer between the real hardware and user's API can hide functional warts. Many problems in the shipped designs or architectures can be worked around by software changes that are transparent to the user.
Architectural redundancy is another solution for living with architectural and design errors. By providing multiple methods of performing critical operations and a software-controlled method of selecting the method used, it is possible to ship chips with one (or possibly more) of the methods impaired.
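A software-selected fallback of this kind can be sketched as follows. The chip, its two multiply paths, the injected fault, and the `ARRAY_MUL` erratum name are all hypothetical; the point is only that the driver, not the application, decides which hardware method is used.

```python
class Chip:
    """Toy device with two redundant ways to multiply non-negative integers."""

    def __init__(self, array_multiplier_broken: bool = False):
        self.array_multiplier_broken = array_multiplier_broken

    def array_multiply(self, a: int, b: int) -> int:
        # Fast path; when broken, models a silicon bug that clears the low result bit.
        product = a * b
        return product & ~1 if self.array_multiplier_broken else product

    def shift_add_multiply(self, a: int, b: int) -> int:
        # Slower redundant path built from shifts and adds (b must be non-negative).
        product = 0
        while b:
            if b & 1:
                product += a
            a <<= 1
            b >>= 1
        return product

class Driver:
    """Software layer that hides known-bad hardware paths from the application."""

    def __init__(self, chip: Chip, errata=frozenset()):
        self.chip = chip
        self.use_fallback = "ARRAY_MUL" in errata  # hypothetical erratum name

    def multiply(self, a: int, b: int) -> int:
        if self.use_fallback:
            return self.chip.shift_add_multiply(a, b)
        return self.chip.array_multiply(a, b)

if __name__ == "__main__":
    bad_chip = Chip(array_multiplier_broken=True)
    print(Driver(bad_chip).multiply(3, 5))                        # unpatched driver
    print(Driver(bad_chip, errata={"ARRAY_MUL"}).multiply(3, 5))  # patched driver
```

The application calls `multiply` in both cases. Shipping an updated errata list in the driver is enough to route around the impaired method, at the cost of the slower path.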
Ever increasing chip complexity prevents total testing. It is necessary to optimize (and reuse) verification infrastructure and tests. A growing functional verification toolkit is also needed, which includes simulation, emulation, and formal and hybrid-formal verification. Architectural and functional failures must be accepted in every advanced chip that is built. Architectural redundancy, software API layers, and soft hardware are needed to allow failures to be worked around or fixed post-silicon.
©2010 Synopsys, Inc. Synopsys and the Synopsys logo are registered trademarks of Synopsys, Inc. All other company and product names mentioned herein may be trademarks or registered trademarks of their respective owners and should be treated as such.