Innovative Ideas for Predictable Success
      Issue 4, 2010



Technology Update
Accelerating Analog Simulation with HSPICE Precision Parallel Technology
Until now, AMS simulators have struggled to make the most of multicore computing. That’s about to change. In this article, Robert Daniels, Harald Von Sosen and Hany Elhak, all of Synopsys, explain how taking a precise approach to AMS simulation helps to remove the bottlenecks.

Increases in verification complexity are not only a problem for digital designers. Analog and mixed-signal (AMS) IC designers also face multiple issues that come together to put new demands on their simulation environments. Whether design teams have to cater for new interface standards, accommodate new semiconductor processes or optimize their designs for low power, the net result is invariably the same – a demand for more simulation.

Designing high-speed communications protocols while targeting advanced process nodes requires the use of special design techniques. For example, designers often use digital control to configure analog blocks for multi-standard systems and to calibrate them for process variations. While this kind of approach delivers working silicon, it comes with added design cost. Verifying these circuits means more simulation runs at each process, voltage and temperature (PVT) corner. On top of this, engineers need to run more PVT-corner simulations for advanced process technologies while maintaining silicon-accurate results.

Multicore for Analog/Mixed-Signal
The digital design community has called on multicore processing to help boost simulation performance, and analog/mixed-signal can do the same. The new HSPICE Precision Parallel (HPP) technology in Synopsys’ HSPICE® circuit simulation tool delivers highly scalable performance on today’s multicore computers with up to 7x simulation speed-up for analog and mixed-signal designs. Design teams can use HPP to accelerate verification of their analog circuits across process variation corners, meet their project timelines and reduce the risk of silicon re-spins.

HSPICE Precision Parallel technology extends HSPICE gold-standard accuracy to the verification of pre- and post-layout complex analog circuits such as PLLs, ADCs, DACs, SERDES, and other mixed-signal circuits.

To speed up SPICE simulation without compromising accuracy, we need new simulation algorithms that take advantage of modern computer architectures. Such computers consist of multiple processor cores that share the same memory and use integrated caches to reduce memory traffic. Traditional simulation algorithms based on sequential computations cannot benefit from these machines. HPP uses new algorithms that parallelize a larger percentage of the simulation while preserving accuracy, and its efficient memory management lets users simulate post-layout circuits of more than 10 million elements.

Challenges in Accelerating SPICE
A typical SPICE simulation analyzes a circuit across a large number of time steps. Each time step consists of multiple iterations, which can be broken down into two major tasks:

  • Evaluating the devices in the circuit and loading them into a matrix
  • Solving the matrix to calculate voltage and current at each node

The iterations continue until the circuit converges, then the simulator moves to the next time step and repeats the same process. The percentage of simulation time spent in evaluating the devices and solving the matrix is dependent on circuit type. The key to accelerating SPICE on a multicore CPU is to be able to parallelize as much of each individual task as possible, without sacrificing accuracy.
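
To make these two tasks concrete, here is a minimal sketch of the loop for a one-node circuit: a resistor from the input source to the node, with a diode and a capacitor from the node to ground. The "matrix" degenerates to a single scalar, but each Newton iteration still evaluates the devices, loads the matrix and solves it, and iterations repeat until convergence before the time step advances. This illustrates the classic SPICE flow (in C++), not HSPICE’s actual code.

    // Minimal SPICE-style transient loop: resistor R from Vin to the node,
    // diode and capacitor C from the node to ground (backward Euler).
    #include <cmath>
    #include <cstdio>

    int main() {
        const double R = 1e3, C = 1e-9;         // resistor (ohm), capacitor (F)
        const double Is = 1e-14, Vt = 0.02585;  // diode saturation current, thermal voltage
        const double Vin = 1.0, h = 1e-9;       // input step (V), time step (s)
        double v = 0.0;                         // node voltage carried between steps

        for (int step = 0; step < 100; ++step) {      // time-step loop
            double vk = v;                            // Newton starting point
            for (int iter = 0; iter < 50; ++iter) {   // Newton iterations
                // Task 1: evaluate devices and load the (1x1) matrix
                double id = Is * (std::exp(vk / Vt) - 1.0);       // diode current
                double gd = Is / Vt * std::exp(vk / Vt);          // diode conductance
                double G  = 1.0 / R + gd + C / h;                 // matrix entry
                double I  = Vin / R + gd * vk - id + (C / h) * v; // right-hand side
                // Task 2: solve G * v_next = I (trivial for one node)
                double v_next = I / G;
                bool converged = std::fabs(v_next - vk) < 1e-9;
                vk = v_next;
                if (converged) break;                 // converged: leave iteration loop
            }
            v = vk;                                   // accept solution, advance time
        }
        std::printf("v(%g ns) = %g V\n", 100 * h * 1e9, v);
        return 0;
    }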

Device evaluation dominates simulation activity for small pre-layout circuits. It may take up to 75% of simulation time, and it increases linearly with circuit size. Traditional SPICE simulators distribute this task across multiple CPU cores, achieving a modest level of parallelization.

According to Amdahl’s Law, if more than a third of the simulation remains sequential, a verification team can expect only a 2-3x speedup even on an 8-core machine. Figure 1 shows how performance flattens off when multi-threading a hypothetical circuit on 2-, 4- and 8-core machines.


Figure 1: Theoretical limits of parallel SPICE simulation (Amdahl’s Law)
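
The 2-3x estimate above follows directly from Amdahl’s Law, which gives the speedup S on N cores for a workload with sequential fraction s:

\[
S(N) = \frac{1}{s + \dfrac{1-s}{N}}, \qquad
S(8)\big|_{s=1/3} = \frac{1}{\dfrac{1}{3} + \dfrac{2/3}{8}} = \frac{12}{5} = 2.4
\]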

On large post-layout circuits, it’s even harder to scale the computation linearly with the number of cores: device evaluation represents less than half the simulation time, while solving the matrix can consume more than 50% of it. Simulators can achieve significant scaling by parallelizing the solve; however, solving a sparse matrix (the typical matrix form for electronic circuits) involves a lot of sequential activity. Even with 90% efficiency, Amdahl’s Law predicts a theoretical speedup of only about 3x on 8 cores, as shown in Figure 1.

To obtain highly scalable computations, the parallel efficiency of the underlying code must be very close to 100%, as shown in Figure 1.
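
Inverting the same formula quantifies "very close": the up-to-7x scaling on 8 cores cited earlier is possible only if the sequential fraction s is about 2%:

\[
\frac{1}{s + \dfrac{1-s}{8}} = 7
\;\Longrightarrow\;
7s + 1 = \frac{8}{7}
\;\Longrightarrow\;
s = \frac{1}{49} \approx 2\%
\]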

Another simulation challenge is that the order of data processing is not particularly cache-efficient. Cache efficiency is important for good multicore performance because processors compete for cache and memory access.

HSPICE Precision Parallel Technology
HPP applies several new algorithms to the transient analysis problem: improving single-thread computation, implementing a highly scalable approach that takes full advantage of multicore machines and, finally, optimizing memory management with a compact footprint and efficient use of cache. Synopsys has implemented these improvements in HPP while maintaining full HSPICE accuracy.

Single-Core Speed
Modern analog circuits consist of components that operate at different time constants. For example, a PLL consists of a voltage-controlled oscillator and divider operating at a high frequency, while other circuit components such as the phase detector, filter and digital control circuitry operate at much lower speed.

HPP exploits this difference in time constants to improve performance even on a single core. Its adaptive sub-matrix algorithm manipulates the matrix so that the slower parts of the circuit are solved with fewer iterations than the faster ones, significantly improving overall simulation speed. Figure 2 shows the average HSPICE speed improvements over the past three years; the HPP technology in the 2010.12 release delivers an average 40% speed-up over the previous release.


Figure 2: Single-core speed improvement in HSPICE
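
The adaptive sub-matrix algorithm itself is not public, but the multirate idea it exploits can be sketched generically: solve the fast part of the circuit at every time step, and re-solve the slow part only once per block of steps. The circuit, constants and partitioning below are illustrative assumptions, not HSPICE internals.

    // Generic multirate sketch (not HSPICE's actual algorithm): a "fast" RC
    // node driven by a 1 GHz square wave is stepped every h, while a "slow"
    // RC node fed by the fast node's average is re-solved only every K steps.
    #include <cmath>
    #include <cstdio>

    int main() {
        const double h = 1e-11;        // base time step (10 ps)
        const int    K = 100;          // slow block solved once per K fast steps
        const double tau_f = 1e-10;    // fast RC time constant (100 ps)
        const double tau_s = 1e-6;     // slow RC time constant (1 us)
        double v_fast = 0.0, v_slow = 0.0, acc = 0.0;

        for (long step = 0; step < 200000; ++step) {
            double t = step * h;
            double drive = (std::fmod(t, 1e-9) < 0.5e-9) ? 1.0 : 0.0; // 1 GHz square
            // Fast block: solved every step (backward Euler on dv/dt = (u - v)/tau)
            v_fast = (v_fast + (h / tau_f) * drive) / (1.0 + h / tau_f);
            acc += v_fast;
            if ((step + 1) % K == 0) {
                // Slow block: solved once per K steps with the averaged fast
                // output and the larger effective step K*h -- far fewer solves,
                // with little accuracy loss when tau_s >> K*h.
                double u = acc / K;
                v_slow = (v_slow + (K * h / tau_s) * u) / (1.0 + K * h / tau_s);
                acc = 0.0;
            }
        }
        std::printf("v_fast = %.4f V, v_slow = %.4f V\n", v_fast, v_slow);
        return 0;
    }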

Multicore Scaling
HPP technology uses an adaptive sub-matrix algorithm, a highly scalable approach that divides the matrix-solving stage into smaller tasks which the simulator can map efficiently to multiple CPU cores. In addition, HPP parallelizes other small tasks such as output and time-step control to achieve parallelization efficiency close to 100%. The result is up to 7x scaling on 8 cores, a significant improvement over the previous release. Figure 3 shows HSPICE multicore scaling on representative analog/mixed-signal circuits such as sigma-delta data converters and PLLs.


Figure 3: HSPICE multicore scaling on representative analog/mixed-signal circuits
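
Of the two tasks in the SPICE loop, device evaluation parallelizes most naturally, since each device can be linearized independently. The fork-join sketch below shows the pattern on a list of diodes; the device model, thread count and merge step are illustrative assumptions rather than HSPICE’s implementation.

    // Fork-join sketch of parallel device evaluation (illustrative only):
    // each thread evaluates a slice of the device list into its own output
    // slots, so no locking is needed; the cheap merge stays sequential.
    #include <cmath>
    #include <cstdio>
    #include <thread>
    #include <vector>

    struct Stamp { double g, i; };   // conductance and current contribution

    int main() {
        const int n_dev = 1'000'000, n_thr = 8;
        const double Is = 1e-14, Vt = 0.02585;
        std::vector<double> v(n_dev, 0.6);   // bias point of each diode
        std::vector<Stamp> stamps(n_dev);

        auto evaluate = [&](int lo, int hi) {   // Task 1: evaluate one slice
            for (int d = lo; d < hi; ++d) {
                double e = Is * std::exp(v[d] / Vt);
                stamps[d] = { e / Vt, e - Is }; // linearized diode at v[d]
            }
        };
        std::vector<std::thread> pool;
        int chunk = n_dev / n_thr;
        for (int t = 0; t < n_thr; ++t)         // fork: one slice per core
            pool.emplace_back(evaluate, t * chunk,
                              (t == n_thr - 1) ? n_dev : (t + 1) * chunk);
        for (auto& th : pool) th.join();        // join before touching the matrix

        double g_total = 0.0;                   // stand-in for the matrix load step
        for (const auto& s : stamps) g_total += s.g;
        std::printf("total stamped conductance: %g S\n", g_total);
        return 0;
    }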

Cache Efficiency and Memory Bus
Even with code that is almost 100% parallelized, the potential to scale simulation performance is limited by cache misses and the finite time required to move data between cache and main memory. How much these costs affect overall scaling depends on machine parameters such as cache size and memory bus speed, and on how well the code localizes and minimizes data movement. HSPICE localizes data in blocks comparable in size to the highest-level cache. The one variable under the user’s control is the multicore machine on which HPP runs: in general, the larger the (second- or third-level) cache and the faster the memory bus, the better the performance and scaling.
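
The effect of locality is easy to demonstrate: the sketch below sums the same matrix twice, once walking memory contiguously and once with a large stride. The contiguous walk is typically several times faster on commodity hardware. This is a generic cache-behavior illustration, not HSPICE code.

    // Same bytes, different traversal order: the cache-friendly walk touches
    // memory contiguously, while the strided walk misses on nearly every access.
    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 4096;
        std::vector<double> a(static_cast<size_t>(n) * n, 1.0);
        auto time_sum = [&](bool row_major) {
            auto t0 = std::chrono::steady_clock::now();
            double s = 0.0;
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                    s += row_major ? a[size_t(i) * n + j]   // contiguous: cache hits
                                   : a[size_t(j) * n + i];  // strided: cache misses
            auto t1 = std::chrono::steady_clock::now();
            long long ms = std::chrono::duration_cast<
                std::chrono::milliseconds>(t1 - t0).count();
            std::printf("%s: sum = %.0f, %lld ms\n",
                        row_major ? "row-major" : "col-major", s, ms);
        };
        time_sum(true);
        time_sum(false);
        return 0;
    }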

The attention to memory efficiency also provides the benefit of high capacity. HPP is capable of simulating post-layout circuits in excess of 10 million elements and 9 million nodes. Figure 4 shows HSPICE capacity improvements over the past three years. The new HPP technology improves capacity by an average of 25% over the previous release.


Figure 4: HSPICE capacity improvement from 2007 to 2010

Conclusion
HPP achieves high performance on multicore machines by removing a bottleneck that slows down traditional multi-threaded simulations. It makes the most of the scalability of today’s multicore architectures, with the best performance coming from machines with the largest second-/third-level cache and fastest memory bus. Efficient memory management allows simulation of post-layout circuits larger than 10 million elements.

In addition to the new HPP technology, the HSPICE 2010 solution includes enhanced convergence algorithms, advanced analog analysis features and foundry-qualified support for process design kits (PDKs) that extend HSPICE gold-standard accuracy to the verification of complex analog and mixed-signal circuits. With HSPICE 2010, design teams can accelerate verification of their analog circuits across process variation corners, and minimize the risk of missing project timelines and having to re-spin silicon.

About the authors

Robert Daniels is senior staff engineer at Synopsys.
Harald Von Sosen is principal engineer at Synopsys.
Hany Elhak is product marketing manager at Synopsys.


©2010 Synopsys, Inc. Synopsys and the Synopsys logo are registered trademarks of Synopsys, Inc. All other company and product names mentioned herein may be trademarks or registered trademarks of their respective owners and should be treated as such.



WEB LINKS
- HSPICE 2010

"HPP achieves high performance on multicore machines by removing a bottleneck that slows down traditional multi-threaded simulations."