Technology Update
Faster Signoff in Multicore Environments
Different chip design teams have different compute environments. Bernadette Mortell, senior product marketing manager for the PrimeTime® suite at Synopsys, explains how the latest multicore development in PrimeTime makes the most of all of them.
PrimeTime is now unique among timing tools: it lets designers get the most out of their multicore compute platforms by allowing them to run jobs that use both distributed and threaded processing.
The threading capability in PrimeTime is the latest step in the multicore initiative we have rolled out for our Galaxy™, Discovery™ and DFM solutions. This initiative aims to help designers deliver chips on time by speeding up the design and verification of complex chips.
PrimeTime Multicore Solves Multiple Problems
Designers doing signoff analysis face many challenges, several of which lengthen runtimes. Designs are getting larger, some doubling in size from one generation to the next. The number of scenarios required for signoff increases as designs move to smaller process geometries, and timing analysis grows more complex as new methods to reduce design margin are added to the signoff flow.
Even though chips are getting bigger, the size of design teams, and the number of people involved in signoff, hasn’t increased much. Design teams therefore have to become more productive with each design they tackle, often while coordinating team members dispersed across continents and time zones.
The PrimeTime multicore initiative aims to take advantage of diverse compute environments and deliver a scalable timing analysis solution that keeps pace with the growth in chip size in current and future generations.
No ‘One Size Fits All’
After reviewing the Synopsys product portfolio, we observed that different products would benefit from different algorithmic approaches to multicore: there is no ‘one size fits all’. Each tool team picked the approach that best fit its product, and some used more than one. In enhancing PrimeTime for multicore, we concluded that a multi-pronged approach would offer the most benefit across the broad range of compute environments our customers use. The result is a solution that combines threaded and distributed capabilities and can be applied effectively to a single design.
Working closely with customers, we found that design teams have very diverse compute environments. Server farms included a mix of single-core, dual-core and quad-core machines plus a few 16- or 32-core machines: typically hundreds of small machines, tens of mid-sized machines and a few really large ones. A solution targeted at the small machines would not necessarily meet the needs of teams with big machines they wanted to dedicate to signoff runs, while a solution targeted at high-end standalone machines would leave the plentiful small and mid-sized machines idle. By enabling two multicore approaches (Figure 1), we’ve been able to deliver a solution that helps design teams whatever the makeup of their compute environments.
Figure 1. PrimeTime Multicore Solution Supports Distributed and Threaded Analysis in Tandem
Big Jobs on Small Machines
Not all designers have access to machines that are big enough to handle their biggest jobs, so they need to be able to divide the design up into partitions and run each in parallel. That’s why we offer a ‘coarse-grain’ distributed partitioning approach.
The designer specifies the number of partitions, and PrimeTime automatically:
- creates partitions, ensuring that they are all approximately the same size for load-balancing purposes,
- runs the partitions individually, on different cores of the same machine or on different cores of different machines,
- recombines the partitions and analyzes the timing between them, and
- creates a single timing report.
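The article does not describe how PrimeTime chooses partition boundaries. As a rough illustration of the load-balancing goal only, the sketch below uses a swapped-in textbook technique, the greedy longest-processing-time (LPT) heuristic: each block, largest first, goes into whichever partition is currently smallest. The block names, gate counts and function names are hypothetical, and a real netlist partitioner would also have to account for the timing paths that cross partition boundaries.

```python
import heapq
from typing import Dict, List, Tuple


def balance_partitions(block_sizes: Dict[str, int], num_partitions: int) -> List[List[str]]:
    """Greedy LPT assignment: place each block, largest first, into the smallest partition."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # Min-heap of (current partition size, partition index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(num_partitions)]
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for name, size in sorted(block_sizes.items(), key=lambda kv: kv[1], reverse=True):
        current, idx = heapq.heappop(heap)  # the currently smallest partition
        partitions[idx].append(name)
        heapq.heappush(heap, (current + size, idx))
    return partitions


def partition_sizes(partitions: List[List[str]], block_sizes: Dict[str, int]) -> List[int]:
    """Total size of each partition, used to check how even the split is."""
    return [sum(block_sizes[b] for b in part) for part in partitions]


# Hypothetical 64-million-gate design split into blocks (sizes in millions of gates).
BLOCKS = {"cpu0": 12, "cpu1": 12, "gpu": 20, "ddr": 8, "pcie": 6, "usb": 4, "misc": 2}
parts = balance_partitions(BLOCKS, 3)
print(parts, partition_sizes(parts, BLOCKS))
```

Largest-first ordering matters here: placing the big blocks early leaves the small ones to even out the totals, which is what keeps any single partition from becoming the slowest one.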
Note: The analysis of the whole chip completes fastest when all the partition processes finish at about the same time. If one partition takes 4 hours and the others take 1 hour each, the run cannot finish until the slowest partition does, so this so-called ‘long pole’ imbalance significantly reduces the speed-up from distributing the work. Managing this imbalance was one of the biggest challenges for the PrimeTime team in delivering distributed timing analysis.
With this approach, customer design teams can run distributed analysis on a master machine with about half the memory required for a scalar (single-core) analysis run. For example, if a 64-million-gate design needs 32 GB of memory at the peak of the analysis, the master process for a distributed run can fit on a machine with only 16 GB, and the partitions can run on other machines, typically also with 16 GB. Although this uses more memory overall, the design team can keep using the 16 GB machines already in its farm instead of buying new 32 GB machines to run timing analysis on the new design.
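The long-pole effect can be quantified with simple arithmetic. This minimal sketch assumes four partitions (the article does not give the count), one core per partition, a serial runtime equal to the sum of the partition runtimes, and no cost for recombining partitions; the function name `distributed_speedup` is hypothetical.

```python
from typing import Sequence


def distributed_speedup(partition_hours: Sequence[float]) -> float:
    """Serial runtime divided by parallel wall-clock time, under the simplifying assumptions above."""
    if not partition_hours:
        raise ValueError("need at least one partition")
    serial = sum(partition_hours)       # all partitions run one after another
    wall_clock = max(partition_hours)   # the slowest partition ("long pole") sets the finish time
    return serial / wall_clock


balanced = [1.75, 1.75, 1.75, 1.75]
long_pole = [4.0, 1.0, 1.0, 1.0]
print(f"balanced:  {distributed_speedup(balanced):.2f}x")   # balanced:  4.00x
print(f"long pole: {distributed_speedup(long_pole):.2f}x")  # long pole: 1.75x
```

Both cases contain the same 7 hours of total work, but the balanced split finishes in 1.75 hours while the long-pole split cannot finish before its 4-hour partition does.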
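The memory argument can be checked the same way. In this sketch, the 32 GB peak and the ‘about half’ ratio come from the example above; the farm contents and the helper name `smallest_machine_that_fits` are hypothetical, and per-partition memory is not modeled because the article does not give it.

```python
from typing import Iterable, Optional

SCALAR_PEAK_GB = 32.0   # peak memory of the 64-million-gate example on a single core
MASTER_FRACTION = 0.5   # distributed master needs "about half" the scalar peak


def smallest_machine_that_fits(required_gb: float, farm_gb: Iterable[float]) -> Optional[float]:
    """Return the smallest machine memory size in the farm that can hold the job, or None."""
    candidates = [m for m in farm_gb if m >= required_gb]
    return min(candidates) if candidates else None


farm = [8.0, 16.0, 16.0, 16.0]  # hypothetical farm with no 32 GB machines
master_gb = SCALAR_PEAK_GB * MASTER_FRACTION
print(smallest_machine_that_fits(SCALAR_PEAK_GB, farm))  # None: nowhere to run the scalar job
print(smallest_machine_that_fits(master_gb, farm))       # 16.0: the master fits an existing machine
```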
Make the Most of Big Machines
If designers have access to large multicore machines with lots of memory to run their signoff analysis, they can use the PrimeTime ‘fine-grain’ threaded approach to multicore analysis.
Instead of dividing the design into partitions, this approach takes a single analysis process and creates worker threads, each assigned a unit of work to complete. The threads compute separately but share the same memory space, so threaded analysis is a shared-memory solution that uses the memory of a single machine. In the earlier example of the 64-million-gate design, which needs 32 GB of memory at peak on a single core, threaded analysis on a 4-core machine would require approximately 35 GB. For many teams, this small increase in memory is well worth a 2x speedup in analysis runtime.
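The fine-grain pattern, one process whose worker threads each take a unit of work while reading the same in-memory data, can be sketched with Python’s standard `concurrent.futures.ThreadPoolExecutor`. The endpoint names, arrival times and slack calculation are hypothetical stand-ins rather than PrimeTime internals, and because of CPython’s global interpreter lock this pure-Python version shows the structure of threaded analysis, not its speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical shared design data: arrival time (ns) at each timing endpoint.
# Every worker thread reads this one dictionary; nothing is copied per thread.
ARRIVALS_NS: Dict[str, float] = {"ep0": 0.82, "ep1": 1.05, "ep2": 0.97, "ep3": 1.20}
REQUIRED_NS = 1.00  # hypothetical required time shared by every endpoint


def slack_for(endpoints: List[str]) -> List[Tuple[str, float]]:
    """One unit of work: compute slack (required - arrival) for a chunk of endpoints."""
    return [(ep, round(REQUIRED_NS - ARRIVALS_NS[ep], 3)) for ep in endpoints]


def threaded_slacks(num_threads: int) -> Dict[str, float]:
    """Split the endpoints into one chunk per thread and run the chunks on a thread pool."""
    names = sorted(ARRIVALS_NS)
    chunks = [names[i::num_threads] for i in range(num_threads)]
    results: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for chunk_result in pool.map(slack_for, chunks):
            results.update(chunk_result)
    return results


print(threaded_slacks(num_threads=2))
```

In the article’s example, the memory cost of this sharing is small: 35 GB versus 32 GB is roughly a 9% increase (35 / 32 ≈ 1.09) in exchange for about halving the runtime.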
More than Multicore
In PrimeTime we are also delivering other signoff analysis technologies that complement the multicore initiative and improve designer productivity. These include:
- hierarchical analysis to manage the complexity of large designs, providing further capacity and performance improvements over flat analysis,
- distributed multi-scenario analysis to enable design teams to run scenarios in parallel, and
- multi-scenario enabled ECO technology that allows setup and hold fixing for all design scenarios in parallel.
PrimeTime is currently the only EDA tool that offers both distributed and threaded approaches to multicore. We have invested in these technologies because they give design teams a better return on the broad range of machine resources in today’s compute farms: teams with big machines can make the most of them, and teams with smaller machines can break the design into partitions and still get significant performance gains when using PrimeTime to analyze complex designs.
Synopsys has a multicore initiative for all the products within the Galaxy platform, including Design Compiler®, IC Compiler and PrimeTime. Overall we have seen about 2x higher throughput, although the gain varies from tool to tool. Synopsys continues to invest in multicore development across the product range. In PrimeTime we see the rollout of threaded analysis as an ongoing process rather than a one-off event, and in the future we will deliver further scalability in both threaded and distributed analysis as we optimize for more cores.
Bernadette (Bernie) Mortell is senior product marketing manager for PrimeTime Suite at Synopsys.
©2010 Synopsys, Inc. Synopsys and the Synopsys logo are registered trademarks of Synopsys, Inc. All other company and product names mentioned herein may be trademarks or registered trademarks of their respective owners and should be treated as such.