Technology Update
Optimizing Infrastructure to Get the Most from EDA
EDA demand on compute infrastructures is growing. Glenn Newell, Synopsys, looks at some of the ways that businesses can optimize their IT for EDA applications.
Demand for compute cycles is increasing exponentially as we move to smaller semiconductor process nodes. Forecasts suggest that IT costs could spiral out of control in order to keep up with demand, while the reality is that budgets are likely to remain fixed. Unfortunately, multicore and other processor advances are not keeping up with the continued demand for faster processing.
Some design teams are considering alternatives to making long-term investments in fixed IT equipment. An attractive option is to offload processing to more flexible and temporary resources that they can use as and when needed. This approach reduces the overall cost of design and makes more budget available for engineering and CAD resources, as well as EDA tools.
Some design teams have attempted to make more of their budgets by not investing in the latest network equipment; however, this can lead to numerous bottlenecks that, rather than saving money, increase cost and reduce performance. The efficiency of the infrastructure is critical in meeting the increase in compute demand from EDA tools.
Within Synopsys, we have significant experience in optimizing our own compute environments for EDA. We have also worked with some of our leading customers to help them get more from their IT investments. Our latest thinking and best practices are summarized below.
Our customers are particularly interested in how cloud computing can help reduce costs. By having access to a complete EDA compute infrastructure over the internet, they can access hardware and software resources as a service.
In theory, cloud computing should offer global tool access to the project team regardless of its location. However, many IT support specialists are apprehensive about adopting cloud computing – they are concerned about security, data transfer, losing control, and where to store the design data. Many of our clients believe their data and computing needs are unique, and feel that a standard configuration won’t meet their requirements.
To create a best-in-class EDA cloud computing infrastructure, three fundamental properties need to be evident:
- appropriate performance,
- high degrees of flexibility, and
- optimum cost efficiency.
There are many industries that already provide their customers with software as a service using cloud computing business models. As a result, the term ‘cloud computing’ has a wide (and sometimes confusing) meaning. We are focusing upon clarifying what ‘cloud computing’ means to Synopsys and its customers – we are tracking trends in scientific cloud computing and educating cloud providers on the specific needs of EDA.
Increasing Storage Bandwidth
EDA poses special challenges to network-attached storage (NAS) because the different tools used at the different stages of chip design exhibit the full range of storage access patterns, from small block asynchronous to large block synchronous. Storage systems designed for high-performance computing (HPC) can handle the large block synchronous (parallel file systems); however EDA tools also present a heavy metadata load (file and directory attributes vs. reads and writes), which is proving to be a challenge for NAS storage vendors.
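The heavy metadata load described above can be quantified directly from NFS server counters. The sketch below, a hypothetical illustration rather than a Synopsys tool, classifies NFSv3-style operation counts (as reported by utilities such as nfsstat) into metadata and data operations; the sample numbers are invented to show the kind of skew an EDA workload can exhibit.

```python
# Sketch: estimating the metadata-to-data operation ratio from NFS
# server counters. Counter names follow NFSv3 operations as reported
# by tools such as nfsstat; the sample figures are illustrative.

METADATA_OPS = {"getattr", "setattr", "lookup", "access",
                "readdir", "readdirplus"}
DATA_OPS = {"read", "write"}

def metadata_ratio(counters: dict) -> float:
    """Return metadata operations as a fraction of all counted ops."""
    meta = sum(v for k, v in counters.items() if k in METADATA_OPS)
    data = sum(v for k, v in counters.items() if k in DATA_OPS)
    total = meta + data
    return meta / total if total else 0.0

# Illustrative snapshot for an EDA workload: many small-file lookups
# and attribute checks relative to actual reads and writes.
sample = {"getattr": 420_000, "lookup": 310_000, "access": 150_000,
          "read": 90_000, "write": 30_000}

print(f"metadata share: {metadata_ratio(sample):.0%}")
```

A ratio like this, tracked over time per storage filer, makes it easy to see which design stages stress metadata handling rather than raw throughput.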
As the volume of EDA data grows with shrinking process nodes and larger designs, network efficiency becomes a key factor in distinguishing ‘best in class’ from ‘worst in class’ solutions. So it should come as no surprise that IT system architects are always looking for better ways to improve network efficiency by reducing blocking factors and increasing bandwidth between processors and storage.
A typical blade server has an internal switch with a blocking factor of 14:1: a single 1 Gbit/s uplink serves all 14 blades. When all the blades are busy, the switch becomes a bottleneck and per-blade throughput falls off.
Figure 1: Blocking vs. non-blocking architectures
In contrast, a non-blocking environment may have over 300 non-blocking ports. It overcomes the limitations of traditional blade servers by enabling users to reach over 300 nodes before the throughput starts to fall off dramatically (Figure 1). While most EDA tools do not require a low blocking factor between compute nodes, a high bandwidth, low blocking factor path to NAS storage is critical.
Measurements have shown that, when compared to older switches, non-blocking switches give a 10x increase in utilized uplink bandwidth per node. Therefore it’s important to invest in a solid architecture to optimize the overall efficiency of the infrastructure.
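The effect of the blocking factor can be seen with a back-of-envelope calculation. The sketch below uses the 14:1 ratio and 1 Gbit/s uplink from the blade-server example; the non-blocking case assumes each node keeps a dedicated 1 Gbit/s path to storage. Note this is a worst-case theoretical ceiling, which is why it exceeds the roughly 10x measured gain cited above.

```python
# Back-of-envelope comparison of per-node uplink bandwidth under
# different blocking factors. 14:1 and 1 Gbit/s match the blade-server
# example in the text; the non-blocking case assumes a dedicated path.

def per_node_bandwidth_gbps(uplink_gbps: float, blocking_factor: float) -> float:
    """Worst-case bandwidth per node when all nodes transmit at once."""
    return uplink_gbps / blocking_factor

blade = per_node_bandwidth_gbps(1.0, 14)        # 14 blades share one uplink
non_blocking = per_node_bandwidth_gbps(1.0, 1)  # dedicated path per node

print(f"blade server : {blade:.3f} Gb/s per node")
print(f"non-blocking : {non_blocking:.3f} Gb/s per node")
print(f"theoretical gain: {non_blocking / blade:.0f}x")
```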
Bottlenecks increase turnaround time and waste computing slots. In EDA IT infrastructure, they tend to move around: as soon as an IT or design team has identified and dealt with one, another will turn up elsewhere. It's a never-ending task, but IT support specialists need to keep reviewing the whole infrastructure: IO requests, OS and file system performance, CPU/cache utilization, memory, network card performance, BIOS settings, and so on.
In order to efficiently manage the bottlenecks, infrastructure monitoring for the next level of EDA has to become more intelligent. It needs to combine IT, business and EDA monitoring data.
Using scientific data visualization techniques, such as tree maps, is one way that we can make sense of an otherwise confusing mass of data and help users to better understand the performance of their infrastructures. By providing information intelligence we can show the correlation between IT and EDA tool metrics. This approach enables users to relate EDA application performance with what’s happening in the IT environment.
The highest-bandwidth channel into the human brain is the optic nerve, so presenting the environment visually is the most effective way of helping users understand and interpret the data. Design teams can quickly and easily see when parameters are reaching their limits, and can predict where bottlenecks are likely to appear. They can then take preventative measures to ensure these bottlenecks don't occur and use their infrastructure more efficiently.
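To make the treemap idea concrete, here is a minimal slice-and-dice treemap layout in pure Python: each metric gets a rectangle whose area is proportional to its value. This is a generic sketch of the technique, not Synopsys' visualization tooling; a real deployment would feed the rectangles to a charting library.

```python
# Minimal slice-and-dice treemap layout: split a rectangle into strips
# proportional to the input values, so larger metrics get larger area.

def treemap(values, x, y, w, h, vertical=True):
    """Split rectangle (x, y, w, h) into strips proportional to values."""
    total = sum(values)
    rects = []
    offset = 0.0
    for v in values:
        frac = v / total
        if vertical:  # slice along the width
            rects.append((x + offset, y, w * frac, h))
            offset += w * frac
        else:         # slice along the height
            rects.append((x, y + offset, w, h * frac))
            offset += h * frac
    return rects

# Hypothetical per-farm CPU-hour totals mapped onto a 100x60 canvas.
usage = [50, 30, 20]
for rect in treemap(usage, 0, 0, 100, 60):
    print(tuple(round(c, 1) for c in rect))
```

Nesting the same split (farms within data centers, users within farms) yields the hierarchical view that lets a viewer correlate EDA tool metrics with IT metrics at a glance.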
Business Rules Monitoring
Conventional compute environments do exactly as they are told and cannot tell when people are misusing or abusing them. Unfortunately, we cannot always rely on users of compute farm resources to be disciplined in the way they use them. For that reason we have developed a system that tracks:
- CPU usage,
- long-running and pending jobs,
- memory use,
- rule violations, and
- direct logins to compute farm machines.
Business rules monitoring (BRM) does this by allowing IT support specialists and CAD managers to set up business rules within the compute farms. For example, user A should not use more than 10% memory and user B can only use 5% of the compute farm. Then, depending on how it has been set up, if the user breaks the business rule, the tool sends warnings to notify users of the breach and/or terminates jobs.
Using BRM will help optimize the infrastructure by ensuring that the compute farm is being used properly and not wasting key resources.
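The rule check at the heart of such a system can be sketched in a few lines. This is an illustrative mock-up of the idea described above, not the BRM product itself: the Job record, user names, and thresholds are invented, and a real system would pull job data from the grid scheduler.

```python
# Sketch of a business-rule check: per-user limits on memory and
# compute-farm share, with a warn-or-kill action on breach.
# All names and thresholds here are illustrative.

from dataclasses import dataclass

@dataclass
class Job:
    user: str
    mem_pct: float    # share of total farm memory in use
    slots_pct: float  # share of total farm slots in use

# Rules keyed by user: (max memory %, max farm share %, breach action),
# mirroring the "user A / user B" example in the text.
RULES = {
    "userA": (10.0, 100.0, "warn"),
    "userB": (100.0, 5.0, "kill"),
}

def check(job: Job):
    """Return (ok, action): action is None unless the user's rule is breached."""
    rule = RULES.get(job.user)
    if rule is None:
        return True, None
    max_mem, max_slots, action = rule
    if job.mem_pct > max_mem or job.slots_pct > max_slots:
        return False, action
    return True, None

print(check(Job("userA", mem_pct=12.0, slots_pct=1.0)))  # breaches memory rule
print(check(Job("userB", mem_pct=2.0, slots_pct=4.0)))   # within limits
```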
Priority-based allocation is about ensuring that projects on a critical path have access to the resources they need. When multiple projects compete for the same batch compute resources, IT support specialists need to ensure each project has sufficient resource to meet its milestones and commitment dates.
Traditionally, IT support specialists and CAD managers have divided up the compute farm and allocated the resources equally among users. However, when a user's share was not in constant use, those cycles were wasted and overall throughput decreased.
Priority-based allocation gives a guaranteed minimum resource. This means that when a project needs the resource, it can access the tool instantly, but other users can access it when it’s not in use.
Allocating resources in this way reduces queues and wasted core cycles, and enables greater visibility into resource allocation per project. Furthermore, it allows the design team to have fewer, larger compute farms with guaranteed accessibility and ensures the process is more efficient overall.
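The guaranteed-minimum scheme can be sketched as a small allocation function: each project first receives up to its guarantee, then idle guaranteed slots are lent to projects whose demand exceeds their guarantee. The project names, slot counts, and the simple highest-guarantee-first tie-breaking below are all illustrative assumptions, not the behavior of any particular scheduler.

```python
# Sketch of priority-based allocation with guaranteed minimums:
# each project gets min(demand, guarantee), then spare slots are
# lent to projects that want more. All figures are illustrative.

def allocate(total_slots, guarantees, demand):
    """Return a per-project slot allocation honoring guaranteed minimums."""
    alloc = {p: min(demand.get(p, 0), g) for p, g in guarantees.items()}
    spare = total_slots - sum(alloc.values())
    # Lend spare slots, highest-guarantee projects first (an assumed policy).
    for p in sorted(guarantees, key=guarantees.get, reverse=True):
        extra = min(demand.get(p, 0) - alloc[p], spare)
        if extra > 0:
            alloc[p] += extra
            spare -= extra
    return alloc

guarantees = {"tapeout": 60, "bringup": 30, "research": 10}
demand = {"tapeout": 80, "bringup": 10, "research": 5}
print(allocate(100, guarantees, demand))
```

In this example the tapeout project borrows the slots that bringup and research leave idle, yet those projects would reclaim their guarantees instantly the moment their demand rises.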
EDA is a specialized workflow and very different from what a typical IT department is used to dealing with. Synopsys’ expertise is closely allied with EDA compute problems and enables us to solve issues and optimize infrastructures, making them run as efficiently as possible.
Synopsys’ IT team supports over 60 offices worldwide. We have 16 data centers, operate 11,000 compute servers with more than three petabytes of NFS storage, and run over 50 compute farms comprising around 7,000 machines.
Our solutions deliver IP for remote access, intelligent monitoring, data center management, grid systems, and environment modules.
Glenn is a ten-year veteran at Synopsys where he is a principal engineer and IT architect. His responsibilities include high performance computing, new technologies, and unified communications.
He graduated from Cal Poly, San Luis Obispo, CA, in 1987 with a B.S. in Electronics.
In 2006, Glenn and the Synopsys IT team leveraged their research in high performance computing to get Synopsys onto the Top 500 Supercomputer list, at #242 with 3.782 TeraFlops. They applied this same knowledge and infrastructure to run benchmarks and recommend customer hardware configurations for Synopsys DFM tools.
Prior to Synopsys, Glenn held various roles at National Semiconductor, including test engineer, ECAD manager for four product lines, intranet manager, and architect.
©2010 Synopsys, Inc. Synopsys and the Synopsys logo are registered trademarks of Synopsys, Inc. All other company and product names mentioned herein may be trademarks or registered trademarks of their respective owners and should be treated as such.