SNUG Austin Abstracts 

Friday, September 28, 2012
10:30 AM - 12:30 PM
FA1 User and Tutorial Session: Physical Synthesis, PrimeTime Performance, and Constraints Analysis
Advanced Design Partitioning with IC Compiler Leveraging Physical Synthesis
Jack Randall (Advanced Micro Devices, Inc.)
This paper presents a comprehensive methodology and flow to physically partition a design into several sub-blocks starting from RTL, enable re-running physical synthesis of the sub-blocks, and re-assemble the sub-blocks in IC Compiler for high-quality results. This advanced methodology leverages the capabilities of physical synthesis using Design Compiler Graphical with SPG to explore, optimize, and guide physical partitions before IC Compiler performs the split. Physical partitioning of a legalized SPG-driven synthesis design generates high-quality automatic pin placements and timing constraints for the resulting sub-blocks. The sub-blocks are further optimized by subsequent physical synthesis from RTL and re-assembled in IC Compiler to perform parasitic extraction and timing analysis re-using the original top-level constraints. Early exploration of the design and initial floorplanning are performed with Design Compiler Explorer. Design Compiler Graphical with SPG is used to drive the design through legalized and optimized placement in IC Compiler before and after partitioning. The described methodology and flow are illustrated with an experimental microprocessor core.

Performance and Productivity Improvements in PrimeTime
LaMark Chance (Synopsys, Inc.)
The performance and productivity improvements in PrimeTime 2011 releases enable users to address signoff challenges more efficiently. This tutorial will cover new capabilities, including best usage methodology to take advantage of improved runtime and ease-of-use in debugging timing issues. The topics presented are suitable for all users of PrimeTime.

Galaxy Constraints Analyzer: Comparing Multiple SDC Constraints Files
Robert Moore (Synopsys, Inc.)
Today’s SoC designs are extremely complex with tight design schedules. Any change to timing constraints can have a significant impact on timing results and time to tapeout. This tutorial will demonstrate how to quickly debug constraint changes and identify discrepancies using Galaxy Constraint Analyzer. After providing some background information on the technology, we will demonstrate how to analyze the differences between two SDC files for unintended behavior changes.
The tutorial is for design implementation engineers and managers looking for a solution to drastically reduce the time needed to provide clean constraint definitions.

FA2 User and Tutorial Session: Cortex-A15 Best Practices and Structured Design
Optimized Implementation of a Gigahertz+ ARM® Cortex™-A15 Processor using Tools included in the Galaxy Implementation Platform
Brian Millar (Samsung); Chandu Challapalli (Synopsys, Inc.)
High performance processor implementation is a very complex and challenging task. This paper will describe how tools included in the Synopsys Galaxy Implementation Platform were used for a high performance, low power implementation of a processor for mobile applications. We will discuss the successful application and trade-offs of key technologies and techniques, from synthesis to place and route, that enabled the high performance. Technologies highlighted in this session include topographical synthesis, physical datapath, clock mesh, multivoltage design, and multicorner/multimode optimization, all of which were used successfully by the design team to achieve the aggressive performance/power target and an improvement of 20% over traditional implementation techniques.

Efficient Reusable Structured Design Methodology
Karthik Punukollu, Thomas Lin (Advanced Micro Devices, Inc.); Thomas Felske (Synopsys, Inc.)
This paper describes the challenges and criteria for utilizing structured placement to achieve timing and area requirements. This paper provides various IC Compiler Relative Placement methods and ideas for quickly prototyping optimal placement for datapath and regular structured designs within IC Compiler. We will cover an integrated methodology with repeatable and reusable design flow methodologies that AMD developed for placement optimization. Finally, the paper will cover newly implemented features of Streaming Relative Placement to simplify coding complicated structures.

Techniques for High Performance Cores using Synopsys Galaxy Platform—ARM® Cortex™-A15 Case Study
Chandu Challapalli (Synopsys, Inc.)
Learn how to predictably achieve high performance while minimizing power. We will present an optimized implementation methodology for an ARM Cortex™-A15 processor core based on Synopsys’ Galaxy™ Implementation platform. This session will highlight the latest technologies/techniques in Design Compiler and IC Compiler used to achieve challenging performance/power targets. These include physical guidance, delay performance vs. area tradeoffs, leakage optimization, innovative methods to reduce slack across register stages during final timing closure, and more. We will examine the benefit/cost tradeoffs of each technique: performance, ease of convergence, and impact on schedule/turnaround time. We will also share results obtained using this combination of optimized methodology, tools and physical IP.

FA3 User and Tutorial Session: Simulation Performance, Verification IP and X-Optimism
SoC Simulation Performance: Bottlenecks and Remedies
Patrick Hamilton, Richard Yin, Bobjee Nibhanupudi, Amol Bhinge (Freescale); Tareq Altakrouri (Synopsys, Inc.)
One of the biggest challenges that today’s next-generation complex SoC architectures pose is rigorous verification of the SoC designs that result from these architectures. Functional verification of the full system represented by these mammoth-scale designs (> 1 billion transistors per chip) calls for the verification environment to employ advanced methodologies, powerful tools and techniques. Constrained-random stimulus generation, coverage-driven completion criteria, assertion-based checking, faster triage and debug turnaround, C/C++/SystemC co-simulation, and gate-level verification are just some of the methods and techniques that contribute to tackling the challenge. These advanced techniques, however, can add overhead to HDL simulation performance if used incorrectly. This paper discusses several simulation and debug bottlenecks experienced during the verification of a complex next-generation SoC and how they were understood and overcome using VCS diagnostic capabilities, profile reports, VCS arguments, tool fixes, testbench modifications, smarter utilities, fine tuning of computing resources, etc.

Accelerated SoC Verification with Synopsys Discovery VIP for the ARM AMBA 4 ACE Protocol
Chris Spear (Synopsys, Inc.)
As the complexity and number of processor cores in SoC designs increase, so do the verification challenges. One such challenge is verifying hardware based cache coherency protocols used by these multi-core SoCs.
Synopsys provides 100% SystemVerilog-based VIP that supports the ARM® AMBA® 4 AXI™ and ACE™ (AXI Coherency Extensions) protocols, as well as the UVM, VMM and OVM methodologies. Constrained-random sequences, protocol checks and coverage plans are also provided.
This tutorial describes how a reference verification platform built with the Discovery VIP for the AMBA ACE protocol can be utilized to accelerate the verification of multi-core SoCs. Also highlighted are Synopsys verification technologies like Discovery Visualization Environment (DVE) and Protocol Analyzer.
Target audience: Design and verification engineers and managers.

X-Optimism Elimination during RTL Verification
Robert Booth (Freescale); Bruce S. Greene, Arturo Salz (Synopsys, Inc.)
Verification of complex chips suffers from X-optimism issues that often conceal design bugs. The deployment of low-power techniques such as power shutdown in today’s designs exacerbates these X-optimism issues. To address these problems we used a new simulation semantic that more accurately models non-deterministic values in logic simulation. In this paper we discuss how X-optimism can be eliminated during RTL verification. We also present results from a recent project.
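As a rough illustration of the X-optimism problem described in this abstract, the sketch below models a single `if` statement in Python. In standard Verilog simulation, an unknown (X) condition is treated as false, so the else branch executes and the unknown is hidden. The `if_merged` function shows one possible less optimistic rule (keep the result only when both branches agree); it is an assumption chosen for illustration, not the semantic used by the authors, and all names here are hypothetical.

```python
X = "x"  # stand-in for the unknown logic value


def if_optimistic(cond, then_val, else_val):
    """Standard Verilog `if`: an X condition is treated as false, so the else branch runs."""
    return then_val if cond == 1 else else_val


def if_merged(cond, then_val, else_val):
    """Illustrative X-aware rule: on an X condition, keep the value only if both branches agree."""
    if cond == X:
        return then_val if then_val == else_val else X
    return then_val if cond == 1 else else_val


if __name__ == "__main__":
    # The optimistic rule hides the unknown condition; the merged rule propagates it.
    print(if_optimistic(X, 1, 0))
    print(if_merged(X, 1, 0))
```

With an X condition and differing branch values, the optimistic rule returns a known value while the merged rule returns X, which is the kind of difference that can expose a bug concealed by a passing simulation.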

Lunch and Executive Address
Physics and Economics Are Driving Silicon Convergence
Ty Garibay, VP of Engineering, Embedded Processing - Altera
As the costs of designing a new SoC increase with each new process node, it is becoming more and more difficult to justify the ROI on new market- or customer-specific ICs. Few product lines targeting narrow markets will be able to make the jump to 28nm, and only a handful will be able to generate profits at 20nm and beyond. However, the market need for specialization and product differentiation is not going away. By combining the common subset of most ASICs and ASSPs with our industry-leading programmable logic technology, Altera's SoC FPGAs create a new alternative for market-specific solutions.
The convergence of CPU, accelerators, high-bandwidth memory interfaces and programmable logic will enable silicon to continue to address the needs of a wide range of customers in the future.

Friday, September 28, 2012
1:30 PM - 3:00 PM
FB1 User and Tutorial Session: Leakage Reduction and Processor Design
Solving for Leakage Power and Timing by Vt Swaps in PT-SI
Chakradhar Tallury (Advanced Micro Devices, Inc.); Karthikeyan Karunanidhi (Open-Silicon)
There are many ways in physical design to reduce the leakage power of a design. Increased demand for higher performance, ever-increasing features, and longer battery life has challenged designers to use a wide range of power-saving techniques.
In this paper we present a fast and novel way of reducing leakage power in the design without violating slack in the timing paths. We can even fix timing paths by using this utility to swap cells to lvt. Currently, other hvt/lvt swap ratios do not achieve the power/timing needed because of over-constraining and miscorrelation between PD and PT. We compare this utility to fix_eco_timing from PT, leakage optimizations from ICC, and third-party tools. We will provide our algorithm and show data that compares the efficiency of swaps and the subsequent timing profile with the other industry-standard approaches listed above.

Designing Programmable Hardware Accelerators: Gaining Flexibility Without Compromising Power, Area and Performance
Drew Taussig (Synopsys, Inc.)
Is it possible to design programmable hardware accelerators that are flexible enough to deal with multiple standards and different use cases, while still meeting power, performance and area constraints? With Synopsys Processor Designer (PD), you can. In this session we will show how, with PD, designers can quickly and easily develop and verify programmable hardware. We will cover how PD automates the creation of optimized RTL code, software tools (assembler, linker, compiler and debugger), system model, and verification model from a single input specification. This tutorial is intended for design engineers, engineering managers, and chip architects.

FB2 User and Tutorial Session: Cortex-A15 Best Practices and 20nm Design
High Performance Physical Design of a 28nm Quad-Core ARM Cortex-A15 with 4 MB L2 Cache
Jason Karka, Michael Robinson (Texas Instruments); Bill Sicaras (Synopsys, Inc.)
This paper highlights key strategies used in a place and route flow for a 28nm quad-core ARM Cortex-A15 processor. Various physical design techniques were used to obtain very high clock frequencies. Many of these tactics will be of use not only to designers implementing quad-core A15 processors, but also to those designing other 28nm high performance chips. The following is an outline of topics discussed:
  • ARM Cortex-A15 Processor Overview
  • Tapered Metal Stack
  • Net Patterning
  • NDR and Layer Assignment
  • Hierarchical Partitioning
  • Placement Density and Clustering
  • Clockgate Cloning
  • Useful Skew
  • Logic-Level-Balanced Clock Tree Synthesis
  • Post-Route Setup and Hold Closure

20nm Double Pattern Technology in IC Compiler
Zugang Li (Synopsys, Inc.)
Advanced technology nodes present a whole new set of design challenges in achieving place and route closure. In this session we will look at some of these challenges, such as advanced technology design rules (DRC), double pattern technology (DPT), and optimal standard cell library layout, and demonstrate how ICC will help you achieve design closure.

FB3 User and Tutorial Session: Low Power Verification, X-Propagation and Testbench Timing
Verifying a Low Power Design
Asif Jafri (Verilab)
User expectations of mobile devices drive an endless race for improvements in both performance and battery life. This paper outlines the verification challenges created by some widely used low power design techniques, and shows how a digital verification methodology can be extended to existing testbenches, enabling low-power designs to be verified.
We describe our experience in using the Unified Power Format to define various power domains and isolation policies, and to control power states from the testbench.
We will also discuss how to structure your verification effort, starting by capturing key low-power features in a testplan, and using appropriate tools and methodology to satisfy your testplan's requirements so that you can confidently claim that the design has been verified.
This paper will be useful to verification engineers who have begun a challenging low-power design verification project and who wish to apply best-practice techniques that have already shown their practical value.

Getting X-Propagation Under Control
Bruce Greene (Synopsys, Inc.)
The X-optimism semantics of standard RTL simulation can lead to incorrect behavior which often conceals design bugs. These bugs lead to passing simulations and create problems that are difficult to correct later in the flow.
This tutorial explores a new method to address this problem that changes the X semantics in order to remove the incorrect results due to X-optimism. Target audience: Design & verification engineers and managers.

Taming Testbench Timing: Time's Up for Clocking Block Confusion
Jonathan Bromley, Kevin Johnston (Verilab)
The clocking block feature was designed to provide SystemVerilog verification environments with a versatile and well-structured way to access synchronous signals in a DUT or test harness. In practice, though, the use of clocking blocks has proved to be surprisingly error-prone, despite nearly a decade of application experience since they were first standardized.
This paper reviews the key features and purpose of clocking blocks and then examines why they continue to be a source of confusion and unexpected behavior for many verification engineers. Drawing from the authors’ project and mentoring experience, it highlights typical usage errors and how to avoid them. We clarify the internal behavior of clocking blocks to help engineers understand the reasons behind common problems, and show best-practice techniques that allow clocking blocks to be used productively and with confidence. Finally, we consider some areas that may cause portability problems and indicate how to avoid them.

Friday, September 28, 2012
3:15 PM - 5:15 PM
FC1 User and Tutorial Session: Test Methodology
Multi-Scan Compression Support in an AMD Core
Thomas Clouqueur, Martin Amodeo, Pankaj Sharma (AMD); Tim Yuan, Lori Schramm (Synopsys, Inc.)
The ability of an IP to support multiple scan compression schemes provides the flexibility of generating compressed patterns with different ATPG tools. This paper describes how AMD added support for DFTMAX compression for an x86 core while continuing support of other scan compression schemes. The architecture was designed to integrate the compression schemes to share as much logic as possible without compromising on the compression performance.
This paper describes what type of DFTMAX compression was chosen to best fit the core’s scan architecture, and how the compression gets implemented using AMD’s proprietary VICE [1] scripts. Problems encountered in this work include LSSD pipestaging unsupported by TetraMAX and general ATPG flow set-up, for which Synopsys provided tool enhancements and support described throughout the paper.

Unified DFT Clock Architecture for Single Pass Timing Closure, Single Pass ATPG, Interface Characterization and Power
Christopher Ryan, Kris Monsen, Scott Smith, Henry So (Maxim Integrated Products)
To address timing closure and power problems for DFT, a unified DFT clock architecture is examined. Results show that ATPG capture clock timing closure is not necessary, bridging/transition/stuck-at fault ATPG can be performed in a single pass, interface characterization can now be completed with ATPG, and test power is programmable. Current implementations include 65nm and 40nm Maxim ICs consisting of four to forty-six clock domains, one to eight PLLs, and several external interfaces.

Test Updates, Yield Improvement, and the Influence of Standards
Adam Cron (Synopsys, Inc.)
This tutorial will provide the latest updates to the Synopsys synthesis-based test solution: DFTMAX Compression and TetraMAX ATPG for comprehensive, high quality manufacturing test, STAR Memory System for memory test and repair, DesignWare IP BIST for high-speed I/O, and Yield Explorer for yield analysis. In addition, standards that allow easier block connectivity in the implementation phase will be discussed along with new developments in the testing of 3D integrated systems.

FC2 User and Tutorial Session: ICC Design Flows and Integrity/EM/IR Analysis
Clock Enable Timing Closure Methodology
Harish Dangat, Senthilkumar Murugesan (Samsung); Susheel Sharma (Synopsys, Inc.)
Clock gating is an important part of low-power design. It saves the power otherwise spent switching the clock buffers and CK inputs of flops that are not active in the current mode of operation. There are two types of clock gating: functional clock gating and clock gating inserted by Power Compiler (or another CAD tool). Functional clock gating consists of understanding the design and turning off the section of the design that is not operating in the current mode of operation. Power Compiler can analyze the logic and insert clock-gating logic to turn off the clock to a group of flops. Since the clock tree consumes 30% to 40% of the total chip power, both methods are used to save power in low-power designs.
However, the clock-gating elements have to meet setup time. Since these elements sit in the middle of the clock tree or clock path, the clock reaches them earlier than it reaches the flop that generates the clock enable signal. This makes it difficult to meet clock enable timing. This paper describes the methodology used to fix these violations.

Accelerating PG Design Closure in IC Compiler with the Latest 2012.06 PrimeRail In-Design Rail Analysis
Jason Binney (Synopsys, Inc.)
With the recent release of IC Compiler (2012.06), Synopsys has enhanced In-Design Static Rail Analysis to increase performance by 3x using multicore techniques and to improve error detection using powerful network integrity metrics from within the IC Compiler place and route flow. Complementary to the existing Power Network Synthesis (TPNS) flow, In-Design Rail Integrity Analysis provides powerful, customizable features to identify missing vias and floating shapes, as well as to sort these using rail integrity error collections. Come and hear how to use PrimeRail In-Design Rail Analysis, which has the same look and feel as IC Compiler, to identify problems early in the design cycle and achieve faster design closure.

Qualcomm DSP Semi-Custom Design Flow: Leveraging Place and Route Tools in Custom Circuit Design
Nadeem Eleyan, Patrick Szabo, Ken Lin, Paul Bassett, Masud Kamal (Qualcomm); Frank Gover (Synopsys, Inc.)
There are generally two options available to Integrated Circuit (IC) designers to physically implement their designs: Synthesis / Place and Route design and Custom Circuit design. Each design approach has its advantages and drawbacks. This paper will cover a hybrid design flow using concepts from both areas to give us a quick design turnaround time while allowing control over custom placement and routing.

FC3 User and Tutorial Session: Covergroups, Functional Coverage and Low Power Verification
Using Covergroups and Covergroup Filters for Effective Functional Coverage
Hillel Miller (Freescale)
Covergroups are one of the key tools provided in the P1800 SystemVerilog standard for doing functional coverage. Generalizing a bit, covergroups define tables that specify the target coverage goals. The tables specify a multidimensional range of possibilities that need to be covered by the testbench. This range of goals can grow out of control if the verification engineer does not carefully select a subset of goals that makes sense from a verification perspective. The chosen goals need to be just enough to meet the desired quality requirements. In a sense this is an art. In this paper we will not discuss this art, but we will discuss the tools available for carving out the coverage goals. In the upcoming release of P1800-2012, new constructs are provided just for doing this. The construct we will focus on is the "with" construct. The new construct provides the ability to carve out of the multidimensional range of possibilities a sub-range of goals that works well in a “working environment” that requires frequent reprioritization to meet tapeout goals.
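The idea of carving a sub-range of goals out of a multidimensional coverage space can be sketched in Python as a cross product filtered by a predicate. This is only a stand-in for the concept, not SystemVerilog syntax and not the paper's method; the function name, the value ranges, and the filter expression are all hypothetical.

```python
from itertools import product


def cross_bins(a_values, b_values, keep=lambda a, b: True):
    """Return the cross of two coverage dimensions, keeping only the bins the filter selects."""
    return [(a, b) for a, b in product(a_values, b_values) if keep(a, b)]


# Full cross of two 4-value dimensions versus a filtered subset of goals.
all_bins = cross_bins(range(4), range(4))
goal_bins = cross_bins(range(4), range(4), keep=lambda a, b: a + b < 3)

if __name__ == "__main__":
    print(len(all_bins), len(goal_bins))
```

Changing the predicate is the only step needed to reprioritize the goals, which loosely mirrors the frequent reprioritization the abstract mentions.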

Debug of Low Power Designs with Discovery Visualization Environment (DVE)
Tom Powell (Synopsys, Inc.)
As more SoCs incorporate low power techniques, the debug of these designs becomes more challenging. The ability to correlate power domains and low power structures to the design for debugging functional failures is essential. This tutorial will give users a snapshot of the Low Power debug features available in DVE, accompanied by screen shots and a discussion of the features that will improve the overall experience of debugging Low Power designs.
Target audience: Design and verification engineers and managers.

100% Functional Coverage-Driven Verification Flow
Thinh Ngo, Sakar Jain (Freescale)
Coverage-driven verification has been used effectively to speed up coverage closure. However, attaining 100% functional coverage is still a challenging and time-consuming task. We propose a coverage-driven verification flow that can efficiently achieve 100% functional coverage. The flow targets each functionality, focuses on the transaction level, measures coverage during simulation, and fails a test if 100% of the expected coverage is not achieved. This flow maps stimulus coverage to functional coverage, with every stimulus transaction being associated with an event in the coverage model and vice versa. This association is derived from the DUT specification and/or the DUT model. Expected events generated along with stimulus transactions are compared against actual events triggered in the DUT. The comparison results are used to fail the test. 100% functional coverage is achieved via 100% stimulus coverage. The flow enables every test, with its targeted functionality verification, to meet 100% functional coverage provided that it passes.
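The pass/fail rule in this abstract (compare the expected events derived from stimulus against the events actually triggered in the DUT, and fail the test if any are missing) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function names, the `event_for` mapping, and the event strings are hypothetical.

```python
def check_coverage(expected_events, actual_events):
    """Return (percent of expected events observed, sorted list of missing events)."""
    expected = set(expected_events)
    hit = expected & set(actual_events)
    missing = sorted(expected - hit)
    percent = 100.0 * len(hit) / len(expected) if expected else 100.0
    return percent, missing


def run_test(stimulus, dut_events, event_for):
    """Derive one expected event per stimulus transaction and fail unless all were observed."""
    expected = [event_for(txn) for txn in stimulus]
    percent, missing = check_coverage(expected, dut_events)
    return {"passed": not missing, "coverage": percent, "missing": missing}


if __name__ == "__main__":
    # Hypothetical mapping from a transaction name to its coverage event.
    result = run_test(["read", "write"], ["read_done"], lambda t: t + "_done")
    print(result)
```

In this sketch a test can only pass when every expected event was seen, so a passing test implies 100% of its targeted coverage.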