Fast FPGA Turnaround
Designing big FPGAs on tight deadlines and budgets, whether for prototyping or use in end products, is making designers think seriously about ways to improve productivity. Angela Sutton and Jeff Garrison, both Synopsys, look at ways to improve turnaround time for large FPGA designs and how new features in the latest release of Synopsys’ Synplify FPGA tools can help.
FPGAs double in capacity approximately every two years. Designing for the biggest FPGA today is equivalent to developing a 5 million gate ASIC – a sizeable project by anyone’s standard. Consequently, tool runtimes for “flat” designs have ballooned into anything from multiple hours to several days, which can make it harder for design teams to incorporate bug fixes, tweak their designs, and get feedback on how well their design works within a working day.
At the same time, ASIC designers increasingly care about the performance of FPGA design tools. Synopsys estimates that well over 70% of designers depend on FPGA-based prototyping, and most of them use the largest FPGAs currently available on the market. The reason that FPGA-based prototyping is so popular is that it enables designers to verify proposed design changes quickly and easily.
Designers use FPGAs in production systems because they are economical, reduce risk and allow the design to be easily adapted to specification changes. Traditionally, FPGAs have been quick to design and cheap and easy to change. If FPGA design flows don’t keep up with FPGA design complexity, the industry stands to lose out on the benefits that FPGAs offer.
Fortunately there are several ways to improve turnaround time for FPGAs. Some of these are traditional techniques, while others are new and depend on having innovative features in FPGA design tools. Some speedup techniques have a detrimental effect on design performance and area – or quality of results (QoR) – while others maintain or improve it.
How to run faster
The available options to speed up your FPGA design project differ depending on your priorities. Additional considerations and opportunities exist for designers using FPGAs for ASIC prototyping, where there may be less emphasis on QoR and greater emphasis on fast debug iterations. If you can develop blocks in parallel, this will open up more ways to go faster. Table 1 summarizes the options for each of these aims and constraints.
Table 1: Traditional and new techniques for fast FPGA implementation
Server Farms and Hierarchical Design
Having a server farm at your disposal enables you to run several jobs in parallel with slight variations in design settings and then choose the best results. This can deliver good results and is easy to automate using scripts once the design has been debugged and runs through the flow without problems. Hierarchical portions of the design can be farmed out to different servers for parallel synthesis and the results merged.
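The "run several variants and keep the best" approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Synplify script: sweep, fake_job and the seed/effort settings are hypothetical names, and the score is assumed to be worst-case timing slack in picoseconds, where larger is better. In a real flow, run_job would launch the synthesis and P&R tools in batch mode on a farm machine and parse the resulting timing report.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Settings = Dict[str, int]


def sweep(run_job: Callable[[Settings], int],
          variants: List[Settings],
          max_workers: int = 4) -> Tuple[Settings, int]:
    """Run every variant in parallel and return (best settings, best score)."""
    if not variants:
        raise ValueError("at least one variant is required")
    # Threads suffice when each job mostly waits on an external tool process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(run_job, variants))  # results keep input order
    best = max(range(len(variants)), key=scores.__getitem__)
    return variants[best], scores[best]


def fake_job(settings: Settings) -> int:
    """Hypothetical stand-in for one tool run; the formula is made up."""
    return 10 * settings["effort"] - 50 * abs(settings["seed"] - 3)


if __name__ == "__main__":
    variants = [{"seed": s, "effort": e} for s in (1, 2, 3) for e in (1, 2)]
    print(sweep(fake_job, variants))
```

Keeping the job runner as a parameter means the same harness can be pointed at a local machine or a farm submission command without changing the selection logic.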
It’s possible to partition the design into blocks and refine them in parallel on one or more machines or processors. You can use block-based design to isolate parts of the design that are changing from those that are not, or to isolate design IP and its interfaces within the design. Block-based design can be faster because the design team can work on the partitions in parallel, and need only re-synthesize the blocks that change from one run to the next. You can preserve blocks that are finished and verified, which makes the results more stable and saves verification time.
Block-based design does have some disadvantages. The design tools may not be able to optimize the design across the partition boundaries, which can result in worse QoR. To get the QoR you want, you may have to run optimizations for longer. For these reasons, you may want to partition your design so that it keeps performance-critical parts of the design inside the same block.
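The idea of re-synthesizing only the blocks that changed can be illustrated with a small Python sketch. It assumes that a block counts as changed when the content of its source files changes, and it compares SHA-256 fingerprints between runs. The function names and the dictionary layout are hypothetical; synthesis tools perform their own change detection, which may work differently.

```python
import hashlib
from typing import Dict, List


def fingerprint(sources: List[str]) -> str:
    """Hash the text of a block's source files (contents, not file names)."""
    digest = hashlib.sha256()
    for text in sources:
        data = text.encode("utf-8")
        # Prefix each file with its length so file boundaries affect the hash.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def blocks_to_rebuild(blocks: Dict[str, List[str]],
                      previous: Dict[str, str]) -> List[str]:
    """Return the names of blocks that are new or differ from the last run."""
    return sorted(name for name, sources in blocks.items()
                  if previous.get(name) != fingerprint(sources))
```

After a successful run, the fingerprints would be stored and passed in as previous on the next run, so that unchanged blocks can be skipped and their earlier results reused.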
Faster Place and Route
Place and route (P&R) typically takes over half of the overall design iteration time for a multimillion-gate FPGA project, so it’s worth doing what you can to speed up this part of the design process. Some FPGA tools have fast or low-effort modes, which sacrifice QoR for shorter runtimes. These are useful if you want to quickly iterate the design to check a small change, or produce an initial implementation of your prototype.
Some tools support incremental P&R, which, for the second and subsequent iterations of P&R, endeavors to re-place only those parts of the design where the netlist changed, assuming that the P&R tool can still meet the design’s timing. Using incremental P&R can save you half the P&R runtime overall.
This approach is most useful in handling minor design changes that aren’t on a critical timing path. It is automated so it’s easy to use, and you don’t need to change your synthesis or backend methodology. Typically you can handle around 10 incremental updates – beyond that, it is better to rerun P&R from scratch.
This approach may save less time when any incremental changes affect a critical path because the P&R tools have to change large parts of the design. It’s also of less benefit for designs that utilize a high proportion of the available gates, because even small changes will have a greater effect on the design.
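To see what a P&R saving means for the whole iteration, a rough estimate helps. The sketch below is simple arithmetic under stated assumptions: the 4-hour synthesis and 6-hour P&R split is a hypothetical example chosen so that P&R takes more than half the iteration, and the 0.5 saving is the best-case incremental P&R figure quoted above. The function name is illustrative only.

```python
def iteration_hours(synth_hours: float, pnr_hours: float,
                    synth_saving: float = 0.0,
                    pnr_saving: float = 0.0) -> float:
    """Estimate one design iteration after applying fractional savings."""
    for saving in (synth_saving, pnr_saving):
        if not 0.0 <= saving <= 1.0:
            raise ValueError("savings must be fractions between 0 and 1")
    return (synth_hours * (1.0 - synth_saving)
            + pnr_hours * (1.0 - pnr_saving))


# Hypothetical 10-hour iteration in which P&R takes 6 hours.
baseline = iteration_hours(4.0, 6.0)
incremental = iteration_hours(4.0, 6.0, pnr_saving=0.5)  # half of P&R saved
print(baseline, incremental)
```

With these assumed numbers the iteration drops from 10 hours to 7 hours, a 30% reduction. Changes that touch a critical path or a heavily utilized device would realize less of the saving.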
Multiprocessing uses two or more processors in order to synthesize designated design blocks in parallel and is a basis for runtime savings in block-based flows. Whether you let the tools create design blocks using automatic “compile points”, or choose to manually insert compile points yourself, multiprocessing can reduce synthesis runtime by up to 30% for a minor reduction in QoR. You can also combine automatic compile points with fast synthesis modes (see below) for further runtime speedup.
Some backend FPGA tools also offer multiprocessing for P&R, which reduces runtime by taking advantage of having multiple processors to run portions of the design in parallel. Using multiprocessing can trim P&R runtimes by 10% or more, but may also reduce QoR.
Fast synthesis reduces runtime by disabling certain optimizations. The penalty for faster synthesis may be slower clock speeds and more area, but this may not matter, especially if you simply wish to get fast feedback on your initial design RTL and project setup.
You can use fast synthesis mode in different ways depending on your design goals. If you want to tune RTL constraints, you can use fast synthesis mode to perform synthesis-only iterations with your normal tight constraints.
If your aim is to get a design onto a board as quickly as possible for debug or prototyping, assuming that you don’t need high speed, you can run synthesis and P&R with loose timing constraints. This approach will reduce the time it takes to go from RTL to bitfile by about a quarter.
Handling Synthesis Errors
It doesn’t always make sense to stop a synthesis run just because it encounters errors. If synthesis halts after the first few errors and the design contains hundreds of them, it will take a long time to find them all. A “continue synthesis upon error” capability tells the synthesis tool to keep going on the error-free portions of the design regardless of errors encountered elsewhere, so you can fix more errors at once without having to restart synthesis multiple times.
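The principle behind continuing upon error can be shown with a generic Python sketch: process every module, record failures instead of stopping at the first one, and report them together at the end. This illustrates the idea only and is not how any particular synthesis tool implements it; synthesize_all and the synthesize callback are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


def synthesize_all(modules: List[str],
                   synthesize: Callable[[str], str]
                   ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Process every module, collecting errors instead of halting."""
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for name in modules:
        try:
            results[name] = synthesize(name)
        except Exception as exc:  # record the failure and keep going
            errors[name] = str(exc)
    return results, errors
```

A single pass then yields both the usable results for the clean modules and the full list of modules that still need fixing.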
What's New in Synplify Pro and Synplify Premier?
Many of the features included in the latest release of Synplify Pro and Synplify Premier support faster FPGA turnaround times by improving productivity. Other features focus on improving quality of results.
Enhancements in Synplify Premier
Runtime Improvements: Fast Mode, Multiprocessing, Automatic Compile Points
Design teams who want to use the latest FPGAs may face synthesis runtimes of 8-10 hours on a flattened design. The improvements made to Synplify Premier’s Fast synthesis mode can reduce runtimes by 4x compared to standard logic synthesis (see Figure 1).
Synplify Pro or Synplify Premier users can designate RTL partitions or blocks – so-called “compile points” – or have the tool create “automatic compile points”, which it will then synthesize in parallel using multiprocessing to reduce runtime. Synplify Premier users can cut runtime further by switching on fast synthesis mode when running this flow.
Synplify Premier automatically propagates the design’s interface timing constraints from the top-level to each automatic compile point to ensure that timing goals are met.
Figure 1: The Fast synthesis flow in Synplify Premier includes “Fast Mode”, “Automatic Compile Points (ACP) with Multiprocessing” and the ability to "Continue Synthesis upon Error"
Physical Synthesis: Synplify Global Placer
When a design meets timing in synthesis but fails to meet timing in the backend tools, the problem is due to poor timing correlation between the two environments. Synthesis tools typically use estimates for interconnect. If those estimates are poor it is very difficult to close timing. To get more accurate estimates of FPGA interconnect, which is the dominant delay in FPGAs, Synplify Premier uses physical synthesis. That means synthesizing the logic and performing placement to build an accurate interconnect model, before forward annotating placement constraints to the vendor’s router. Synplify Global Placer gives those using Xilinx FPGAs an alternative to Xilinx’s placement tool.
Physical synthesis helps design teams implement aggressive timing closure by using the full power of Synplify Premier: logic synthesis, placement and physical synthesis in a unified environment.
Physical Synthesis: Physical Accelerator Flow
The Synplify Premier physical accelerator flow can help designers meet timing on fast designs without having to spend a lot of time constraining them. After creating an initial implementation of the design, engineers can take this physical accelerator “booster flow” and apply it to an existing netlist and P&R database to improve QoR using the full power of physical optimization. It’s easy to set up the physical accelerator flow because it runs on an existing P&R database using existing P&R constraints generated or applied during the first pass through place and route.
Power is becoming a more important issue for designers targeting large FPGAs. This is primarily because of the problems associated with heat dissipation.
Synplify Premier now includes features that can help design teams to manage dynamic power by analyzing switching activity and, based on that activity, performing targeted power optimizations. The switching analysis can highlight high levels of activity that the design team may be able to reduce. Automatic power optimizations are available for Xilinx RAM and DSP blocks. The optimizations can choose smaller blocks and power down bits when DSP and RAM blocks are not being accessed.
The return on investment from FPGA prototyping can be diminished if significant re-design is required to ready the FPGA RTL for the ASIC flow. By providing support and integration with the Synopsys DesignWare Building Blocks library of digital building block components, Synplify Premier helps designers to improve consistency between their ASIC and FPGA flows. Each release of the DesignWare Building Blocks libraries is fully tested with the ASIC and FPGA flows. In addition, users can synthesize into FPGAs and verify DesignWare cores that have been configured to meet their specific needs, using the Synopsys coreTools.
Synplify Premier now includes the netlist editing feature, previously only available in the Certify product. This addition allows designers to automatically modify their netlists when preparing a pre-existing ASIC design for FPGA. This capability can, for example, be used when prototyping an ASIC in an FPGA to systematically remove built-in ASIC test logic that is not applicable to the FPGA.
Enhancements in Synplify Pro and Synplify Premier
Design Preservation: Block-based Design Flows
Having the ability to manually insert RTL partitions (compile points) is useful when different teams of engineers are working on different parts of the design in a “divide and conquer” approach. It also allows engineers to lock down areas that they have completed and verified, and to isolate IP when they wish to maintain port names in order to more easily apply constraints.
In Synplify, you can define RTL partitions, or locked compile points (CPs), prior to synthesis, and maintain these partitions throughout synthesis and P&R. The jointly developed Xilinx design preservation flow is a good example of this. It maintains partitions by having the synthesis and P&R tools automatically detect which blocks have changed from one run to the next, and only re-implementing the changed blocks. This approach saves runtime because it only needs to re-implement a small portion of the design.
Hierarchical Design – Mixed Bottom-up and Top-down Flows
One of the consequences of having bigger FPGAs is bigger, often geographically dispersed, design teams. Synplify Pro and Synplify Premier now include support for hierarchical and geographically distributed design. Bottom-up, top-down and hybrid design flows are all supported, providing an environment that can replace the scripts many design teams maintain manually today (Figure 2).
The hierarchical design environment enables design teams to develop blocks in parallel on multiple servers, synchronize and integrate changes and re-use design modules more easily. Design teams can export design modules as self-contained sub-projects that include everything needed to re-use the block, including TCL scripts, synthesized files and reports.
A major advantage of this flow is that it doesn’t require RTL floorplanning.
Figure 2: Hierarchical Design Interface in Synplify Premier and Synplify Pro
Enhanced Language Support
Enhanced support for both VHDL 2008 and SystemVerilog improves the range of constructs and language features that both Synplify Pro and Synplify Premier offer.
New Device Support
As always, the latest release of the Synplify Pro and Synplify Premier synthesis products adds support for new FPGA devices, concurrent with availability from FPGA vendors. Examples of recent additions include Altera’s Stratix V device family and SiliconBlue’s iCE65 device family.
Designers can now automatically import and convert project and constraints information from Altera and Xilinx for Synplify Premier and Synplify Pro using updated ise2syn and qsf2syn design import capabilities.
Table 2 summarizes recent new capabilities provided in Synplify Premier and Synplify Pro.
Table 2. What’s new in Synplify Pro and Synplify Premier?
About the authors
Angela Sutton brings over 20 years of experience in the field of semiconductor and semiconductor design tools to her role as staff product marketing manager for Synopsys, Inc. She is responsible for the FPGA Implementation Product Line. Prior to joining Synopsys, Ms. Sutton worked as senior product marketing manager in charge of FPGA implementation tools at Synplicity, Inc., which was acquired by Synopsys in May 2008. Ms. Sutton has also held various business development, marketing and engineering positions at Cadence, Mentor Graphics and LSI Logic. At LSI Logic she was responsible for marketing its line of digital video semiconductor products and platforms.
Ms. Sutton holds a BSc. in Applied Physics from Durham University, UK, and a PhD. in Engineering from Aberdeen University, UK.
Jeff Garrison brings more than 20 years of experience in marketing and software engineering to his role as director of product marketing for FPGA implementation at Synopsys, Inc. His responsibilities include product strategy, definition, and launch for Synopsys’ FPGA products including Synplify, Synplify Pro and Synplify Premier.
Prior to joining Synopsys, Mr. Garrison worked as senior director of product marketing at Synplicity, Inc., which was acquired by Synopsys in May 2008. Mr. Garrison also held positions as senior product marketing manager for several IC design products at Cadence Design Systems and in product engineering and technical support at VLSI Technology, and worked in the software support division of Hewlett Packard. Mr. Garrison holds a bachelor’s degree in computer science from Indiana University.
©2010 Synopsys, Inc. Synopsys and the Synopsys logo are registered trademarks of Synopsys, Inc. All other company and product names mentioned herein may be trademarks or registered trademarks of their respective owners and should be treated as such.