Technology Update
Using MATLAB and High-Level Synthesis for DSP Implementation
Increasingly, design teams are looking to hardware to implement DSP algorithms with higher performance and lower power. Chris Eddington, Product Marketing Director at Synopsys, describes Synphony HLS, a new high-level synthesis tool that can target ASICs and FPGAs for both production designs and virtual prototyping.
There is no question that embedded software has many compelling benefits for the chip industry. Software can be adapted for derivative products and updated to fix bugs. Design teams like software because it reduces development risk. In fact, the industry has done such a good job of talking up the shift to embedded software that another important design trend has slipped under the radar: the rapidly increasing need for dedicated hardware engines in chip design.
Designers know that for many high-speed and compute-intensive DSP applications like video, WiMAX MIMO technology, OFDM and error correction, they have little choice but to use dedicated hardware to achieve the performance they require. Some applications don’t necessarily need a dedicated hardware engine for performance, but designers are nevertheless considering hardware in order to achieve the lowest-power design. In either case, the design challenge is to map the algorithm to an optimal DSP architecture both quickly and efficiently.
Traditional Routes to DSP Architectures
For many years, the traditional path from DSP concept to implementation has been for system designers to model the algorithm using a high-level language and hand it off for the design team to figure out the best architecture. The design team then verifies its RTL description against the algorithm specification before implementing the chip using logic synthesis, optimization and layout tools.
There are obvious problems with this approach. For one thing, it requires multiple re-coding and re-verification steps, each involving manual effort and the risk of introducing translation errors. Algorithm specialists prefer to develop a floating-point model first to explore and validate the basic algorithm in full precision. Once the algorithm concept is working, they develop a fixed-point model, then choose and validate word lengths and precision. The design team then chooses the architecture and starts RTL coding with the target technology in mind. Furthermore, prototyping is often required for high-performance, system-level validation of the algorithm implementation. This can mean even more re-coding and re-verification, and calls for a different type of expertise to optimize and map the design into an FPGA. Each of these steps is time-consuming and error-prone, so it can take months to get from algorithm concept to prototype and implementation (Figure 1). As a result, verification and validation happen very late in the design cycle.
Figure 1. Traditional Flow from DSP Concept to Implementation
Higher Abstraction with MATLAB
Increasingly, DSP architects use the MATLAB® environment for early, high-level, floating-point algorithm exploration, analysis, and specification. The MathWorks MATLAB high-level language and interactive environment enables engineers to describe complex systems quickly and concisely, then analyze, visualize, and verify their operation using interactive tools and command-line functions.
When designers use MATLAB with the Simulink® environment, they can perform fast, efficient simulation for both floating- and fixed-point designs and also handle multi-rate discrete time issues. The sophisticated visualization and analysis features have made MATLAB and Simulink the tools of choice for an increasing number of DSP algorithm designers.
Because of MATLAB’s widespread use as a precursor to chip design, some EDA and chip vendors have made various attempts to automate the creation of RTL from the MATLAB environment. Often, the proposed solutions have drawbacks, which is why many designers still choose to design the architecture and write the RTL manually.
One way to get from MATLAB to chip implementation is to use IP instantiation and netlisting. This requires a chip or FPGA vendor to supply matching libraries of highly parameterized IP models – one for Simulink and a corresponding library for the target technology. Each model represents a DSP operation, such as an FFT or FIR function. Once the design team has captured and proven the algorithm in Simulink, it can quickly and easily write out a netlist for the target technology.
There are drawbacks to this approach. The design is far from portable, and the DSP architects have to work with a relatively low-level library and make decisions that would normally be the remit of the hardware design team, such as how to build a delay line (RAM or registers) and how much latency it should have. This goes against one of the principal aims of DSP architects in working with MATLAB’s ‘M’ language, which is to explore algorithms at a high level of abstraction.
Synphony HLS Key Technologies
The IP instantiation technique described above is really just netlist translation. Synopsys’ Synphony HLS is a true high-level synthesis solution for MATLAB users working with DSP chip applications. It produces optimized RTL from a single high-level source that designers can target to multiple ASIC and FPGA technologies – for production or rapid prototyping and at-speed validation. It lets designers quickly and easily explore different implementation architectures, synthesize an optimal architecture including control circuitry, and create the design implementation. Synphony HLS also generates C-models that let designers quickly validate the overall system, and make an early start on developing software (Figure 2).
Figure 2. Synphony High Level Synthesis Flow
Synphony HLS provides a fast and efficient way for designers to derive fixed-point models from floating-point descriptions in MATLAB. It provides a rule-based, fixed-point propagation flow that allows designers to generate, explore and integrate M-code functions within Synphony HLS models. Designers can continue to work at a high level of abstraction and debug the models in the Simulink environment.
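The word-length trade-off behind a floating-point to fixed-point conversion can be illustrated with a small sketch. This is not Synphony HLS's rule-based propagation flow, which the article does not detail. The function names `quantize` and `max_quantization_error`, the signed two's-complement format, and the round-then-saturate behavior are all assumptions chosen for illustration.

```python
def quantize(x, word_length, frac_bits):
    """Round x to a signed fixed-point value, saturating on overflow.

    Assumed format: two's complement with `word_length` total bits,
    of which `frac_bits` are fractional.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (word_length - 1))       # most negative integer code
    highest = (1 << (word_length - 1)) - 1   # most positive integer code
    code = int(round(x * scale))
    code = max(lowest, min(highest, code))   # saturate instead of wrapping
    return code / scale


def max_quantization_error(samples, word_length, frac_bits):
    """Largest absolute error introduced by quantizing the samples."""
    return max(abs(s - quantize(s, word_length, frac_bits)) for s in samples)


if __name__ == "__main__":
    # Sweep the number of fractional bits to see how the error shrinks.
    samples = [i / 1000 for i in range(-1000, 990)]
    for frac_bits in (3, 7, 11, 15):
        word_length = frac_bits + 1          # one sign bit, no integer bits
        err = max_quantization_error(samples, word_length, frac_bits)
        print(word_length, "bits -> max error", err)
```

Sweeping the fractional bits in this way mirrors, in miniature, the manual exploration of word length and precision that an automated fixed-point flow is meant to shorten.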
Mixed Design Descriptions
Synphony HLS offers a mix of language and model-based design in one environment, which allows engineers to specify and partition complex behavior with multiple sample rates, interfaces and functional boundaries.
To support model-based design, the Simulink IP block library within Synphony HLS includes common math and multi-rate signal processing functions for wireless, telecommunications and multimedia applications. Synphony HLS automatically selects the parameterized blocks during high level synthesis to produce an optimized architecture that meets the timing and area constraints.
At the algorithm level, the IP block library requires the DSP engineers to specify only high-level parameters such as filter coefficients and gain requirements. As such, the Simulink model does not constrain the implementation, and so provides an appropriate hand-off point to the hardware design team. Debug features are built into the models, so that verification engineers can easily log, override or clock signals for debugging and analysis.
Support for Multi-Rate Design
Support for multi-rate design is a common requirement for many high-performance algorithms. Typically the DSP engineer will analyze the algorithm and decide where it is necessary to change the sample rate. The IP library includes blocks for sample-rate conversion, which the DSP expert can instantiate and parameterize so that there is no ambiguity when the hardware team takes the design through to implementation.
The choice of multi-rate clocking strategies has a significant impact on power consumption. Synphony HLS can auto-generate clock domains to support different clocking strategies, which allows the design team to explore this area of the design thoroughly, knowing that they can implement the clocking scheme quickly and without error.
The hardware design team takes the Simulink model and specifies the target technology, the desired sample rates and speed requirements. The high-level synthesis tool evaluates a number of different solutions before creating RTL based on the timing and area constraints.
Synphony HLS uses advanced system-level optimization techniques such as retiming, resource allocation and sharing, loop unrolling, scheduling (folding), multi-channelization, and architectural selection to produce an optimal design.
Folding takes the operations associated with a datapath and maps them onto fewer resources operating at a higher rate. For example, consider a FIR filter with 100 taps (stages) running at 1 MHz. Each tap has an associated multiplier and adder function. One approach would be to use 100 multipliers and 100 adders running at 1 MHz. Alternatively the architecture could comprise one multiplier and one adder running at 100 MHz, with the intermediate results being stored in memory. Synphony HLS will create the option that minimizes area while meeting the timing constraints.
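The folding trade-off in the FIR example can be sketched in a few lines. The code below is a behavioral illustration only, not the output or the algorithm of Synphony HLS. The names `fir_parallel`, `fir_folded` and `required_clock_hz` are hypothetical, and the model assumes one multiply-accumulate per fast-clock cycle.

```python
import math


def fir_parallel(coeffs, samples):
    """Reference model: one multiplier per tap, all taps used each sample."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]             # shift the delay line
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


def fir_folded(coeffs, samples):
    """Folded model: a single multiply-accumulate unit reused for every tap.

    Returns the outputs and the number of fast-clock cycles consumed,
    assuming one multiply-accumulate per cycle.
    """
    delay = [0] * len(coeffs)
    out = []
    mac_cycles = 0
    for x in samples:
        delay = [x] + delay[:-1]
        acc = 0                              # intermediate result kept in storage
        for k in range(len(coeffs)):
            acc += coeffs[k] * delay[k]
            mac_cycles += 1
        out.append(acc)
    return out, mac_cycles


def required_clock_hz(taps, sample_rate_hz, multipliers):
    """Clock rate needed when `taps` products share `multipliers` units."""
    cycles_per_sample = math.ceil(taps / multipliers)
    return cycles_per_sample * sample_rate_hz


if __name__ == "__main__":
    coeffs = list(range(1, 101))             # 100 taps, integer for exactness
    samples = list(range(1, 21))
    folded_out, cycles = fir_folded(coeffs, samples)
    print(folded_out == fir_parallel(coeffs, samples))
    print(required_clock_hz(100, 1_000_000, 1))
```

Both models compute the same outputs. What changes is the resource count and the clock rate, and a synthesis tool can trade these against the timing and area constraints. Intermediate points, such as 10 multipliers at 10 MHz, fall out of the same formula.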
Consider a video signal in which the same DSP operations are required on the red, green, and blue channels. In this case, the user need only identify one channel and tell Synphony HLS to use it for multiple signals if it can. If the sample rate is sufficiently low compared to the system clock, the synthesis engine will automatically identify the additional channels and apply the multi-channelization technique to them.
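The idea behind multi-channelization can be sketched as one shared datapath that is time-multiplexed across channels, with separate state kept for each. This is a simplified behavioral model under stated assumptions, not a description of how Synphony HLS implements the technique. The function names are hypothetical, and `max_shared_channels` assumes the datapath finishes one sample of one channel per system-clock cycle.

```python
def fir_single(coeffs, samples):
    """Dedicated datapath for one channel (reference model)."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


def fir_multichannel(coeffs, channels):
    """One shared datapath serving every channel in round-robin time slots.

    Only the delay-line state is replicated per channel; the arithmetic
    is shared.
    """
    states = [[0] * len(coeffs) for _ in channels]
    outputs = [[] for _ in channels]
    for t in range(len(channels[0])):
        for ch, stream in enumerate(channels):   # one time slot per channel
            states[ch] = [stream[t]] + states[ch][:-1]
            outputs[ch].append(
                sum(c * d for c, d in zip(coeffs, states[ch]))
            )
    return outputs


def max_shared_channels(system_clock_hz, sample_rate_hz):
    """Channels one datapath can serve, assuming one sample per clock cycle."""
    return system_clock_hz // sample_rate_hz


if __name__ == "__main__":
    coeffs = [1, 2, 3]
    red, green, blue = [1, 0, 0, 4], [0, 5, 0, 0], [7, 7, 7, 7]
    shared = fir_multichannel(coeffs, [red, green, blue])
    print(shared == [fir_single(coeffs, c) for c in (red, green, blue)])
```

The shared version produces the same per-channel results as three dedicated copies, which is why the technique can save area whenever the system clock is fast enough relative to the sample rate.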
The Synphony HLS engine automatically optimizes the entire design at multiple levels by applying pipelining, scheduling and binding optimizations across language and model boundaries.
Optimizing for the Target Technology
Synphony HLS uses built-in characterization technologies for fast timing analysis. Fast timing analysis is good for quickly comparing the performance of a range of different architectures. But to truly optimize a design, the high-level synthesis engine needs to know the performance of different operators in the target technology. To do this Synphony HLS uses Synplify Premier (FPGA) and Design Compiler (ASIC) for the accurate timing estimation needed to make device-specific optimizations for FPGA and ASIC targets. This methodology enables designers to rapidly explore various architectural tradeoffs from a single model. More importantly, it increases the reliability of verification through design project phases, whether the target is for FPGA prototyping, fast architecture exploration, or ASIC implementation.
Synphony HLS allows users to control and specify the timing of the interfaces to the DSP engine so that it is easier to integrate the design within a SoC design. It takes the (untimed) model and M-language input and compiles it into an intermediate format, which is ‘approximately timed’. This representation specifies latency and has some cycle-accurate timing, but doesn’t yet have full timing information like the RTL description. The hardware design team can use the approximately timed model to check that buffers are sized appropriately at their inputs and outputs.
Virtual Prototyping and Verification
As well as producing RTL, Synphony HLS generates flexible, high-performance fixed-point ANSI C-models that the verification team can use in virtual platforms for early software development and system simulation, evaluation and analysis. To help verification engineers, Synphony HLS can also auto-generate testbenches.
DSP design is currently one of the fastest-growing application areas in digital electronics. Both DSP architects and hardware design engineers have robust and proven design tools: MATLAB and Simulink for the architects and logic simulation and synthesis for the design engineers. Until now, however, there has been no efficient, automated methodology to bridge the two domains.
Synphony HLS provides a more automated design and verification flow from high-level MATLAB descriptions. It enables algorithm and system engineers to prototype, validate, and explore their algorithm concepts much earlier in the design cycle. It also lets them continue to work at a high level of abstraction and hand over smoothly to the hardware and verification teams.
Hardware designers have a robust starting point with the mixed M-language and model-based specification. Synphony HLS allows them to quickly explore different architectures, select the best one for their performance and power goals, and then implement it in their chosen target technology, whether ASIC, FPGA or prototype, without having to re-code the design.
Product Marketing Director for High Level Synthesis and System Level Products
Chris Eddington drives the product and technical marketing for high level synthesis and system level products in the Synplicity Business Group.
Prior to joining Synopsys, Mr. Eddington was Director of Product Marketing at Synplicity, Inc., which was acquired by Synopsys in May 2008. Before Synplicity he was at Mellanox Technologies where he led the strategic and technical marketing for networking ICs in the high performance computing market. While at 8x8 Inc. he developed several DSP microprocessors for video and voice processing applications.
Before that, he worked as a systems analyst at NASA’s Jet Propulsion Laboratory and held several IC design positions in the wireless communications and networking industry.
Mr. Eddington holds a master’s degree in Signal and Image Processing from the University of Southern California and an undergraduate degree in Physics and Math from Principia College.
©2010 Synopsys, Inc. Synopsys and the Synopsys logo are registered trademarks of Synopsys, Inc. All other company and product names mentioned herein may be trademarks or registered trademarks of their respective owners and should be treated as such.