DesignWare Technical Bulletin Article

Performance of Different Multipliers in the DesignWare Building Block IP

-Arshid H Syed Sr. CAE

Multipliers are some of the most important components in datapath design. The DesignWare Library contains a variety of technology-independent, high-quality and high-performance IP blocks.

This article explores and compares the performance and functionality of the combinational, pipelined and sequential multipliers available in the DesignWare Building Block IP (DWBB).

Combinational Multiplier (DW02_mult): In this multiplier, the operand widths are parameterizable, and both signed and unsigned operation are supported. For more information on its usage, please see the data sheet:
http://www.synopsys.com/dw/doc.php/doc/dwf/datasheets/dw02_mult.pdf

There are multiple implementations of the combinational multiplier: csa (carry-save array), wall (Booth-recoded Wallace-tree) and nbw (non-Booth recoded Wallace-tree for bit-width <41). These traditional implementations are fixed or static, and choosing between them allows the user to make area versus delay trade-offs. More information on these implementations is available in many computer arithmetic textbooks.

Starting with the 2004.12 release, a new implementation named "pparch" (parallel-prefix architecture) is available. This implementation is flexible and is dynamically generated based on context, e.g., area and timing constraints, and technology library. It exploits the characteristics of the different implementations to generate an architecture optimized for that context.

The following compares the speed and area of the static implementations with those of the flexible "pparch" implementation.

Synthesis results depend on the constraints and technology libraries used. The results in Table 1 were obtained under the following conditions and constraints:

  • Designs – simple, unsigned multiplication operation a*b (widths = 8, 16, 32 and 64)
  • Design Compiler version – Y-2006.06-SP2
  • Library – TSMC 90 nm

Flow & Constraints:

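# Read the RTL for the simple a*b design
read_verilog simple_design.v
# Zero delay and zero area targets push the tool to optimize as far as it can
set_max_delay -from [all_inputs] -to [all_outputs] 0
set_max_delay 0 [all_outputs]
set_max_area 0
compile
# reports generation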


Table 1: Synthesis results for DW02_mult

             Delay (ns)                Area (gates)**             Throughput***
Bit-width    csa*     wall    pparch   csa      wall     pparch   pparch
8            1.72     1.20    0.98     1179     1157     1203     1020
16           3.25     1.60    1.48     4954     4153     3270     675
32           6.27     2.14    1.92     19046    13213    11500    520
64           12.58    2.86    2.54     58266    42156    35466    393

* All implementations except "csa" require a DesignWare license
** 1 nand2x1 gate = 2.4192 Lib Area
*** Throughput in million operations per second (MOPS) = 1000 / delay in ns
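
The footnote formulas can be checked with a few lines of Python. This is a minimal sketch, not part of the original flow: the function names are illustrative, and truncating the result to an integer is an assumption that happens to reproduce the throughput column of Table 1.

```python
import math

# Library area of one nand2x1 gate, from the Table 1 footnote.
NAND2_LIB_AREA = 2.4192


def throughput_mops(delay_ns: float) -> int:
    """Throughput in MOPS = 1000 / delay in ns (truncated, by assumption)."""
    return math.floor(1000.0 / delay_ns)


def gates_to_lib_area(gates: float) -> float:
    """Convert an equivalent-gate count back to library area units."""
    return gates * NAND2_LIB_AREA


# pparch delays from Table 1, keyed by bit-width.
pparch_delay_ns = {8: 0.98, 16: 1.48, 32: 1.92, 64: 2.54}

for width, delay in pparch_delay_ns.items():
    print(f"{width:>2}-bit: {throughput_mops(delay)} MOPS")
```

The same helper applies to the csa and wall columns, for which the table does not list throughput.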


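Table 1 shows that the flexible "pparch" implementation achieves the lowest delay at every bit-width and the smallest area at 16 bits and above. At 8 bits, its area is slightly larger than that of the csa and wall implementations.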
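
The comparison in Table 1 can be quantified as a percentage change in delay and area relative to a static implementation. The sketch below only re-uses the numbers published in Table 1; the dictionary layout and function names are illustrative assumptions.

```python
# Table 1 values: (delay in ns, area in gates) per implementation and bit-width.
table1 = {
    8:  {"csa": (1.72, 1179),   "wall": (1.20, 1157),  "pparch": (0.98, 1203)},
    16: {"csa": (3.25, 4954),   "wall": (1.60, 4153),  "pparch": (1.48, 3270)},
    32: {"csa": (6.27, 19046),  "wall": (2.14, 13213), "pparch": (1.92, 11500)},
    64: {"csa": (12.58, 58266), "wall": (2.86, 42156), "pparch": (2.54, 35466)},
}


def percent_change(new: float, old: float) -> float:
    """Signed change of new relative to old, in percent."""
    return 100.0 * (new - old) / old


def compare_to_pparch(width: int, baseline: str) -> tuple:
    """Return (delay change %, area change %) of pparch versus a baseline."""
    base_delay, base_area = table1[width][baseline]
    pp_delay, pp_area = table1[width]["pparch"]
    return (percent_change(pp_delay, base_delay),
            percent_change(pp_area, base_area))


for width in table1:
    delay_pct, area_pct = compare_to_pparch(width, "wall")
    print(f"{width:>2}-bit pparch vs wall: "
          f"delay {delay_pct:+.1f}%, area {area_pct:+.1f}%")
```

Negative values mean pparch is faster or smaller than the baseline. The only positive entries are the 8-bit area figures.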

Depending on system requirements, a designer may choose an alternative multiplier from the DesignWare Building Block IP: pipelined multipliers (DW_mult_pipe or DW02_mult_n_stage, where n is 2, 3, 4, 5 or 6) or sequential multipliers (DW_mult_seq).

Please refer to the application note (AN 96-002) for information on throughput of combinational and pipelined multipliers:

http://www.synopsys.com/dw/doc.php/doc/dwf/manuals/dw_fdn_appnotes.pdf


Pipelined Multipliers

DW02_mult_n_stage: These multipliers are hard coded for n = 2, 3, 4, 5 and 6. The widths of the operands are parameterizable, and the multipliers support both signed and unsigned operation. Automatic pipeline retiming ensures optimal placement of pipeline registers within the multiplier to achieve maximum throughput. For more information, please refer to the data sheets available at:

http://www.synopsys.com/dw/doc.php/doc/dwf/datasheets/math_arith_overview.pdf

DW_mult_pipe: The widths of the operands and number of pipeline stages are parameterizable in this multiplier, and it supports both signed and unsigned operation. Automatic pipeline retiming ensures optimal placement of pipeline registers within the multiplier to achieve maximum throughput. Also, it has parameterizable stall and reset modes. For more information please see the data sheet:

http://www.synopsys.com/dw/doc.php/doc/dwf/datasheets/dw_mult_pipe.pdf

The recommended synthesis methodology for pipelined designs is described in guideline 12 of the following white paper, "RTL Coding Guidelines for Datapath Synthesis":

http://www.synopsys.com/coding_guidelines.pdf

Here is the sample synthesis script:

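read_verilog multiplier_instantiation.v
# First pass: compile against a relaxed 2 ns clock
set clk_per 2
create_clock [find port inst_clk] -period $clk_per
compile
# Keep cells matching *_reg_reg* fixed during the following steps
set_dont_touch *_reg_reg* true
# Second pass: tighten the clock to 1 ns, then retime the pipeline registers
set clk_per 1
create_clock [find port inst_clk] -period $clk_per
set_max_area 0
optimize_registers
compile -incr
# reports generation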

DW_mult_pipe provides stall and reset modes that are not present in DW02_mult_n_stage.

The delay and area difference between DW02_mult_n_stage and DW_mult_pipe is marginal (see Tables 2 and 3), since the underlying implementation is the same for both.


Table 2: Synthesis results for 2-stage pipelined multipliers


             DW02_mult_2_stage                   DW_mult_pipe (2 stages)
Bit-width    Delay*   Area**   Throughput***     Delay*   Area**   Throughput***
8            1.07     1093     934               1.05     1176     952
16           1.16     3653     862               1.16     3627     862
32           1.36     12196    735               1.40     12212    714
64           1.67     44352    598               1.63     44415    613


Table 3: Synthesis results for 3-stage pipelined multipliers


             DW02_mult_3_stage                   DW_mult_pipe (3 stages)
Bit-width    Delay*   Area**   Throughput***     Delay*   Area**   Throughput***
8            1.0      1253     1000              1.00     1197     1000
16           1.0      3877     1000              1.01     3911     990
32           1.11     13740    900               1.02     13838    980
64           1.29     45535    775               1.22     47305    819

* Delay in ns
** 1 nand2x1 gate = 2.4192 Lib Area
*** Throughput in million operations per second (MOPS) = 1000 / delay in ns
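
To see what pipelining buys, the DW_mult_pipe delays of Tables 2 and 3 can be set against the combinational "pparch" delays of Table 1. The following is a rough sketch using only the published delays; the dictionary keys and the speedup helper are illustrative names. Note that the pipelined runs were constrained to a 1 ns clock rather than a zero-delay target, which may limit how far the reported delays drop at small bit-widths.

```python
# Delays in ns by bit-width: Table 1 (pparch) and Tables 2 and 3 (DW_mult_pipe).
delay_ns = {
    "DW02_mult (pparch)":    {8: 0.98, 16: 1.48, 32: 1.92, 64: 2.54},
    "DW_mult_pipe 2 stages": {8: 1.05, 16: 1.16, 32: 1.40, 64: 1.63},
    "DW_mult_pipe 3 stages": {8: 1.00, 16: 1.01, 32: 1.02, 64: 1.22},
}

BASELINE = "DW02_mult (pparch)"


def speedup(width: int, variant: str) -> float:
    """Throughput ratio of a pipelined variant over the combinational baseline.

    Throughput is 1000 / delay, so the ratio reduces to baseline / variant delay.
    """
    return delay_ns[BASELINE][width] / delay_ns[variant][width]


for variant in ("DW_mult_pipe 2 stages", "DW_mult_pipe 3 stages"):
    for width in (8, 16, 32, 64):
        print(f"{variant}, {width:>2}-bit: {speedup(width, variant):.2f}x")
```

A ratio below 1, as for the 8-bit cases, means the combinational multiplier already meets or beats the pipelined cycle time in these particular runs; the gain from pipelining grows with bit-width.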


Sequential Multiplier (DW_mult_seq): DW_mult_seq is a sequential multiplier designed for low area, area-time trade-off, or high frequency (small cycle time) applications.

The widths of the operands and the number of clock cycles are parameterizable, and it supports both signed and unsigned operation. It also has parameterizable registered input/output modes and a reset mode. For more information please see the data sheet:
http://www.synopsys.com/dw/doc.php/doc/dwf/datasheets/dw_mult_seq.pdf

Flow and Constraints:

read_verilog multiplier_instantiation.v
# parameters: width, tc = 0, input/output_mode = 1,
# num_cycles = 3, 4 etc.
set clk_per 1
create_clock [find port inst_clk] -period $clk_per
set_max_area 0
compile
compile -incr
# reports generation

Table 4: Synthesis results for sequential multipliers


             DW_mult_seq (3 cycles)
Bit-width    Delay*   Area**   Throughput***
8            1.20     783      277
16           1.69     2410     197
32           2.09     7425     159
64           2.64     25078    126


             DW_mult_seq (4 cycles)
Bit-width    Delay*   Area**   Throughput***
8            1.00     625      250
16           1.48     1828     168
32           1.85     6127     135
64           2.38     19703    105

* Delay in ns
** 1 nand2x1 gate = 2.4192 Lib Area
*** Throughput in million operations per second (MOPS) = 1000 / (delay in ns * cycles)
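
For the sequential multiplier, one result takes several clock cycles, so the cycle count enters the throughput formula. The sketch below recomputes the throughput column of Table 4 from the delay and cycle count. The function name is illustrative, and truncation to an integer is an assumption that matches the published values.

```python
import math


def seq_throughput_mops(delay_ns: float, cycles: int) -> int:
    """Throughput in MOPS = 1000 / (delay in ns * cycles), truncated."""
    return math.floor(1000.0 / (delay_ns * cycles))


# Table 4 values: cycles -> bit-width -> (delay in ns, area in gates).
table4 = {
    3: {8: (1.20, 783), 16: (1.69, 2410), 32: (2.09, 7425), 64: (2.64, 25078)},
    4: {8: (1.00, 625), 16: (1.48, 1828), 32: (1.85, 6127), 64: (2.38, 19703)},
}

for cycles, rows in table4.items():
    for width, (delay, area) in rows.items():
        mops = seq_throughput_mops(delay, cycles)
        print(f"{cycles} cycles, {width:>2}-bit: {mops} MOPS, {area} gates")
```

Going from 3 to 4 cycles lowers both area and throughput at every bit-width, which is the area-time trade-off this component is designed for.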


Conclusion:

The DesignWare Library offers a wide variety of multipliers: combinational, pipelined and sequential. Users can select among them based on system requirements such as throughput, area and clock frequency.
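
As an illustration of such a selection, the 32-bit results from Tables 1 to 4 can be filtered by a required throughput and ranked by area. This is a hypothetical sketch rather than a Synopsys-recommended method: the Candidate type and the smallest_meeting helper are invented for the example, latency and licensing are ignored, and the numbers hold only for the library and constraints used in this article.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str
    delay_ns: float
    area_gates: int
    cycles: int = 1  # clock cycles per result (1 for combinational/pipelined)

    @property
    def throughput_mops(self) -> float:
        return 1000.0 / (self.delay_ns * self.cycles)


# 32-bit rows taken from Tables 1 to 4.
CANDIDATES_32BIT = [
    Candidate("DW02_mult (pparch)", 1.92, 11500),
    Candidate("DW02_mult_2_stage", 1.36, 12196),
    Candidate("DW_mult_pipe (2 stages)", 1.40, 12212),
    Candidate("DW02_mult_3_stage", 1.11, 13740),
    Candidate("DW_mult_pipe (3 stages)", 1.02, 13838),
    Candidate("DW_mult_seq (3 cycles)", 2.09, 7425, cycles=3),
    Candidate("DW_mult_seq (4 cycles)", 1.85, 6127, cycles=4),
]


def smallest_meeting(min_mops: float,
                     candidates: Sequence[Candidate] = CANDIDATES_32BIT
                     ) -> Optional[Candidate]:
    """Return the smallest-area candidate reaching min_mops, or None."""
    feasible = [c for c in candidates if c.throughput_mops >= min_mops]
    return min(feasible, key=lambda c: c.area_gates) if feasible else None


for target in (100, 500, 950):
    choice = smallest_meeting(target)
    print(target, "MOPS ->", choice.name if choice else "no candidate")
```

With these numbers, low throughput targets are met by the sequential multiplier at the smallest area, while higher targets require the combinational or pipelined components.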

Please check the following link for the complete list of DesignWare Building Blocks, including Floating Point Components:

http://www.synopsys.com/dw/doc.php/doc/dwf/intro.pdf