DesignWare Technical Bulletin Article

2007.03 DesignWare Library Datapath and Building Block IP - DesignWare® Library introduces 19 new Building Block IPs in the 2007.03 release

Arshid Syed, Sr. CAE

DesignWare® Library introduced 19 new Building Block IPs in the 2007.03 release, following the 21 new Building Block IPs introduced in the 2006.06 release. The DesignWare Library Datapath and Building Block IP is tightly integrated into Design Compiler (DC) and is part of the DC installation. This release contains new Floating Point components, Datapath Functions, Synchronizers and FIFOs. All of these components are available in the DesignWare Library at no additional cost.

This article provides a brief description and associated features of the new blocks.

Floating Point Components:

The new DesignWare Floating Point components can be instantiated in both VHDL and Verilog designs and synthesized with DC. The precision of the floating point numbers can be parameterized and covers all the IEEE formats. These components suit a variety of applications, such as graphics, signal processors and general-purpose processors, where floating point arithmetic operations, comparisons and conversions are common.

These new Floating Point components take advantage of the DesignWare datapath generator technology, which delivers better QoR than traditional datapath implementations. The DesignWare datapath generator is tightly integrated in DC and supported by the Galaxy Design Platform. When generating datapath circuits, it takes into account the full timing context of the surrounding design and the characteristics of the technology library, and generates the hybrid datapath structures best fitted to the design target. The new Floating Point components therefore not only save customer development effort but also deliver better QoR than many alternatives, such as third-party IP.

Verifying the Floating Point library is not trivial because of the complexity of the components: apart from the arithmetic operation itself, they contain modules for alignment, normalization and rounding. The DesignWare Floating Point components are extensively verified using an IEEE reference model, test vectors generated by different CPUs, directed tests of special cases such as overflow, underflow and the normal/denormal boundary, and random tests.

DesignWare Floating Point components are designed with the flexibility to meet a wide range of application requirements. They can be parameterized to support different sizes, precisions (half, single, double or custom) and cost requirements. For example, customers can trade IEEE compliance for smaller area by setting the ieee_compliance parameter to 0.

In addition to two-operand arithmetic operations, DesignWare Floating Point components support multiple-operand operations that are popular in certain applications, such as graphics. The multiple-operand component DW_fp_dp2 computes z = a*b + c*d. Its accuracy is much better than that of an implementation built from two DW_fp_mult components and one DW_fp_add component, because that implementation rounds each product and the final sum separately.

Similarly, DW_fp_dp3 computes the dot product of six floating-point inputs (a, b, c, d, e and f) to produce the floating-point result z = a*b + c*d + e*f. Its accuracy is similar to that of DW_fp_dp2.

The third multiple-operand component, DW_fp_dp4, computes the dot product of eight floating-point inputs (a, b, c, d, e, f, g and h) to produce z = a*b + c*d + e*f + g*h, again with accuracy similar to that of DW_fp_dp2 and DW_fp_dp3.
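The accuracy difference comes down to how many times intermediate results are rounded. The Python sketch below is a software model, not the DesignWare implementation: it compares a chain of separately rounded products and sums (standing in for DW_fp_mult plus DW_fp_add) with an idealized dot product that computes the exact value and rounds only once. Python doubles stand in for the parameterized formats, and the single-rounding behavior is an assumption for illustration; the actual error bounds of DW_fp_dp2/dp3/dp4 are given in their datasheets.

```python
from fractions import Fraction

def dot_separate(xs, ys):
    """Round every product and every partial sum (one mult/add unit per operation)."""
    acc = xs[0] * ys[0]
    for x, y in zip(xs[1:], ys[1:]):
        acc = acc + x * y
    return acc

def dot_single_rounding(xs, ys):
    """Compute the exact sum of products with rationals, then round once."""
    exact = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))
    return float(exact)  # Fraction -> float rounds to the nearest double

# z = a*b + c*d where the exact a*b = 1 - 2**-60 rounds to 1.0 in double precision
a, b, c, d = 1 + 2**-30, 1 - 2**-30, -1.0, 1.0
print(dot_separate([a, c], [b, d]))         # 0.0: the small term is lost
print(dot_single_rounding([a, c], [b, d]))  # -2**-60, about -8.67e-19
```

The same two functions cover the 3-term and 4-term cases by passing longer lists.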

The following are the features of the Floating Point components:

  • The precision of floating point numbers is parameterizable. The parameters cover all the IEEE formats.
  • The exponent width ranges from 3 to 31 bits.
  • The significand (fractional part) width ranges from 2 to 256 bits.
  • Full IEEE 754 compliance is supported and is controlled by the ieee_compliance parameter (disabled, 0, by default).
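To make the parameterization concrete, here is a small Python decoder for an IEEE-style word with configurable exponent and significand widths. It is an illustrative model only: the argument names sig_width and exp_width are labels chosen here for the two widths described above, and the decoder follows standard IEEE 754 rules (biased exponent, hidden bit, denormals, infinities, NaN) rather than any DesignWare-specific behavior. Half, single and double precision correspond to (sig_width, exp_width) = (10, 5), (23, 8) and (52, 11).

```python
from fractions import Fraction

def decode_fp(bits: int, sig_width: int, exp_width: int):
    """Decode a (1 + exp_width + sig_width)-bit IEEE-style word to an exact value."""
    sign = (bits >> (exp_width + sig_width)) & 1
    exp = (bits >> sig_width) & ((1 << exp_width) - 1)
    frac = bits & ((1 << sig_width) - 1)
    bias = (1 << (exp_width - 1)) - 1
    if exp == (1 << exp_width) - 1:
        # All-ones exponent: infinity if the fraction is zero, otherwise NaN
        if frac:
            return float("nan")
        return float("-inf") if sign else float("inf")
    if exp == 0:
        # Zero or denormal: no hidden bit, exponent fixed at 1 - bias
        mag = Fraction(frac, 1 << sig_width) * Fraction(2) ** (1 - bias)
    else:
        # Normal number: hidden leading 1 plus the stored fraction
        mag = (1 + Fraction(frac, 1 << sig_width)) * Fraction(2) ** (exp - bias)
    return -mag if sign else mag

print(decode_fp(0x3C00, 10, 5))               # 1 (half precision)
print(decode_fp(0x0001, 10, 5))               # 1/16777216, smallest half denormal
print(float(decode_fp(0x40490FDB, 23, 8)))    # single-precision approximation of pi
```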
The following is the list of new Floating Point components:
   DW_fp_div_seq  Floating Point Sequential Divider
   DW_fp_dp2  2-Term Floating Point Dot-product
   DW_fp_dp3  3-Term Floating Point Dot-product
   DW_fp_dp4  4-Term Floating Point Dot-product
   DW_fp_invsqrt  Floating Point Reciprocal of Square Root
   DW_fp_sqrt  Floating Point Square Root
   DW_fp_square  Floating Point Square
For detailed information on the DesignWare Floating Point components and their datasheets, go to the following link:


Datapath Functions

The Datapath Functions are a collection of HDL functions that can be called in a design's RTL code. These functions describe dedicated datapath functionality (for example, a blend function used in graphics) in synthesizable RTL code. They are made available through packages in VHDL and through include files in Verilog.

The following are the features of DesignWare Datapath Functions:

  • Ease of use: a simple function call, for example:
    assign z = DWF_dp_rnd_tc (a * b, DW_dp_rnd_near_even) + c;
  • Functional correctness: these functions are pre-verified.
  • Best Quality of Results (QoR): the code is optimized for datapath synthesis with Design Compiler.
  • Design and flow integration: the HDL code of a Datapath Function integrates tightly with the surrounding datapath functionality, allowing high-level optimizations and datapath synthesis. The functions are also tightly integrated into the Design Compiler datapath synthesis flow.
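The call in the first bullet rounds the product a * b to nearest, ties to even, before adding c. The Python model below shows what that kind of rounding does at the bit level, together with two's-complement saturation. It is a sketch under stated assumptions: the helper names and the drop argument (how many LSBs are removed) are hypothetical, because the example call does not show the output width, and the actual function signatures are defined in the DesignWare documentation.

```python
def rnd_near_even(x: int, drop: int) -> int:
    """Remove `drop` LSBs from a two's-complement value, rounding to nearest, ties to even."""
    if drop == 0:
        return x
    q = x >> drop              # arithmetic shift: floor(x / 2**drop)
    rem = x - (q << drop)      # discarded bits, 0 <= rem < 2**drop
    half = 1 << (drop - 1)
    if rem > half or (rem == half and (q & 1)):
        q += 1                 # round up; exact ties go to the even neighbor
    return q

def saturate_tc(x: int, width: int) -> int:
    """Clamp x to the range of a width-bit two's-complement number."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, x))

def rnd_sat(x: int, drop: int, width: int) -> int:
    """Round first, then saturate the result to `width` bits."""
    return saturate_tc(rnd_near_even(x, drop), width)

# Mirrors z = round(a * b) + c with a hypothetical 2-bit LSB drop
a, b, c = 7, 3, 5
z = rnd_near_even(a * b, 2) + c   # 21 / 4 = 5.25 rounds to 5, so z = 10
print(z)
```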
The following is the list of DesignWare Datapath Functions:
   DWF_dp_rndsat     Performs arithmetic rounding and saturation (released in 2006.06-SP1)
   DWF_dp_rnd        Performs arithmetic rounding (released in 2006.06-SP1)
   DWF_dp_simd_add   SIMD adder that carries out one large or several small additions in parallel
   DWF_dp_simd_mult  SIMD multiplier that carries out one large or several small multiplications in parallel
   DWF_dp_countones  Count-ones function that counts the number of ones in an input vector by adding up all individual bits
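Behaviorally, the SIMD adder and count-ones functions are simple; the interesting part is the hardware structure. The Python sketch below is an illustrative model, not DesignWare source, and its function and parameter names are hypothetical. simd_add uses a single wide addition in which carries cannot cross lane boundaries (the lane MSBs are masked out before the add and patched back with XOR), which is one way a single adder can serve as several smaller ones. count_ones sums the input bits pairwise, level by level, as a balanced adder tree would; the tree shape is an assumption.

```python
def simd_add(a: int, b: int, width: int, lanes: int) -> int:
    """Add two width-bit words as `lanes` independent lanes, each wrapping on overflow."""
    lane_w = width // lanes
    assert lane_w * lanes == width, "width must divide evenly into lanes"
    # h has a 1 at the most significant bit of every lane
    h = 0
    for i in range(lanes):
        h |= 1 << (i * lane_w + lane_w - 1)
    # With the lane MSBs cleared, no carry can leave a lane during the wide add
    partial = (a & ~h) + (b & ~h)
    # Restore each lane MSB as a_msb XOR b_msb XOR (carry into that bit)
    return partial ^ ((a ^ b) & h)

def count_ones(x: int, width: int) -> int:
    """Count the 1 bits of x by summing them pairwise, level by level."""
    level = [(x >> i) & 1 for i in range(width)]
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)          # pad odd levels with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0

print(hex(simd_add(0x00FF, 0x0001, 16, 1)))  # 0x100: one 16-bit addition
print(hex(simd_add(0x00FF, 0x0001, 16, 2)))  # 0x0: low byte wraps, no carry into high byte
print(count_ones(0b1011_0110, 8))            # 5
```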


Synchronizer Family

Clock Domain Crossing (CDC) IP is used to safely connect signals between two different clock domains. The synchronizer blocks use a variety of clock domain crossing schemes. Clock domain crossing issues are difficult to detect in RTL simulation, because they stem from real-world timing phenomena, such as metastability, that are hard to predict. DesignWare Synchronizers are carefully verified and validated in real designs. They can be used in many applications, such as data bus controllers or any interface that sends parallel data between two clock domains.

Common features of the synchronizers include the following (the available features depend on the type of synchronizer):

  • Parameterized data bus
  • Parameterized synchronizing stages
  • Parameterized test feature
  • Parameterized output registration/all outputs registered
  • Ability to model missampling of data on the source clock domain
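The sketch below is a generic cycle-based Python model of a multi-stage flip-flop synchronizer, illustrating the parameterized synchronizing stages and missampling features listed above. It is not the DesignWare simulation model: the class name, the 50% missampling probability and the rule that missampling happens only when the input has just changed are assumptions chosen for illustration. Missampling is modeled as the first flop capturing the previous input value, which delays the change by one destination clock.

```python
import random

class SyncChain:
    """Cycle-based model of an N-stage synchronizer clocked by the destination clock."""

    def __init__(self, stages: int = 2, model_missampling: bool = False, seed=None):
        self.regs = [0] * stages        # regs[0] is the first (metastability-prone) flop
        self.model_missampling = model_missampling
        self.rng = random.Random(seed)
        self.prev_in = 0                # input value seen at the previous clock edge

    def clock(self, d: int) -> int:
        """Apply one destination clock edge with input d; return the synchronized output."""
        sample = d
        if self.model_missampling and d != self.prev_in and self.rng.random() < 0.5:
            # Assumed model: a change close to the edge leaves the old value captured
            sample = self.prev_in
        self.prev_in = d
        self.regs = [sample] + self.regs[:-1]   # shift one stage down the chain
        return self.regs[-1]

sync = SyncChain(stages=3)
print([sync.clock(1) for _ in range(4)])   # [0, 0, 1, 1]: three edges of latency
```

Raising stages lengthens the latency by one destination clock per stage, which is the usual price for more time to resolve metastability.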
The following is the list of synchronizers:
   DW_data_qsync_hl  Quasi-Synchronous Data Interface for H-to-L Frequency Clocks
   DW_data_qsync_lh  Quasi-Synchronous Data Interface for L-to-H Frequency Clocks
   DW_reset_sync  Reset Sequence Synchronizer

For detailed information, see the DesignWare Synchronizer datasheets at the following link:


Memory FIFOs

Three new blocks have been added to the Memory/FIFO family:

   DW_fifo_2c_df       Dual independent clock FIFO consisting of the DesignWare components DW_fifoctl_2c_df (FIFO controller) and DW_ram_r_w_2c_dff (dual-port synchronous RAM). Word caching (pre-fetching) is performed in the pop interface to minimize latency and allow bursting of contiguous words.
   DW_asymdata_inbuf   Asymmetric Data Input Buffer
   DW_asymdata_outbuf  Asymmetric Data Output Buffer

For individual datasheets, go to the following page:


Other Blocks

DW_pricod     Priority Coder

The "cod" output of DW_pricod is a one-hot code of the "a" input vector, with a 1 at the position of the most significant (left-most) 1 in "a". All lower-order bits of "a" (to the right of that first 1) are "don't care." The zero output indicates whether all bits of input "a" are 0: if "a" contains no 1, output "cod" is all 0's and zero is 1.
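The behavior described above fits in a few lines. The following Python function is a reference model for checking results, not DesignWare code; the port names a, cod and zero come from the description, while the function name and the width argument are hypothetical. Bit width-1 is treated as the left-most bit.

```python
def pricod(a: int, width: int):
    """Return (cod, zero) for a width-bit input a."""
    a &= (1 << width) - 1                   # keep only the width input bits
    if a == 0:
        return 0, 1                         # no 1 found: cod is all 0s, zero = 1
    cod = 1 << (a.bit_length() - 1)         # one-hot at the left-most 1 of a
    return cod, 0

print([bin(v) for v in pricod(0b0010_1101, 8)])   # ['0b100000', '0b0']
```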

See also the DW_pricod datasheet:


DesignWare Datapath Features and Enhancements

Datapath Generator based implementations (0703-SP1)

Datapath generator implementations are delay-optimized parallel-prefix architectures, called "pparch". The datapath generator performs constraint- and technology-driven synthesis, and implements a flexible hybrid circuit structure to give optimal synthesis results for a given context.
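As background on what a parallel-prefix adder computes, here is a Python model of one classic parallel-prefix structure, the Kogge-Stone adder. It is a textbook example only; the datapath generator builds its own hybrid structures, and nothing here reflects how it chooses them. Each level combines generate/propagate pairs twice as far apart as the previous level, so all carries are known after ceil(log2(width)) levels instead of rippling through width bit positions.

```python
def kogge_stone_add(a: int, b: int, width: int):
    """Add two width-bit numbers; return (sum mod 2**width, carry_out, prefix_levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generates
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagates
    G, P = g[:], p[:]
    dist, levels = 1, 0
    while dist < width:
        # Prefix operator: (G, P)[i] = (G[i] | P[i] & G[i-dist], P[i] & P[i-dist])
        G, P = (
            [G[i] | (P[i] & G[i - dist]) if i >= dist else G[i] for i in range(width)],
            [P[i] & P[i - dist] if i >= dist else P[i] for i in range(width)],
        )
        dist *= 2
        levels += 1
    # Now G[i] is the carry out of bit positions 0..i (carry-in assumed 0)
    s = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s, G[width - 1], levels

print(kogge_stone_add(0b1011, 0b0110, 4))   # (1, 1, 2): 11 + 6 = 17 = 16 + 1
```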

Datapath generator based implementations are added for the following components:

DW_squarep, DW02_multp and DW02_tree

Datapath Generator support for Formality's new Datapath Solver

Formality version Z-2006.12 provides significant improvements with new features and enhancements to address formal verification issues within datapath-intensive designs. The new commands are guide_boundary, guide_boundary_netlist, guide_constraints, and guide_replace. These commands pass intermediate information to Formality.

These features are supported in DC 2006.06-SP4 and newer versions. To enable them, use the following command:

    set synlib_dwgen_fmlink_active TRUE

Refer to the Formality User Guide for additional information on datapath verification.

Internal rounding for singleton operators

Internal rounding lets users trade precision for area in arithmetic circuits. All bits below the internal rounding position are discarded from the internal addends. Area is saved because no logic is required to generate and add these bits. However, a rounding error is introduced, which degrades the precision of the result. An offset is automatically added to make the error range as symmetric as possible and the bias as small as possible.

The result of a datapath with internal rounding is usually truncated somewhere above the internal rounding position. This truncation can be turned into rounding by setting the external rounding position to that bit position. Rounding is achieved by adding a constant 1 at the next lower bit position.

Because internal rounding changes the functionality of a datapath block, a behavioral model that reflects the changed functionality is written out as a Verilog file in the dwsvf directory for simulation and verification purposes.
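The effect of internal rounding on accuracy can be estimated with a small experiment. The Python sketch below truncates every addend below an internal position r, optionally adds a compensation offset, and rounds a final result at an external position e by adding 1 at bit e-1 and truncating, as described above. The offset formula (half the worst-case truncation loss) is an assumption for illustration; the text does not say which constant Design Compiler actually uses.

```python
import random

def truncate(x: int, pos: int) -> int:
    """Discard all bits below position pos."""
    return (x >> pos) << pos

def sum_internal_round(addends, r: int, use_offset: bool = True) -> int:
    """Sum addends after discarding their bits below position r."""
    total = sum(truncate(x, r) for x in addends)
    if use_offset:
        # Assumed offset: half of the largest possible total truncation loss
        total += (len(addends) * ((1 << r) - 1)) // 2
    return total

def round_external(x: int, e: int) -> int:
    """Round to nearest at bit position e: add 1 at bit e-1, then truncate."""
    return truncate(x + (1 << (e - 1)), e)

def mean_error(r: int, use_offset: bool, trials: int = 2000, n: int = 8, seed: int = 0) -> float:
    """Average (approximate - exact) over random 16-bit addends."""
    rng = random.Random(seed)
    err = 0
    for _ in range(trials):
        xs = [rng.getrandbits(16) for _ in range(n)]
        err += sum_internal_round(xs, r, use_offset) - sum(xs)
    return err / trials

print(mean_error(4, use_offset=False))  # strongly negative: truncation always rounds down
print(mean_error(4, use_offset=True))   # close to zero
```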

Datapath Generator Strategies

In addition to the basic architecture, the datapath generators implement a variety of micro-architectures that allow the design to be optimized for better synthesis results (area and timing). This section describes the smart generation strategies, controlled by set_dp_smartgen_options, that determine the micro-architecture of arithmetic datapath components. These strategies apply to all components with the "pparch" implementation, as well as to complex datapath blocks extracted and synthesized with Design Compiler Ultra.

See the following application notes for the different datapath generator strategies:


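Radix-8 Booth Encoding

Regular Booth multipliers use radix-4 encoding, which roughly halves the number of partial products; radix-8 encoding reduces it to about one third, at the cost of precomputing the "hard" multiple 3X of the multiplicand. In both radix-4 and radix-8 Booth multipliers, the extra area consumed by the Booth encoding is usually smaller than the area saved by the reduced number of partial products (a smaller adder tree). This results in an overall area reduction, while the delay stays roughly the same.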
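Booth recoding itself is easy to model. The Python sketch below recodes a two's-complement multiplier into radix-4 (digits -2..2) or radix-8 (digits -4..4) Booth digits and rebuilds the product from one partial product per digit. It is a generic textbook formulation, not the datapath generator's implementation. For a 16-bit multiplier it yields 8 radix-4 digits but only 6 radix-8 digits, and radix-8 digits can be ±3, which is why a radix-8 design must precompute 3X.

```python
def booth_digits(y: int, width: int, k: int):
    """Radix-2**k Booth digits of a width-bit two's-complement pattern y (LSB digit first)."""
    ndig = -(-width // k)                     # ceil(width / k) digits

    def bit(i: int) -> int:
        if i < 0:
            return 0                          # implicit 0 to the right of the LSB
        return (y >> min(i, width - 1)) & 1   # sign-extend above the MSB

    digits = []
    for d in range(ndig):
        base = k * d
        # Overlapping group: bit(base-1) plus bits base..base+k-1, top bit negative
        val = bit(base - 1)
        for j in range(k - 1):
            val += bit(base + j) << j
        val -= bit(base + k - 1) << (k - 1)
        digits.append(val)
    return digits

def booth_multiply(x: int, y: int, width: int, k: int) -> int:
    """Signed product x * y built from one shifted partial product per Booth digit."""
    return sum((dig * x) << (k * i) for i, dig in enumerate(booth_digits(y, width, k)))

y = 0b1110_0101_0011_0111                    # -6857 as a 16-bit two's-complement value
print(len(booth_digits(y, 16, 2)), len(booth_digits(y, 16, 3)))   # 8 6
print(booth_multiply(123, y, 16, 3) == 123 * (y - (1 << 16)))     # True
```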

CSM Adder

The CSM (conditional-sum) adder is a special adder architecture with a multiplexer-based structure; it is essentially a hierarchical carry-select adder. It can achieve better delay in technologies with fast multiplexer cells, but at the cost of larger area.

The use of both radix-8 Booth encoding and CSM adders is controlled by set_dp_smartgen_options and is enabled by default.
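The hierarchical carry-select idea behind the CSM adder can be seen in a recursive Python model of a conditional-sum adder. Every block computes its sum and carry-out for both possible carry-ins; when two halves are merged, the low half's actual carry-out selects (multiplexes) between the two precomputed versions of the high half. This is a generic textbook model, not the structure Design Compiler generates; computing both versions of every block is where the extra area comes from.

```python
def cond_sum(a: int, b: int, width: int):
    """Return {carry_in: (sum, carry_out)} for a width-bit block, for both carry-ins."""
    if width == 1:
        result = {}
        for cin in (0, 1):
            t = (a & 1) + (b & 1) + cin
            result[cin] = (t & 1, t >> 1)
        return result
    lo_w = width // 2
    hi_w = width - lo_w
    mask = (1 << lo_w) - 1
    # In hardware both halves are evaluated independently, in parallel
    lo = cond_sum(a & mask, b & mask, lo_w)
    hi = cond_sum(a >> lo_w, b >> lo_w, hi_w)
    result = {}
    for cin in (0, 1):
        lo_sum, lo_cout = lo[cin]
        hi_sum, hi_cout = hi[lo_cout]       # mux: low carry-out picks the high result
        result[cin] = ((hi_sum << lo_w) | lo_sum, hi_cout)
    return result

print(cond_sum(200, 100, 8)[0])   # (44, 1): 300 = 256 + 44
```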

Enhancement to equality/inequality (EQ/NEQ) comparators

EQ/NEQ comparators driven by arithmetic operators can now be extracted, for example:

    assign t1 = a + b + c;
    assign t2 = d + e + f;
    assign op = (t1 == t2);

Use the report_resources command to check whether an equality/inequality comparator in the design is extracted as part of the datapath.

Datapath Extraction across Registers:

Datapath extraction can pass the output signals of arithmetic operators through registers in carry-save format. This eliminates the carry-propagate (final) adder in the driving datapath block, which can reduce both area and delay.
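The saving comes from the carry-save representation: a sum is kept as two vectors whose total is the value, so no carry has to ripple before the register. The Python sketch below models that idea with a 3:2 carry-save adder. The 32-bit width and the pipeline split (register the carry-save pair of a + b + c, then add d in the next stage) are hypothetical choices for illustration, not a description of how Design Compiler partitions a design.

```python
MASK = (1 << 32) - 1     # hypothetical 32-bit datapath width

def csa(x: int, y: int, z: int):
    """3:2 carry-save adder: x + y + z == s + c (mod 2**32), with no carry propagation."""
    s = x ^ y ^ z                                   # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1          # per-bit majority, shifted left
    return s & MASK, c & MASK

def stage1(a: int, b: int, c: int):
    """First pipeline stage: a + b + c left in carry-save form (two registers)."""
    return csa(a, b, c)

def stage2(s: int, cy: int, d: int) -> int:
    """Second stage: add d to the registered pair, then do the single carry-propagate add."""
    s2, c2 = csa(s, cy, d)
    return (s2 + c2) & MASK

reg_s, reg_c = stage1(10, 20, 30)    # the pipeline registers hold the pair, not 60
print(reg_s + reg_c, stage2(reg_s, reg_c, 40))   # 60 100
```

The trade-off is that the carry-save pair needs two register banks instead of one, which is why both a binary and a carry-save option are worth comparing.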

The following attribute controls the registers that use the carry-save signal format:


The following variable sets the extraction mode:

    set hlo_dp_extract_across_register true

The registers in carry-save format are reported by the report_resources command.

When the variable hlo_dp_extract_across_register is set to "selective" (the default value), the datapath is extracted across marked register banks for carry-save graph implementation.

However, if the attribute dp_extract_across_registers is set to FALSE on at least one of the registers in a bank, the datapath is not extracted across that register bank. Two options, binary and carry-save, are considered, and the best solution is selected automatically. For more details on this variable, see the man page.

Unless otherwise noted, all of these new blocks are already available in the 2007.03 release of Design Compiler. If you are using a version of Design Compiler earlier than 2006.06, download the latest version of DesignWare Building Block IP:


The complete list of DesignWare Building Blocks, including the Floating Point components, is available at: