ASIP Models

ASIP Designer comes with an extensive library of example processor models provided as nML source code. They can serve as a starting point for architectural exploration and customer-specific production designs, or be used selectively as a reference implementation for individual architectural features. Each model comes with a fully working toolset, SDK, and synthesizable RTL, but should not be considered verified IP.

Microcontrollers


Tmicro

Compact 16-bit RISC microcontroller

Tnano

Compact 16-bit RISC microcontroller with reduced hardware

Trv (Family)

Variants of microcontrollers with RISC-V ISA

DLX (Family)

Variants of Hennessy & Patterson 32-bit RISC microcontroller DLX

Generic DSPs


Tdsp

16/32-bit DSP with single MAC unit, dual load-store units with post-modify addressing, and 3-way instruction-level parallelism in 16/32-bit variable-length instructions

Educational Models


Tvec (Family)

Variants of a wide SIMD processor with per-lane predication controlled by predicate registers and gather/scatter-based vector addressing. An additional family member supports compilation of OpenCL C kernels

Tvliw (Family)

Variants of a 4-slot VLIW processor, with predication of VLIW slots and instruction compaction

Tinycore2

Tutorial model used in basic processor modeling hands-on laboratory

Matmul

Workshop model: Matrix multiplication on a RISC-V scalar core (Trv32p5x) with SIMD vector and ILP extensions

Tctcore

Historic educational model used in manuals

Domain-Specific Accelerators


Tmotion

Video accelerator for motion estimation

Tgauss

Accelerator for Gaussian image filtering

Tcom8

SIMD vector processor for communication kernels, supporting complex-type operations

MXcore

Scalar accelerator for block matrix inversion

FFTcore

Scalar FFT accelerator

MMSE

Accelerator for 5G New Radio MMSE equalization using Cholesky decomposition

SHA256

Accelerator for SHA256 hashing by extension of a RISC-V scalar core

Tsec

Accelerator for the Kyber key encapsulation mechanism (post-quantum cryptography) by extension of a RISC-V scalar core

Tmoby

AI accelerator for MobileNet Convolutional Neural Network

smarT

Medium-throughput AI accelerator supporting TFLM

Primecore *

ASIP for FFT and DFT computation in 4G/5G mobile devices, supporting:

  • FFT for all power-of-2 sizes ranging from 8 to 2048
  • DFT for all prime-factorizable sizes ranging from 6 to 1536

Tcrypt *

Accelerator for AES encryption and decryption

Tvox *

Accelerator for simultaneous localization and mapping (SLAM)

JEMA/JEMB *

Dual-ASIP design for JPEG encoding

* Available on demand. For more information, please contact Synopsys by sending your request to asipinfo@synopsys.com

Microcontrollers

Tmicro

16-bit microcontroller

  • 16-bit integer data path
  • 3-stage exposed pipeline
  • 8x16-bit general-purpose register file
  • 16-bit instruction width
  • 32-bit multi-cycle multi-word long immediate instructions
  • Single data memory
  • Separate AGU with indirect addressing and post-modify addressing modes
  • Additional features:
    • 16x16->32-bit multiplier
    • 16-bit serial divider
    • Zero-overhead loop support:
      • 3-level do-loop
    • Interrupt support
    • OCD support


Tnano

16-bit microcontroller with reduced hardware (based on Tmicro)

Differences to Tmicro:
  • No HW multiplier
  • No serial divider
  • No separate AGU: address computations are performed on the ALU
  • No zero-overhead loop support
  • No 32-bit instructions


Trv (Family)

The Trv family is a collection of RISC-V processor models combining different data path widths, pipeline depths, and optional extensions. The base models, supporting integer and multiplication instructions, are labeled Trv<ww>p<n>[f][x][c], with <ww> denoting the data path width (32 or 64) and <n> denoting the pipeline depth (3 or 5). Optional extensions are indicated by additional suffixes:

  • Suffix “f” denotes single-precision floating point extensions (32-bit only).
  • Suffix “x” denotes selected DSP extensions (can be combined with “f”).
  • Suffix “c” denotes support for compressed 16-bit instruction format (Trv32p3 only).

A separate model, Trv32p3sdx (with “sdx” denoting “simple data path extensions”), contains a low-barrier modeling skeleton for custom data path extensions and comes with a set of example implementations for different application domains, such as FFT, SHA256 hashing, and a neural network for keyword spotting.
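The naming scheme described above lends itself to a mechanical decoder. The following Python sketch parses a base-model name into its features. The function name `parse_trv_name` and the returned field names are illustrative assumptions, not part of ASIP Designer. The sketch checks only the syntax of a name, not whether that combination is actually available (for example, “f” is 32-bit only), and it rejects Trv32p3sdx, which follows its own convention.

```python
import re

# Pattern for the scheme Trv<ww>p<n>[f][x][c]; suffix order is taken from the text.
TRV_NAME = re.compile(
    r"^Trv(?P<width>32|64)p(?P<depth>3|5)(?P<f>f?)(?P<x>x?)(?P<c>c?)$"
)

def parse_trv_name(name):
    """Decode a Trv base-model name, or return None if it does not fit the scheme."""
    m = TRV_NAME.match(name)
    if m is None:
        return None
    return {
        "data_path_bits": int(m["width"]),   # <ww>: 32 or 64
        "pipeline_stages": int(m["depth"]),  # <n>: 3 or 5
        "float": m["f"] == "f",              # single-precision floating point
        "dsp": m["x"] == "x",                # selected DSP extensions
        "compressed": m["c"] == "c",         # 16-bit compressed instructions
    }

if __name__ == "__main__":
    print(parse_trv_name("Trv32p5fx"))
    print(parse_trv_name("Trv32p3sdx"))
```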

The following table lists the features of the available Trv family models in detail.

Trv32p3 (base model):

32-bit RISC-V microcontroller with 3-stage pipeline

 

  • Supported ISA:
    • RV32IM: base integer instructions + multiplication + division
    • Zicsr: control and status register instructions
    • Zba: advanced address generation
    • Zbb: basic bit manipulation
    • Zbs: single-bit instructions
  • 32-bit integer data path
  • 3-stage protected pipeline
    • Bypasses & HW stalls
  • 32x32-bit general-purpose register file
  • 32-bit instruction width
  • Single data memory
  • Separate AGU with indirect addressing
  • Additional features:
    • 32x32->64-bit multiplier
    • 32-bit serial divider
    • Interrupt support
    • OCD support

Trv32p3x (variant):

Trv32p3 with DSP extensions

Features on top of Trv32p3:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv32p3f (variant):

Trv32p3 with floating-point hardware support

Features on top of Trv32p3:

  • Supported ISA:
    • RV32IMFZfinx
  • FPU based on HardFloat [Hauser]
  • Single-precision serial division & square-root unit

Trv32p3fx (variant):

Trv32p3f with DSP extensions

Features on top of Trv32p3f:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv32p3c (variant):

Trv32p3 with compressed instruction support

Features on top of/different from Trv32p3:

  • Supported ISA:
    • RVC: Support for 16-bit compressed instruction format
  • No interrupt support

Trv32p5 (variant):

32-bit RISC-V microcontroller with 5-stage pipeline

Features different from Trv32p3:

  • 5-stage protected pipeline (instead of 3)

Trv32p5x (variant):

Trv32p5 with DSP extensions

Features on top of Trv32p5:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv32p5f (variant):

Trv32p5 with floating-point hardware support

Features on top of Trv32p5:

  • Supported ISA:
    • RV32IMFZfinx
  • FPU based on HardFloat [Hauser]
  • Single-precision serial division & square-root unit

Trv32p5fx (variant):

Trv32p5f with DSP extensions

Features on top of Trv32p5f:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv64p3 (base model):

64-bit RISC-V microcontroller with 3-stage pipeline

 

  • Supported ISA:
    • RV64IM: base integer instructions + multiplication + division
  • 64-bit integer data path
  • 3-stage protected pipeline
    • Bypasses & HW stalls
  • 32x64-bit general-purpose register file
  • 32-bit instruction width
  • Single data memory
  • Separate AGU with indirect addressing
  • Additional features:
    • 64x64->128-bit multiplier
    • 64-bit serial divider
    • OCD support

Trv64p3x (variant):

Trv64p3 with DSP extensions

Features on top of Trv64p3:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv64p5 (variant):

64-bit RISC-V microcontroller with 5-stage pipeline

Features different from Trv64p3:

  • 5-stage protected pipeline (instead of 3)

Trv64p5x (variant):

Trv64p5 with DSP extensions

Features on top of Trv64p5:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv32p3sdx (variant):

Trv32p3c with skeleton for custom data path extensions

Features on top of Trv32p3c:

  • Model stubs for low-barrier modeling of extension instructions
  • Shared 32x32-bit / 16x64-bit register file to enable both 32-bit and 64-bit extensions
  • Zero-overhead loop support:
    • 2-level do-loop
  • AGU with post-modify addressing modes


DLX (Family)

DLX (base model):

32-bit microcontroller (Hennessy & Patterson DLX)

 

  • 32-bit integer data path
  • 5-stage protected pipeline
    • Bypasses & HW stalls
  • 32x32-bit general-purpose register file
  • 32-bit instruction width
  • Single data memory
  • Separate AGU with indirect addressing and post-modify addressing modes
  • Additional features:
    • 32x32->32-bit multiplier
    • 32-bit serial divider
    • Zero-overhead loop support:
      • 2-level do-loop
      • 1-level zloop
    • Interrupt support
    • OCD support

FLX (variant):

DLX with HW floating point unit

Features on top of DLX base model:

  • 32-bit floating-point unit
  • Floating-point multicycle divider and square-root
  • Variant with custom 24-bit non-IEEE floating-point type

TLX (variant):

DLX with reduced register file and exposed shallower pipeline

Features different from DLX base model:

  • Reduced register file (16 x 32-bit)
  • 3-stage exposed pipeline

ILX (variant):

DLX with multi-threading support, exposed pipeline

Features different from DLX base model:

  • 4-way static multi-threading support
  • 4-fold instantiation of original DLX register set
  • 5-stage exposed pipeline

PLX (variant):

DLX with multi-threading support, protected pipeline

Features different from DLX base model:

  • 8-way static multi-threading support
  • 8-fold instantiation of original DLX register set

VLX (variant):

DLX with SIMD vector extensions

Features on top of DLX base model:

  • 4-lane SIMD vector ALU (4 x 32-bit)
  • 16 x 128-bit vector register file
  • Vector load/store (128-bit memory access)
  • 5-stage protected vector pipeline:
    • Bypassed vector registers

BLX (variant):

DLX with simple branch predictor

Features on top of DLX base model:

  • Branch prediction logic
  • Branch target buffer (BTB) with 64 entries, fully associative content-addressable memory


Generic DSPs

Tdsp

16/32-bit DSP with single MAC unit, dual load-store units with post-modify addressing, and 3-way instruction-level parallelism in 16/32-bit variable-length instructions

  • 16/32-bit fractional data path
  • 3-stage exposed pipeline
    • Bypassed modifier registers only
  • Register files:
    • 8x16-bit data register file
    • 4x32-bit long-word register file
    • 8x20-bit pointer register file
    • 4x16-bit modifier register file
  • 16/32-bit instructions
  • Dual-port data memory
  • 2 AGUs with post-modify and cyclic addressing modes
  • Additional features:
    • 16x16->32-bit MAC unit
    • 32-bit serial divider
    • Zero-overhead loop support:
      • 3-level do-loop
    • Interrupt support
    • OCD support
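Post-modify and cyclic addressing, as listed for the Tdsp AGUs, can be illustrated in a few lines of Python. This is a behavioral sketch only. It assumes that the memory access uses the pointer value before the update and that cyclic mode wraps the pointer inside a buffer given by a base address and a length. The names `post_modify`, `walk`, `base`, and `length` are hypothetical and do not reflect the model's register interface.

```python
def post_modify(pointer, modifier, base=None, length=None):
    """Return (address used for this access, pointer value after the access)."""
    address = pointer               # the access sees the old pointer value
    updated = pointer + modifier    # the pointer is modified afterwards
    if length is not None:
        # Cyclic mode (assumed semantics): wrap inside [base, base + length).
        updated = base + (updated - base) % length
    return address, updated

def walk(pointer, modifier, count, base=None, length=None):
    """Collect the addresses produced by `count` consecutive accesses."""
    addresses = []
    for _ in range(count):
        address, pointer = post_modify(pointer, modifier, base, length)
        addresses.append(address)
    return addresses

if __name__ == "__main__":
    # Six accesses through a 4-entry circular buffer starting at address 100.
    print(walk(100, 1, 6, base=100, length=4))
```

The wrap-around is what lets a filter kernel step through a delay line without an explicit compare-and-reset in the loop body.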


Educational Models

Tvec (Family)

Tvec1 (base model):

Scalar microcontroller with additional SIMD vector data path

  • Based on Tmicro
  • 16-bit integer scalar data path
  • 128-bit SIMD vector data path
  • 3-stage exposed pipeline
  • Register files:
    • 8x16-bit scalar register file
    • 4x128-bit vector register file
  • 16-bit instruction width
  • Single data memory with support for both scalar and wide vector access
  • Single AGU with indirect and post-modify addressing modes
  • 8-lane SIMD vector ALU
    • (additive arithmetic, logic, min/max, vector sum)
  • Additional features:
    • 16x16->32-bit scalar multiplier/MAC unit
    • No hardware divider
    • Zero-overhead loop support:
      • 3-level do-loop
    • Interrupt support
    • OCD support

Tvec2 (variant):

Tvec1 with vector predication

Features on top of Tvec1:

  • 4x8-bit vector condition register file
  • Guarded SIMD instructions via vector predication (lane-enables)
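Guarded SIMD execution can be pictured with a short Python sketch. It assumes 8 lanes of 16 bits (consistent with the 128-bit data path and 8-lane ALU of Tvec1), wrap-around addition, bit i of an 8-bit condition register enabling lane i, and disabled lanes keeping their previous destination value. These semantics and the names `vcmp_gt` and `vadd_predicated` are illustrative assumptions; the model's actual behavior is defined in its nML source.

```python
LANES = 8            # 128-bit vector / 16-bit elements
LANE_MASK = 0xFFFF   # keep results within 16 bits

def vcmp_gt(a, b):
    """Build an 8-bit predicate: bit i is set where a[i] > b[i]."""
    pred = 0
    for i in range(LANES):
        if a[i] > b[i]:
            pred |= 1 << i
    return pred

def vadd_predicated(dst, a, b, pred):
    """Add lane-wise where the predicate bit is set; other lanes keep dst."""
    return [
        (a[i] + b[i]) & LANE_MASK if (pred >> i) & 1 else dst[i]
        for i in range(LANES)
    ]

if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [4] * LANES
    pred = vcmp_gt(a, b)
    print(bin(pred), vadd_predicated([0] * LANES, a, b, pred))
```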

Tvec3 (variant):

Tvec2 with vector-based vector addressing

Features on top of Tvec2:

  • Vector load/store instructions with vector-based vector addressing
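One plausible reading of vector-based vector addressing is that an address vector supplies one memory address per lane, which gives gather (load) and scatter (store) behavior. The Python sketch below illustrates that reading only. The function names are hypothetical, memory is modeled as a flat list of elements, and the sketch says nothing about how the hardware handles several lanes hitting the same memory bank.

```python
def vector_gather(memory, addresses):
    """Load one element per lane, each from its own address."""
    return [memory[addr] for addr in addresses]

def vector_scatter(memory, addresses, values):
    """Store one element per lane, each to its own address (in lane order)."""
    for addr, value in zip(addresses, values):
        memory[addr] = value

if __name__ == "__main__":
    table = list(range(100, 132))
    print(vector_gather(table, [0, 5, 2, 31, 7, 7, 1, 30]))
```

A typical use is a per-lane table lookup, where each lane's index comes from an earlier vector computation.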

Tvec4 (variant):

Tvec2 with scalar-based vector addressing

Features on top of Tvec2:

  • Vector load/store instructions with scalar-based vector addressing
  • Gather-scatter I/O interface to resolve memory bank access conflicts

Tvec5 (variant):

Tvec4 with support for multiple vector types on shared vector ALU

Features on top of Tvec4:

  • Vector ALU supporting two vector types on shared hardware:
    • 8x16-bit SIMD data path
    • 4x32-bit SIMD data path

 


Tvliw (Family)

Tvliw1 (base model):

32-bit microprocessor with 4-slot VLIW instruction level parallelism

  • 32-bit integer data path
  • 3-stage exposed pipeline
  • Register files:
    • 16x32-bit data register file
    • 8x32-bit pointer register file
    • 8x32-bit modifier register file
  • 96-bit instruction width
  • 4-way VLIW instruction level parallelism
    • 2 arithmetic slots
    • 2 load/store/move slots
  • Dual-port data memory
  • 2 AGUs with post-modify addressing modes
  • Additional features:
    • 32x32->32-bit multiplier
    • Zero-overhead loop support:
      • 1-level do-loop

Tvliw2 (variant):

Tvliw1 with variable-length instruction level parallelism

Features on top of Tvliw1:

  • Variable-length instruction formats with predecoding and expansion in the PCU:
    • 1 to 4 parallel instructions
    • 24/48/72/96-bit instruction width
  • Instruction predication based on up to 8 dynamic conditions
  • 8x1-bit condition register file for instruction predication
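The effect of variable-length bundles on code size can be estimated with simple arithmetic. The Python sketch below assumes that each parallel instruction occupies 24 bits, which is inferred from the listed 24/48/72/96-bit widths for 1 to 4 instructions and is not stated explicitly. The function names are hypothetical.

```python
SLOT_BITS = 24   # assumption: 96-bit full bundle divided by 4 slots
MAX_SLOTS = 4

def bundle_bits(n_instructions):
    """Width of one variable-length bundle holding 1 to 4 instructions."""
    if not 1 <= n_instructions <= MAX_SLOTS:
        raise ValueError("a bundle holds between 1 and 4 instructions")
    return n_instructions * SLOT_BITS

def program_bits(bundles, variable_length=True):
    """Program size in bits for a list of per-bundle instruction counts."""
    if variable_length:
        return sum(bundle_bits(n) for n in bundles)
    # Fixed-width VLIW (Tvliw1 style): every bundle takes the full 96 bits.
    return len(bundles) * MAX_SLOTS * SLOT_BITS

if __name__ == "__main__":
    schedule = [1, 2, 4, 1]   # instructions issued per cycle
    print(program_bits(schedule), program_bits(schedule, variable_length=False))
```

For this four-cycle schedule, the variable-length encoding needs half the bits of the fixed 96-bit encoding, because unused slots are not stored.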

Tvliw3 (variant):

Tvliw2 with additional 2-cycle program fetch pipeline

Features on top of Tvliw2:

  • Program fetch pipeline supporting program memory with 2-cycle load latency
  • Loop instruction buffer


Tinycore2

Tutorial model used in basic processor modeling hands-on laboratory

  • 16-bit integer data path
  • 4-stage exposed pipeline
  • 8x16-bit register file
  • 16-bit ALU
  • 14-bit instruction width
  • Single-port memory with indirect and post-increment addressing modes
  • No separate AGU, address computation performed on ALU
  • Zero-overhead loop support:
    • 1-level do-loop


Matmul

Workshop model: Matrix multiplication on a RISC-V scalar core (Trv32p5x) with SIMD vector and ILP extensions.

Features on top of/different from Trv32p5x:

  • 4-lane SIMD vector data path (4x32-bit)
    • Including 4-lane vector MAC unit (32x32->32-bit)
  • 8x128-bit vector register file
    • with exposed pipeline, partially bypassed
  • Unified vector/scalar memory


Tctcore

Historic educational model used in manuals

  • 16-bit integer data path
  • 4-stage exposed pipeline
  • Register files:
    • 8x16-bit distributed data register file
    • 4x10-bit distributed pointer register file
    • 4x10-bit distributed modifier register file
  • 18-bit instruction width
  • Dual-port data memory
  • 2 AGUs with post-modify addressing modes and separated pointer/modifier register subsets
  • 16-bit ALU with dedicated operand/result registers
  • Additional features:
    • 16x16->32-bit scalar multiplier/MAC unit with dedicated operand/result registers
    • Zero-overhead loop support:
      • 1-level do-loop


Domain-Specific Accelerators

Tmotion

Video accelerator for motion estimation

  • Based on Tmicro
  • 8/16-bit scalar data path
  • 128-bit SIMD vector data path
  • 3-stage exposed pipeline
  • Register files:
    • 8x16-bit scalar register file
    • 4x128-bit vector register file
  • 16/32/48-bit instructions
  • Shared scalar/vector data memory with unaligned 128-bit vector access
  • Dedicated coefficient memory with 128-bit vector access
  • 2 AGUs with post-modify addressing modes
  • 16-lane SIMD vector ALU with specialized vector absolute-difference instructions
  • Zero-overhead loop support:
    • 3-level do-loop
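Motion estimation is commonly built on the sum of absolute differences (SAD) between a block of the current frame and candidate blocks of a reference frame, which is the kind of kernel that vector absolute-difference instructions speed up. The Python sketch below shows a plain full search. It assumes 16 pixels of 8 bits per vector row (128 bits / 16 lanes); the function names and the exhaustive search strategy are illustrative assumptions, not a description of the Tmotion instruction set or its reference software.

```python
def sad16(a, b):
    """Sum of absolute differences over one row of 16 pixels (one vector)."""
    assert len(a) == len(b) == 16
    return sum(abs(x - y) for x, y in zip(a, b))

def block_sad(cur, ref, ref_x, ref_y):
    """SAD between the 16-wide block `cur` and the window of `ref` at (ref_x, ref_y)."""
    return sum(
        sad16(row, ref[ref_y + r][ref_x:ref_x + 16])
        for r, row in enumerate(cur)
    )

def best_motion_vector(cur, ref, x0, y0, search):
    """Full search around (x0, y0); returns (sad, dx, dy) of the best match."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = x0 + dx, y0 + dy
            # Skip candidates that fall outside the reference frame.
            if x < 0 or y < 0 or y + len(cur) > len(ref) or x + 16 > len(ref[0]):
                continue
            cost = block_sad(cur, ref, x, y)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

if __name__ == "__main__":
    # Synthetic 32x32 reference frame; the current block is cut out at (10, 9).
    ref = [[(3 * x * x + 5 * y * y + 7 * x * y + x) % 251 for x in range(32)]
           for y in range(32)]
    cur = [row[10:26] for row in ref[9:25]]
    print(best_motion_vector(cur, ref, 8, 8, 3))
```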


Tgauss

Accelerator for Gaussian image filtering

  • Based on TLX
  • 16/32-bit scalar data path
  • 48-bit SIMD vector data path
  • 4-stage exposed pipeline
  • Register files:
    • 16x32-bit scalar register file, split into separately accessible 16-bit low/high parts
    • Two distributed 10x48-bit vector register files with cyclic buffer access
    • 16x5-bit pointer register file for cyclic buffering
  • 32-bit instructions
  • 32-bit scalar data memory
  • Separate vector memories for the input/output image
  • Separate vector memory for line buffers
  • 2x24-bit vector data path (2 RGB pixels) with 6-lane bytewise multiply/accumulate unit
  • Additional features:
    • 32x32->32-bit pipelined multiplier (2 cycles)
    • 32-bit sequential divider performing 3 iterations in parallel
    • Zero-overhead loop support:
      • 3-level do-loop
    • OCD support


Tcom8

SIMD vector processor for communication kernels, supporting complex-type operations

  • Based on Tmicro
  • 16/32-bit scalar data path
  • 128-bit SIMD vector data path
  • 4-stage exposed pipeline
  • Register files:
    • 8x16-bit scalar register file
    • 4x128-bit vector register file
    • 4x320-bit partitioned vector accumulator register file
    • 4x16-bit pointer register file
    • 4x16-bit modifier register file
  • 32-bit instruction width
  • 2-way static ILP for scalar/vector instructions
  • 4-way static ILP for custom FFT instructions
  • Dual-port vector memory and separate single-port vector coefficient memory
  • 3 AGUs supporting cyclic, bit-reverse and specialized next-butterfly addressing modes
  • 8x40-bit/4x80-bit shared vector ALU supporting 8-lane SIMD fixed-point or 4-lane SIMD complex-fixed-point operations:
    • Vector shift unit
    • Vector multiply/MAC
    • Vector butterfly (complex only)
  • Additional features:
    • 16x16->32-bit multiplier
    • 16-bit serial divider
    • Zero-overhead loop support:
      • 3-level do-loop
    • Interrupt support
    • OCD support
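Bit-reverse addressing, one of the AGU modes listed above, is the index permutation that radix-2 FFTs use to reorder their input or output. The Python sketch below shows the index computation in software; the function names are hypothetical, and the sketch does not describe how the Tcom8 AGUs implement the mode.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # move the lowest bit to the result
        index >>= 1
    return result

def bit_reversed_order(n):
    """Bit-reversed index sequence for an n-point transform (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]

if __name__ == "__main__":
    print(bit_reversed_order(8))
```

With a hardware addressing mode, this permutation comes for free as part of the load or store, instead of costing a separate reordering pass.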


MXcore

Scalar accelerator for block matrix inversion

  • 32-bit integer/floating point data path with 2x32-bit complex number support
  • 3-stage exposed pipeline
  • Register files:
    • 8x32-bit data register file
    • 4x16-bit pointer register file
    • 4x16-bit modifier register file
  • 16/32-bit instruction width
  • 2-way static ILP:
    • (arithmetic || memory/move/control)
  • Single AGU with post-modify addressing modes
  • 32-bit integer ALU
  • 32-bit floating-point ALU
  • Additional features:
    • 32x32->64-bit integer multiplier
    • 32-bit floating point multiplier
    • 32-bit sequential divider (int and float) performing 3 iterations in parallel
    • Zero-overhead loop support:
      • 2-level do-loop
    • OCD support


FFTcore

Scalar FFT accelerator

(minimal core optimized for FFT application kernel, without support for C built-in types or arbitrary C code)

  • 48-bit complex fixed-point data path
  • 3-stage exposed pipeline
  • Two distributed register files of 4x48-bit each
  • 20-bit instruction width
  • Up to 5-way ILP for FFT inner loop
    • (load || store || coef_load || mul || butterfly)
  • 2 data memories, 1 separate coefficient memory
  • 3 specialized AGUs with post-modify, circular, and custom butterfly addressing modes
  • 48-bit ALU with complex multiply and butterfly
  • Zero-overhead loop support:
    • 2-level do-loop
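The inner-loop operations named above (coefficient load, complex multiply, butterfly) correspond to the steps of a radix-2 FFT. The Python sketch below shows them in a recursive decimation-in-time FFT. It uses floating-point complex numbers for readability, whereas FFTcore works on complex fixed-point data, and the split into `complex_mul` and `butterfly` is an assumed mapping onto the listed operations, not a description of the model's instructions.

```python
import cmath

def complex_mul(b, w):
    """Multiply a sample by its twiddle factor (the loaded coefficient)."""
    return b * w

def butterfly(a, t):
    """Radix-2 butterfly: sum and difference of the two inputs."""
    return a + t, a - t

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor for this pair
        t = complex_mul(odd[k], w)
        out[k], out[k + n // 2] = butterfly(even[k], t)
    return out

if __name__ == "__main__":
    print(fft([1, 0, 0, 0]))
```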


MMSE

Accelerator for 5G New Radio MMSE equalization using Cholesky decomposition, with FLX processor as scalar base

Features on top of FLX:

  • 64-bit complex floating-point data path
  • N-lane SIMD complex vector processing unit with design-time configurable vector size / number of lanes using N as parameter
  • 8 x (Nx64-bit) vector register file
  • 64-bit instruction width
  • 4-way ILP:
    • (scalar/memory || move || vector complex mul || vector complex add/sum)
  • Vector load/store with various application-specific post-modify addressing modes tuned for efficient access of matrix elements during Cholesky decomposition
  • Balanced data path to maximize memory bandwidth utilization

SHA256

Accelerator for SHA256 hashing by extension of a RISC-V scalar core

  • Based on Trv32p3 (RV32IM ISA), reduced by unused features:
    • No hardware multiplier
    • No hardware divider
    • No OCD support
    • Reduced register file (see below)
  • 3-stage protected pipeline:
    • Bypasses & HW stalls
  • 16x32-bit register file
  • Dedicated SHA256-step instructions
  • Separate data memory for K-table
  • 2 AGUs with post-modify addressing modes
  • 32-bit instruction width
  • 3-way ILP for critical loop instructions:
    • (SHA-step || load data || load K-table)
  • Zero-overhead loop support:
    • 1-level zloop
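To show what a SHA256 step and the K-table are, the following Python sketch implements the standard SHA-256 algorithm with the round function isolated in `sha_step`. Each round consumes one message-schedule word and one K-table constant, which matches the (SHA-step || load data || load K-table) pattern listed above. The function names are hypothetical, and how much of a round the model's dedicated instructions actually cover is not stated here. The constants are derived from their definition (fractional bits of roots of the first primes) rather than typed in.

```python
from math import isqrt

MASK = 0xFFFFFFFF

def _primes(count):
    found, candidate = [], 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def _icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# K-table: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
# Initial hash: first 32 fractional bits of the square roots of the first 8 primes.
H0 = [isqrt(p << 64) & MASK for p in _primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sha_step(state, k, w):
    """One SHA-256 round: update the 8-word state with constant k and word w."""
    a, b, c, d, e, f, g, h = state
    s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    t1 = (h + s1 + ch + k + w) & MASK
    s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (s0 + maj) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

def sha256(message):
    """Hash a bytes object and return the 32-byte digest."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += (8 * len(message)).to_bytes(8, "big")
    h = list(H0)
    for off in range(0, len(padded), 64):
        w = [int.from_bytes(padded[off + 4 * i:off + 4 * i + 4], "big")
             for i in range(16)]
        for i in range(16, 64):   # message schedule expansion
            s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
            s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
            w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
        state = tuple(h)
        for i in range(64):       # the critical loop: one step per round
            state = sha_step(state, K[i], w[i])
        h = [(x + y) & MASK for x, y in zip(h, state)]
    return b"".join(x.to_bytes(4, "big") for x in h)

if __name__ == "__main__":
    print(sha256(b"abc").hex())
```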


Tsec

Accelerator for the Kyber key encapsulation mechanism (post-quantum cryptography) by extension of a RISC-V scalar core (Trv32p5x)

Features on top of Trv32p5x:

  • Support for SHA3 hashing
    • 25x64-bit dedicated state register file
    • 64-bit load/store instructions
    • Keccak hash unit with dedicated instruction
  • Dedicated instructions for Kyber modulo-based operations
    • 2-way packed SIMD
    • Montgomery and Barrett reduction
    • NTT butterfly
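Barrett reduction replaces the division in a modular reduction with a multiplication by a precomputed constant and a shift, which is why it maps well onto a dedicated instruction. The Python sketch below follows the well-known formulation for the Kyber modulus q = 3329 with a 26-bit shift and signed 16-bit inputs. It is a software illustration under those assumptions, not a description of the Tsec data path, and the names are hypothetical.

```python
Q = 3329                              # Kyber modulus
SHIFT = 26
V = ((1 << SHIFT) + Q // 2) // Q      # precomputed constant, about 2**26 / Q

def barrett_reduce(a):
    """Reduce a signed 16-bit value modulo Q without a division.

    The result is congruent to `a` modulo Q and lies in the centered range
    [-(Q - 1) // 2, (Q - 1) // 2].
    """
    # t approximates round(a / Q); Python's >> is an arithmetic shift.
    t = (V * a + (1 << (SHIFT - 1))) >> SHIFT
    return a - t * Q

if __name__ == "__main__":
    for a in (0, 3329, 5000, -5000, 32767):
        print(a, barrett_reduce(a))
```

The centered output range keeps intermediate values small, so several additions can follow before the next reduction is needed.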


Tmoby

AI accelerator for MobileNet Convolutional Neural Network, with Trv32p3 RISC-V processor (RV32IM ISA) as scalar base

Features on top of Trv32p3:

  • Additional, 4th pipeline stage used for vector memory load only
  • 64-lane SIMD vector processing unit (64x8-bit data path)
    • Including 64-lane vector MAC unit (8x8->32-bit)
  • Many distributed register files, including:
    • 4x64-bit vector register file
    • 3x512-bit matrix register file
    • 3x2048-bit vector accumulator register file
  • Separate vector memories:
    • Vector feature memory with vector addressing
    • Vector weight memory with scalar addressing
  • 90-bit instruction width
  • 4-way ILP:
    • (scalar || vector arithmetic || vector feature memory || vector weight memory)
  • Zero-overhead loop support:
    • 3-level do-loop
    • 1-level zloop


smarT

Accelerator for medium-throughput CNN applications, based on RISC-V scalar core (Trv32p5x)

Features on top of Trv32p5x:

  • Dual convolutional unit with 16-lane SIMD 8-bit multipliers per unit
  • 4x32-bit vector data path including:
    • Vector shift-round-sat unit
  • Additional registers:
    • Quad-access (4x32-bit) to existing central register file
    • Two accumulator register files (4x32-bit)
    • Vector address register file (4x32-bit)
  • 128-bit memory interface with 4 banks of 32-bit and vector addressing
  • Small local memory
  • Low-overhead DMA
  • Proof-of-concept support of TensorFlow Lite for Microcontrollers (TFLM)
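A shift-round-sat operation, as named in the vector data path above, is the usual way to bring wide accumulator values back to a narrow integer type in quantized neural networks. The Python sketch below shows one common variant: arithmetic right shift with round-half-up, followed by saturation to a signed 8-bit range. The rounding mode, the output width, and the function name are assumptions; the smarT unit may differ in each of these.

```python
def shift_round_sat(acc, shift, bits=8):
    """Right-shift `acc` with rounding, then saturate to a signed `bits`-bit value."""
    if shift > 0:
        # Adding half of the discarded range before shifting rounds half up.
        acc = (acc + (1 << (shift - 1))) >> shift
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, acc))

def requantize(accumulators, shift):
    """Apply shift-round-sat to every lane of an accumulator vector."""
    return [shift_round_sat(acc, shift) for acc in accumulators]

if __name__ == "__main__":
    print(requantize([1000, 5000, -5000, -12], 3))
```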
