ASIP Models

ASIP Designer comes with an extensive library of example processor models provided as nML source code. They can serve as a starting point for architectural exploration and customer-specific production designs, or be partially reused as reference implementations of selected architectural features. All models come with a fully working toolset, SDK, and synthesizable RTL, but they should not be considered verified IP.

Microcontrollers

Tmicro

Compact 16-bit RISC microcontroller

Tnano

Compact 16-bit RISC microcontroller with reduced hardware

Trv (Family)

Variants of microcontrollers with RISC-V ISA

DLX (Family)

Variants of the Hennessy & Patterson DLX 32-bit RISC microcontroller

Generic DSPs

Tdsp

16/32-bit DSP with single MAC unit, dual load-store units with post-modify addressing, and 3-way instruction-level parallelism in 16/32-bit variable-length instructions

Educational Models

Tvec (Family)

Variants of a wide SIMD processor with per-lane predication controlled by predicate registers and gather/scatter-based vector addressing. An additional family member supports compilation of OpenCL C kernels.

Tvliw (Family)

Variants of a 4-slot VLIW processor, with predication of VLIW slots and instruction compaction

Tinycore2

Tutorial model used in basic processor modeling hands-on laboratory

Matmul

Workshop model: Matrix multiplication on a RISC-V scalar core (Trv32p5x) with SIMD vector and ILP extensions

Tctcore

Historic educational model used in manuals

Domain-Specific Accelerators

Tmotion

Video accelerator for motion estimation

Tgauss

Accelerator for Gaussian image filtering

Tcom8

SIMD vector processor for communication kernels, supporting complex-type operations

MXcore

Scalar accelerator for block matrix inversion

FFTcore

Scalar FFT accelerator

MMSE

Accelerator for 5G New Radio MMSE equalization using Cholesky decomposition

LDPC

Accelerator for 5G Low Density Parity Check decoding

SHA256

Accelerator for SHA256 hashing by extension of a RISC-V scalar core

Tsec

Accelerator for the Kyber key encapsulation mechanism (post-quantum cryptography) by extension of a RISC-V scalar core

Tmoby

AI accelerator for MobileNet Convolutional Neural Network

smarT

Medium-throughput AI accelerator supporting TensorFlow Lite for Microcontrollers (TFLM)

Primecore *

ASIP for FFT and DFT computation in 4G/5G mobile devices, supporting:

- FFT for all power-of-2 sizes ranging from 8 to 2048
- DFT for all prime-factorizable sizes ranging from 6 to 1536

Tcrypt *

Accelerator for AES encryption and decryption

Tvox *

Accelerator for simultaneous localization and mapping (SLAM)

JEMA/JEMB *

Dual-ASIP design for JPEG encoding

* Available on demand. For more information, please contact Synopsys by sending your request to asipinfo@synopsys.com

Microcontrollers

Tmicro

16-bit microcontroller

16-bit integer data path
3-stage exposed pipeline
8x16-bit general-purpose register file
16-bit instruction width
32-bit multi-cycle multi-word long immediate instructions
Single data memory
Separate AGU with indirect addressing and post-modify addressing modes
Additional features:
- 16x16->32-bit multiplier
- 16-bit serial divider
- Zero-overhead loop support:
  - 3-level do-loop
- Interrupt support
- OCD support
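Zero-overhead loop support means the program control unit (PCU) repeats a loop body by comparing the program counter with a stored loop-end address, so no branch instruction is executed per iteration; nested do-loops use a small stack of loop registers. The Python sketch below models that behavior for a toy instruction list. The instruction format, the `run_toy_program` function, and the `LoopFrame` fields are hypothetical illustrations, not Tmicro's nML description or encoding; only the nesting depth of 3 is taken from the list above.

```python
from dataclasses import dataclass

MAX_LOOP_LEVELS = 3  # mirrors the "3-level do-loop" entry above


@dataclass
class LoopFrame:
    start: int  # address of the first instruction of the loop body
    end: int    # address of the last instruction of the loop body
    count: int  # remaining iterations


def run_toy_program(program):
    """Execute a toy instruction list (hypothetical format) and return the accumulator.

    ("do", n, end): repeat instructions pc+1..end n times (n >= 1) in hardware.
    ("add", k):     add k to the accumulator.
    """
    acc, pc, stack = 0, 0, []
    while pc < len(program):
        op = program[pc]
        if op[0] == "do":
            _, n, end = op
            if n < 1 or end <= pc:
                raise ValueError("loop needs n >= 1 and a non-empty body")
            if len(stack) == MAX_LOOP_LEVELS:
                raise RuntimeError("loop nesting exceeds the hardware levels")
            stack.append(LoopFrame(start=pc + 1, end=end, count=n))
        elif op[0] == "add":
            acc += op[1]
        else:
            raise ValueError(f"unknown instruction {op!r}")
        pc += 1
        # The PCU compares the PC with the loop end address; no branch
        # instruction occupies a slot to jump back to the loop start.
        while stack and pc == stack[-1].end + 1:
            stack[-1].count -= 1
            if stack[-1].count > 0:
                pc = stack[-1].start
                break
            stack.pop()  # loop finished; nested loops may share the end address
    return acc
```

For example, `run_toy_program([("do", 3, 3), ("add", 1), ("do", 4, 3), ("add", 10)])` runs an outer loop of 3 iterations around an inner loop of 4 iterations that share the same end address.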

Tnano

16-bit microcontroller with reduced hardware (based on Tmicro)

Differences from Tmicro:

No HW multiplier
No serial divider
No separate AGU: address computations are performed on the ALU
No zero-overhead loop support
No 32-bit instructions

Trv (Family)

The Trv family is a collection of RISC-V processor models combining different data path widths, pipeline depths, and optional extensions. The base models, supporting integer as well as multiplication and division instructions, are labeled Trv<ww>p<n>[f][x][c], with <ww> denoting the data path width (32 or 64) and <n> denoting the pipeline depth (3 or 5). Optional extensions are indicated by additional suffixes:

Suffix “f” denotes single-precision floating-point extensions (32-bit only).
Suffix “x” denotes selected DSP extensions (can be combined with “f”).
Suffix “c” denotes support for the compressed 16-bit instruction format (Trv32p3 only).
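The naming scheme above can be decoded mechanically. The following Python sketch parses a base-model name and applies the two stated restrictions ("f" is 32-bit only, "c" is Trv32p3 only). The function `parse_trv_name` and its result dictionary are hypothetical; the pattern assumes the suffixes appear in the order f, x, c as written, and it deliberately does not cover the separate Trv32p3sdx model.

```python
import re

# Trv<ww>p<n>[f][x][c], suffixes in the order given by the naming scheme
_TRV_NAME = re.compile(r"Trv(32|64)p([35])(f?)(x?)(c?)")


def parse_trv_name(name: str) -> dict:
    """Decode a Trv base-model name into its configuration (hypothetical helper)."""
    m = _TRV_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Trv<ww>p<n>[f][x][c] name: {name!r}")
    width, depth = int(m.group(1)), int(m.group(2))
    fp, dsp, compressed = bool(m.group(3)), bool(m.group(4)), bool(m.group(5))
    # Restrictions stated in the suffix descriptions
    if fp and width != 32:
        raise ValueError("'f' (single-precision floating point) is 32-bit only")
    if compressed and (width, depth) != (32, 3):
        raise ValueError("'c' (compressed instructions) is Trv32p3 only")
    return {
        "width": width,
        "pipeline_depth": depth,
        "float": fp,
        "dsp": dsp,
        "compressed": compressed,
    }
```

For example, `parse_trv_name("Trv32p5fx")` reports a 32-bit, 5-stage model with floating-point and DSP extensions, while `parse_trv_name("Trv64p3f")` raises `ValueError`.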

A separate model, Trv32p3sdx (“sdx” stands for “simple data path extensions”), contains a low-barrier modeling skeleton for custom data path extensions and comes with a set of example implementations for different application domains, such as FFT, SHA256 hashing, and a neural network for keyword spotting.

The following overview lists the features of the available Trv family models in detail.

Trv32p3 (base model):

32-bit RISC-V microcontroller with 3-stage pipeline

Supported ISA:
- RV32IM: base integer instructions + multiplication + division
- Zicsr: control and status register instructions
- Zba: advanced address generation
- Zbb: basic bit manipulation
- Zbs: single-bit instructions
32-bit integer data path
3-stage protected pipeline
- Bypasses & HW stalls
32x32-bit general-purpose register file
32-bit instruction width
Single data memory
Separate AGU with indirect addressing
Additional features:
- 32x32->64-bit multiplier
- 32-bit serial divider
- Interrupt support
- OCD support

Trv32p3x (variant):

Trv32p3 with DSP extensions

Features on top of Trv32p3:

2-way static ILP:
- arith/ctrl || move/load/store
Zero-overhead loop support:
- 2-level do-loop
- 1-level zloop
AGU with post-modify addressing modes

Trv32p3f (variant):

Trv32p3 with floating-point hardware support

Features on top of Trv32p3:

Supported ISA:
- RV32IMFZfinx
FPU based on HardFloat [Hauser]
Single-precision serial division & square-root unit

Trv32p3fx (variant):

Trv32p3f with DSP extensions

Features on top of Trv32p3f:

2-way static ILP:
- arith/ctrl || move/load/store
Zero-overhead loop support:
- 2-level do-loop
- 1-level zloop
AGU with post-modify addressing modes

Trv32p3c (variant):

Trv32p3 with compressed instruction support

Features on top of/different from Trv32p3:

Supported ISA:
- RVC: Support for 16-bit compressed instruction format
No interrupt support

Trv32p5 (variant):

32-bit RISC-V microcontroller with 5-stage pipeline

Features different from Trv32p3:

5-stage protected pipeline (instead of 3)

Trv32p5x (variant):

Trv32p5 with DSP extensions

Features on top of Trv32p5:

2-way static ILP:
- arith/ctrl || move/load/store
Zero-overhead loop support:
- 2-level do-loop
- 1-level zloop
AGU with post-modify addressing modes

Trv32p5f (variant):

Trv32p5 with floating-point hardware support

Features on top of Trv32p5:

Supported ISA:
- RV32IMFZfinx
FPU based on HardFloat [Hauser]
Single-precision serial division & square-root unit

Trv32p5fx (variant):

Trv32p5f with DSP extensions

Features on top of Trv32p5f:

2-way static ILP:
- arith/ctrl || move/load/store
Zero-overhead loop support:
- 2-level do-loop
- 1-level zloop
AGU with post-modify addressing modes

Trv64p3 (base model):

64-bit RISC-V microcontroller with 3-stage pipeline

Supported ISA:
- RV64IM: base integer instructions + multiplication + division
64-bit integer data path
3-stage protected pipeline
- Bypasses & HW stalls
32x64-bit general-purpose register file
32-bit instruction width
Single data memory
Separate AGU with indirect addressing
Additional features:
- 64x64->128-bit multiplier
- 64-bit serial divider
- OCD support

Trv64p3x (variant):

Trv64p3 with DSP extensions

Features on top of Trv64p3:

2-way static ILP:
- arith/ctrl || move/load/store
Zero-overhead loop support:
- 2-level do-loop
- 1-level zloop
AGU with post-modify addressing modes

Trv64p5 (variant):

64-bit RISC-V microcontroller with 5-stage pipeline

Features different from Trv64p3:

5-stage protected pipeline (instead of 3)

Trv64p5x (variant):

Trv64p5 with DSP extensions

Features on top of Trv64p5:

2-way static ILP:
- arith/ctrl || move/load/store
Zero-overhead loop support:
- 2-level do-loop
- 1-level zloop
AGU with post-modify addressing modes

Trv32p3sdx (variant):

Trv32p3c with skeleton for custom data path extensions

Features on top of Trv32p3c:

Model stubs for low-barrier modeling of extension instructions
Shared 32x32-bit / 16x64-bit register file to enable both 32-bit and 64-bit extensions
Zero-overhead loop support:
- 2-level do-loop
AGU with post-modify addressing modes

DLX (Family)

DLX (base model):

32-bit microcontroller (Hennessy & Patterson DLX)

32-bit integer data path
5-stage protected pipeline
- Bypasses & HW stalls
32x32-bit general-purpose register file
32-bit instruction width
Single data memory
Separate AGU with indirect addressing and post-modify addressing modes
Additional features:
- 32x32->32-bit multiplier
- 32-bit serial divider
- Zero-overhead loop support:
  - 2-level do-loop
  - 1-level zloop
- Interrupt support
- OCD support

FLX (variant):

DLX with HW floating point unit

Features on top of DLX base model:

32-bit floating-point unit
Floating-point multicycle divider and square-root
Variant with custom 24-bit non-IEEE floating-point type

TLX (variant):

DLX with reduced register file and shallower, exposed pipeline

Features different from DLX base model:

Reduced register file (16 x 32-bit)
3-stage exposed pipeline

MLX (variant):

DLX with two-stage fetch pipeline

Features different from DLX base model:

Two-cycle latency for PM loads, resulting in a two-stage fetch pipeline

ILX (variant):

DLX with multi-threading support, exposed pipeline

Features different from DLX base model:

4-way static multi-threading support
4-fold instantiation of original DLX register set
5-stage exposed pipeline

PLX (variant):

DLX with multi-threading support, protected pipeline

Features different from DLX base model:

8-way static multi-threading support
8-fold instantiation of original DLX register set

VLX (variant):

DLX with SIMD vector extensions

Features on top of DLX base model:

4-lane SIMD vector ALU (4 x 32-bit)
16 x 128-bit vector register file
Vector load/store (128-bit memory access)
5-stage protected vector pipeline:
- Bypassed vector registers

BLX (variant):

DLX with simple branch predictor

Features on top of DLX base model:

Branch prediction logic
Branch target buffer (BTB) with 64 entries, implemented as a fully associative content-addressable memory
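A fully associative BTB compares the fetch address against all entries at once and, on a hit, supplies a predicted branch target. The Python sketch below models a 64-entry BTB. The class name, the least-recently-used replacement policy, and the rule "predict taken on a hit, remove the entry when the branch resolves not-taken" are assumptions for illustration; the BLX description does not state its replacement or prediction logic.

```python
from collections import OrderedDict


class BranchTargetBuffer:
    """Fully associative BTB; LRU replacement is an assumption, not BLX's documented policy."""

    def __init__(self, entries: int = 64):
        self.entries = entries
        self.table = OrderedDict()  # branch PC -> target address

    def predict(self, pc: int):
        """Return the predicted target on a hit, or None (fall through) on a miss."""
        if pc in self.table:
            self.table.move_to_end(pc)  # mark as most recently used
            return self.table[pc]
        return None

    def update(self, pc: int, target: int, taken: bool) -> None:
        """Record a resolved branch: taken branches are inserted, not-taken ones removed."""
        if taken:
            self.table[pc] = target
            self.table.move_to_end(pc)
            if len(self.table) > self.entries:
                self.table.popitem(last=False)  # evict the least recently used entry
        else:
            self.table.pop(pc, None)
```

In hardware, the dictionary lookup corresponds to the parallel tag compare of the content-addressable memory.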

Generic DSPs

Tdsp

16/32-bit DSP with single MAC unit, dual load-store units with post-modify addressing, and 3-way instruction-level parallelism in 16/32-bit variable-length instructions

16/32-bit fractional data path
3-stage exposed pipeline
- Bypassed modifier registers only
Register files:
- 8x16-bit data register file
- 4x32-bit long-word register file
- 8x20-bit pointer register file
- 4x16-bit modifier register file
16/32-bit instructions
Dual-port data memory
2 AGUs with post-modify and cyclic addressing modes
Additional features:
- 16x16->32-bit MAC unit
- 32-bit serial divider
- Zero-overhead loop support:
  - 3-level do-loop
- Interrupt support
- OCD support
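Post-modify addressing uses the current pointer value for the memory access and then adds a modifier to the pointer; cyclic addressing additionally wraps the updated pointer inside a buffer, which lets DSP code maintain delay lines without explicit bounds checks. The Python sketch below models such an AGU update and uses it for an FIR filter with a circular delay line. The function names and the `base`/`length` parameters are hypothetical; how Tdsp actually encodes the cyclic buffer bounds in its pointer and modifier registers is not described here.

```python
from typing import Optional, Tuple


def agu_post_modify(ptr: int, mod: int, base: Optional[int] = None,
                    length: Optional[int] = None) -> Tuple[int, int]:
    """Return (address used for this access, updated pointer).

    Linear post-modify when no buffer is given; cyclic post-modify wraps the
    updated pointer into [base, base + length).
    """
    addr = ptr
    new_ptr = ptr + mod
    if base is not None and length is not None:
        new_ptr = base + (new_ptr - base) % length
    return addr, new_ptr


def fir_cyclic(samples, coeffs):
    """FIR filter y[t] = sum_k coeffs[k] * x[t - k] with a circular delay line."""
    n = len(coeffs)
    delay = [0] * n  # delay line at addresses 0..n-1 (base 0)
    wr = 0
    out = []
    for x in samples:
        delay[wr] = x  # newest sample
        rd, acc = wr, 0
        for c in coeffs:  # walk backwards through older samples, wrapping at the buffer start
            addr, rd = agu_post_modify(rd, -1, base=0, length=n)
            acc += c * delay[addr]  # one MAC per tap
        out.append(acc)
        _, wr = agu_post_modify(wr, 1, base=0, length=n)
    return out
```

With a second AGU, the coefficient pointer could advance in parallel with the delay-line pointer, which is the usage pattern dual AGUs typically serve.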

Educational Models

Tvec (Family)

Tvec1 (base model):

Scalar microcontroller with additional SIMD vector data path

Based on Tmicro
16-bit integer scalar data path
128-bit SIMD vector data path
3-stage exposed pipeline
Register files:
- 8x16-bit scalar register file
- 4x128-bit vector register file
16-bit instruction width
Single data memory with support for both scalar and wide vector access
Single AGU with indirect and post-modify addressing modes
8-lane SIMD vector ALU
- (additive arithmetic, logic, min/max, vector sum)
Additional features:
- 16x16->32-bit scalar multiplier/MAC unit
- No hardware divider
- Zero-overhead loop support:
  - 3-level do-loop
- Interrupt support
- OCD support

Tvec2 (variant):

Tvec1 with vector predication

Features on top of Tvec1:

4x8-bit vector condition register file
Guarded SIMD instructions via vector predication (lane-enables)
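With lane-enable predication, a guarded SIMD instruction computes on all lanes but writes results only where the corresponding condition bit is set. The Python sketch below assumes 8 lanes of 16 bits (Tvec1's 8-lane ALU on a 128-bit data path), an 8-bit condition value with one bit per lane, wrap-around arithmetic, and that disabled lanes keep the destination register's previous value. These semantics and the function names are assumptions; Tvec2's actual instruction behavior may differ.

```python
LANES = 8
MASK16 = 0xFFFF


def vcmp_gt(a, b):
    """Vector compare producing an 8-bit lane-enable mask (bit i = lane i)."""
    return sum(1 << i for i in range(LANES) if a[i] > b[i])


def vadd_guarded(dst, a, b, cond):
    """Lane-wise 16-bit wrap-around add, written only where the condition bit is set."""
    return [
        (a[i] + b[i]) & MASK16 if (cond >> i) & 1 else dst[i]
        for i in range(LANES)
    ]
```

For example, `vadd_guarded(x, x, [100] * 8, vcmp_gt(x, [4] * 8))` adds 100 only to the lanes where `x` exceeds 4, replacing a per-lane branch with a single predicated instruction.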

Tvec3 (variant):

Tvec2 with vector-based vector addressing

Features on top of Tvec2:

Vector load/store instructions with vector-based vector addressing

Tvec4 (variant):

Tvec2 with scalar-based vector addressing

Features on top of Tvec2:

Vector load/store instructions with scalar-based vector addressing
Gather-scatter I/O interface to resolve memory bank access conflicts

Tvec5 (variant):

Tvec4 with support for multiple vector types on a shared vector ALU

Features on top of Tvec4:

Vector ALU supporting two vector types on shared hardware:
- 8x16-bit SIMD data path
- 4x32-bit SIMD data path
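Sharing one 128-bit ALU between an 8x16-bit and a 4x32-bit vector type means the adder's carry chain must be cut at lane boundaries that depend on the selected type. The Python sketch below emulates this partitioned addition on a 128-bit integer. The function `simd_add` and its `lane_bits` selector are hypothetical and only illustrate the principle, not Tvec5's actual hardware implementation.

```python
WIDTH = 128


def simd_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 128-bit words lane-wise (16- or 32-bit lanes), dropping the carry
    at each lane boundary as a partitioned adder would."""
    if lane_bits not in (16, 32):
        raise ValueError("lane_bits must be 16 or 32")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, WIDTH, lane_bits):
        # Only the low lane_bits of each operand affect this lane's sum
        lane = ((a >> shift) + (b >> shift)) & lane_mask
        result |= lane << shift
    return result
```

Adding 1 to `0xFFFF` wraps to 0 in 16-bit mode but yields `0x10000` in 32-bit mode, which is exactly the difference the type selection controls.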

Tvliw (Family)

Tvliw1 (base model):

32-bit microprocessor with 4-slot VLIW instruction level parallelism

32-bit integer data path
3-stage exposed pipeline
Register files:
- 16x32-bit data register file
- 8x32-bit pointer register file
- 8x32-bit modifier register file
96-bit instruction width
4-way VLIW instruction level parallelism
- 2 arithmetic slots
- 2 load/store/move slots
Dual-port data memory
2 AGUs with post-modify addressing modes
Additional features:
- 32x32->32-bit multiplier
- Zero-overhead loop support:
  - 1-level do-loop

Tvliw2 (variant):

Tvliw1 with variable-length instruction level parallelism

Features on top of Tvliw1:

Variable-length instruction formats with predecoding and expansion in the PCU:
- 1 to 4 parallel instructions
- 24/48/72/96-bit instruction width
Instruction predication based on up to 8 dynamic conditions
8x1-bit condition register file for instruction predication

Tvliw3 (variant):

Tvliw2 with additional 2-cycle program fetch pipeline

Features on top of Tvliw2:

Program fetch pipeline supporting program memory with 2-cycle load latency
Loop instruction buffer

Tinycore2

Tutorial model used in basic processor modeling hands-on laboratory

16-bit integer data path
4-stage exposed pipeline
8x16-bit register file
16-bit ALU
14-bit instruction width
Single-port memory with indirect and post-increment addressing modes
No separate AGU, address computation performed on ALU
Zero-overhead loop support:
- 1-level do-loop

Matmul

Workshop model: Matrix multiplication on a RISC-V scalar core (Trv32p5x) with SIMD vector and ILP extensions.

Features on top of/different from Trv32p5x:

4-lane SIMD vector data path (4x32-bit)
- Including 4-lane vector MAC unit (32x32->32-bit)
8x128-bit vector register file
- with exposed pipeline, partially bypassed
Unified vector/scalar memory

Tctcore

Historic educational model used in manuals

16-bit integer data path
4-stage exposed pipeline
Register files:
- 8x16-bit distributed data register file
- 4x10-bit distributed pointer register file
- 4x10-bit distributed modifier register file
18-bit instruction width
Dual-port data memory
2 AGUs with post-modify addressing modes and separate pointer/modifier register subsets
16-bit ALU with dedicated operand/result registers
Additional features:
- 16x16->32-bit scalar multiplier/MAC unit with dedicated operand/result registers
- Zero-overhead loop support:
  - 1-level do-loop

Domain-Specific Accelerators

Tmotion

Video accelerator for motion estimation

Based on Tmicro
8/16-bit scalar data path
128-bit SIMD vector data path
3-stage exposed pipeline
Register files:
- 8x16-bit scalar register file
- 4x128-bit vector register file
16/32/48-bit instructions
Shared scalar/vector data memory with unaligned 128-bit vector access
Dedicated coefficient memory with 128-bit vector access
2 AGUs with post-modify addressing modes
16-lane SIMD vector ALU with specialized vector absolute-difference instructions
Zero-overhead loop support:
- 3-level do-loop
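A common use of vector absolute-difference instructions in motion estimation is computing the sum of absolute differences (SAD) between a block of the current frame and candidate blocks of a reference frame; the candidate with the lowest SAD gives the motion vector. The Python sketch below assumes 16x16 blocks of 8-bit pixels (one 16-pixel row per 16-lane vector) and an exhaustive full search. The block size, the search strategy, and all function names are assumptions, since the Tmotion description only lists the specialized absolute-difference instructions.

```python
def vabsdiff16(a, b):
    """16-lane absolute difference of unsigned 8-bit pixels (one vector row)."""
    return [abs(x - y) for x, y in zip(a, b)]


def sad_16x16(cur, ref, ry, rx):
    """SAD between the 16x16 block `cur` and the block of `ref` at (ry, rx)."""
    total = 0
    for y in range(16):
        row = ref[ry + y][rx:rx + 16]  # unaligned 16-pixel vector load
        total += sum(vabsdiff16(cur[y], row))  # vector sum of the lane differences
    return total


def full_search(cur, ref, cy, cx, radius):
    """Return (best_sad, dy, dx) over all displacements within +/-radius."""
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= len(ref) - 16 and 0 <= rx <= len(ref[0]) - 16:
                cand = (sad_16x16(cur, ref, ry, rx), dy, dx)
                if best is None or cand < best:
                    best = cand
    return best
```

The unaligned row load in `sad_16x16` is the access pattern that the shared data memory's unaligned 128-bit vector access supports.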

Tgauss

Accelerator for Gaussian image filtering

Based on TLX
16/32-bit scalar data path
48-bit SIMD vector data path
4-stage exposed pipeline
Register files:
- 16x32-bit scalar register file, split into separately accessible 16-bit low/high parts
- Two distributed 10x48-bit vector register files with cyclic buffer access
- 16x5-bit pointer register file for cyclic buffering
32-bit instructions
32-bit scalar data memory
Separate vector memories for the input/output image
Separate vector memory for line buffers
2x24-bit vector data path (2 RGB pixels) with 6-lane bytewise multiply/accumulate unit
Additional features:
- 32x32->32-bit pipelined multiplier (2 cycles)
- 32-bit sequential divider performing 3 iterations in parallel
- Zero-overhead loop support:
  - 3-level do-loop
- OCD support

Tcom8

SIMD vector processor for communication kernels, supporting complex-type operations

Based on Tmicro
16/32-bit scalar data path
128-bit SIMD vector data path
4-stage exposed pipeline
Register files:
- 8x16-bit scalar register file
- 4x128-bit vector register file
- 4x320-bit partitioned vector accumulator register file
- 4x16-bit pointer register file
- 4x16-bit modifier register file
32-bit instruction width
2-way static ILP for scalar/vector instructions
4-way static ILP for custom FFT instructions
Dual-port vector memory and separate single-port vector coefficient memory
3 AGUs supporting cyclic, bit-reverse and specialized next-butterfly addressing modes
8x40-bit/4x80-bit shared vector ALU supporting 8-lane SIMD fixed-point or 4-lane SIMD complex-fixed-point operations:
- Vector shift unit
- Vector multiply/MAC
- Vector butterfly (complex only)
Additional features:
- 16x16->32-bit multiplier
- 16-bit serial divider
- Zero-overhead loop support:
  - 3-level do-loop
- Interrupt support
- OCD support
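Radix-2 FFTs produce (or consume) data in bit-reversed index order, and a bit-reverse addressing mode lets the AGU generate that sequence directly. One widely used hardware technique is reverse-carry addition: the pointer is advanced by half the buffer size while the carry propagates from the most significant bit toward the least significant bit. The Python sketch below shows this technique; whether Tcom8's AGUs implement it this way is not stated, and the function names are hypothetical.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def reverse_carry_increment(ptr: int, n: int) -> int:
    """Next bit-reversed address for an n-point buffer (n a power of two),
    computed by adding n/2 with the carry propagating toward the LSB."""
    bit = n >> 1
    while bit and ptr & bit:
        ptr ^= bit  # clear the set bit and carry into the next lower bit
        bit >>= 1
    return ptr | bit
```

Starting from 0 with n = 8, repeated increments visit 0, 4, 2, 6, 1, 5, 3, 7, the bit-reversed order of 0..7, without any table lookup.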

MXcore

Scalar accelerator for block matrix inversion

32-bit integer/floating-point data path with 2x32-bit complex number support
3-stage exposed pipeline
Register files:
- 8x32-bit data register file
- 4x16-bit pointer register file
- 4x16-bit modifier register file
16/32-bit instruction width
2-way static ILP:
- (arithmetic || memory/move/control)
Single AGU with post-modify addressing modes
32-bit integer ALU
32-bit floating-point ALU
Additional features:
- 32x32->64-bit integer multiplier
- 32-bit floating point multiplier
- 32-bit sequential divider (int and float) performing 3 iterations in parallel
- Zero-overhead loop support:
  - 2-level do-loop
- OCD support

FFTcore

Scalar FFT accelerator

(minimal core optimized for the FFT application kernel; no support for C built-in types or arbitrary C code)

48-bit complex fixed-point data path
3-stage exposed pipeline
Two distributed register files of 4x48-bit each
20-bit instruction width
Up to 5-way ILP for FFT inner loop
- (load || store || coef_load || mul || butterfly)
2 data memories, 1 separate coefficient memory
3 specialized AGUs with post-modify, circular, and custom butterfly addressing modes
48-bit ALU with complex multiply and butterfly
Zero-overhead loop support:
- 2-level do-loop
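The core operation of a radix-2 decimation-in-time FFT is the butterfly: a complex multiply of one input by a twiddle factor, followed by an add/subtract pair. The `butterfly` function below groups the `mul` and `butterfly` operations named in the 5-way ILP slot combination above. This Python sketch uses floating-point complex numbers instead of FFTcore's 48-bit complex fixed-point format, and the iterative FFT around it (with bit-reversed input ordering) is a standard textbook formulation, not FFTcore's program code.

```python
import cmath


def butterfly(a: complex, b: complex, w: complex):
    """Radix-2 DIT butterfly: multiply b by twiddle w, then add/subtract."""
    t = w * b
    return a + t, a - t


def fft_radix2(x):
    """Iterative radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    data = [0j] * n
    for i in range(n):  # place inputs in bit-reversed order
        r, k = 0, i
        for _ in range(bits):
            r = (r << 1) | (k & 1)
            k >>= 1
        data[r] = complex(x[i])
    size = 2
    while size <= n:  # one pass per stage
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                top, bot = start + k, start + k + half
                data[top], data[bot] = butterfly(data[top], data[bot], w)
                w *= w_step
        size *= 2
    return data
```

In a fixed-point implementation, the twiddle factors would typically be precomputed and read from the separate coefficient memory rather than generated by repeated multiplication.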

MMSE

Accelerator for 5G New Radio MMSE equalization using Cholesky decomposition, with FLX processor as scalar base

Features on top of FLX:

64-bit complex floating-point data path
N-lane SIMD complex vector processing unit; the vector size (number of lanes N) is configurable at design time
8 x (Nx64-bit) vector register file
64-bit instruction width
4-way ILP:
- (scalar/memory || move || vector complex mul || vector complex add/sum)
Vector load/store with various application-specific post-modify addressing modes, tuned for efficient access to matrix elements during Cholesky decomposition
Balanced data path to maximize memory bandwidth utilization
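A standard form of linear MMSE equalization estimates the transmitted symbols as x̂ = (H^H H + σ²I)^-1 H^H y. Because H^H H + σ²I is Hermitian positive definite, it can be factored as L L^H by Cholesky decomposition and then solved with one forward and one backward triangular substitution instead of an explicit inverse. The Python sketch below shows this approach with double-precision complex numbers rather than the model's 64-bit complex floating-point format. The function names are hypothetical, and the exact MMSE formulation used by the MMSE example model is not given in its description.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a == L @ L^H (a Hermitian positive definite)."""
    n = len(a)
    L = [[0j] * n for _ in range(n)]
    for j in range(n):
        d = (a[j][j] - sum(abs(L[j][k]) ** 2 for k in range(j))).real
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = complex(math.sqrt(d))
        for i in range(j + 1, n):  # column j below the diagonal
            s = sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def mmse_equalize(H, y, noise_var):
    """x_hat = (H^H H + noise_var*I)^-1 H^H y, solved via Cholesky."""
    rows, cols = len(H), len(H[0])
    A = [[sum(H[r][i].conjugate() * H[r][j] for r in range(rows)) + (noise_var if i == j else 0)
          for j in range(cols)] for i in range(cols)]
    b = [sum(H[r][i].conjugate() * y[r] for r in range(rows)) for i in range(cols)]
    L = cholesky(A)
    z = [0j] * cols
    for i in range(cols):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0j] * cols
    for i in reversed(range(cols)):  # backward substitution: L^H x = z
        x[i] = (z[i] - sum(L[k][i].conjugate() * x[k] for k in range(i + 1, cols))) / L[i][i]
    return x
```

The inner sums in `cholesky` walk along rows and columns of L with fixed strides, which is the kind of matrix access pattern that application-specific post-modify addressing modes can serve.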

LDPC

Accelerator for 5G Low Density Parity Check decoding, by extension of a RISC-V scalar core (Trv32p5x)

Features on top of Trv32p5x:

128-lane SIMD vector processing unit (128 x 8-bit) with specialized instructions for variable rotation, addsub, minimum detection and element selection
8 x 1024-bit vector register file
64-bit instruction width
Up to 4-way ILP:
- The arith/ctrl and move/ldst slots of the original Trv32p5x are split to support (scalar ldst || vector ldst || dual vector arith)
Dedicated 1024-bit wide vector memory
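A widely used LDPC decoding algorithm is min-sum: each check node sends every connected variable node the product of the signs of the other incoming messages times the smallest magnitude among them. This requires only the smallest and second-smallest magnitude plus the position of the smallest, which is the kind of work minimum-detection and element-selection instructions address. The LDPC model description does not name its decoding algorithm, so min-sum here is an assumption, and the Python function below (a plain, unnormalized min-sum check-node update) is a hypothetical illustration.

```python
def check_node_min_sum(msgs):
    """Min-sum check-node update for one parity check.

    msgs: incoming variable-to-check LLRs (signed ints). Returns, for each edge,
    the check-to-variable message computed from all *other* incoming messages.
    """
    if len(msgs) < 2:
        raise ValueError("a check node needs at least two edges")
    min1 = min2 = None
    min1_idx = -1
    sign_product = 1
    for i, m in enumerate(msgs):  # one pass: track two minima and the overall sign
        mag = abs(m)
        if m < 0:
            sign_product = -sign_product
        if min1 is None or mag < min1:
            min2, min1, min1_idx = min1, mag, i
        elif min2 is None or mag < min2:
            min2 = mag
    out = []
    for i, m in enumerate(msgs):
        mag = min2 if i == min1_idx else min1  # exclude this edge's own magnitude
        sign = sign_product * (-1 if m < 0 else 1)  # exclude this edge's own sign
        out.append(sign * mag)
    return out
```

For example, `check_node_min_sum([3, -1, 4, -5])` returns `[1, -3, 1, -1]`: only the edge holding the minimum receives the second minimum.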

SHA256

Accelerator for SHA256 hashing by extension of a RISC-V scalar core

Based on Trv32p3 (RV32IM ISA), with unused features removed:
- No hardware multiplier
- No hardware divider
- No OCD support
- Reduced register file (see below)
3-stage protected pipeline:
- Bypasses & HW stalls
16x32-bit register file
Dedicated SHA256-step instructions
Separate data memory for K-table
2 AGUs with post-modify addressing modes
32-bit instruction width
3-way ILP for critical loop instructions:
- (SHA-step || load data || load K-table)
Zero-overhead loop support:
- 1-level zloop
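A SHA-256 compression round updates the eight working variables a..h using one message-schedule word W[t] and one round constant K[t]; keeping the 64-entry K-table in a separate data memory lets K[t] be loaded in parallel with the message data, as in the 3-way ILP combination above. The Python sketch below assumes that one "SHA256-step" corresponds to one such round (the exact instruction semantics are not given) and wraps it in a complete SHA-256 so the step can be checked against a reference. The round constants are derived from their FIPS 180-4 definition rather than typed in; all helper names are hypothetical.

```python
import math
import struct

MASK = 0xFFFFFFFF


def _primes(count):
    """First `count` primes by trial division."""
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found if p * p <= n):
            found.append(n)
        n += 1
    return found


def _icbrt(n):
    """Integer cube root: largest r with r**3 <= n."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


_P = _primes(64)
# FIPS 180-4: K = first 32 fraction bits of the cube roots of the first 64 primes,
# H0 = first 32 fraction bits of the square roots of the first 8 primes.
K = [_icbrt(p << 96) & MASK for p in _P]
H0 = [math.isqrt(p << 64) & MASK for p in _P[:8]]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256_step(state, w, k):
    """One SHA-256 round: update working variables a..h with word w and constant k."""
    a, b, c, d, e, f, g, h = state
    s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    t1 = (h + s1 + ch + k + w) & MASK
    s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (s0 + maj) & MASK
    return [(t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g]


def sha256(message: bytes) -> bytes:
    """Full SHA-256 built on sha256_step, used to check the step against a reference."""
    bit_len = len(message) * 8
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64) + struct.pack(">Q", bit_len)
    state = list(H0)
    for off in range(0, len(padded), 64):
        w = list(struct.unpack(">16L", padded[off:off + 64]))
        for t in range(16, 64):  # message schedule expansion
            s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
        work = list(state)
        for t in range(64):  # one step per round: W[t] from data, K[t] from the K-table
            work = sha256_step(work, w[t], K[t])
        state = [(x + y) & MASK for x, y in zip(state, work)]
    return struct.pack(">8L", *state)
```

The 64-iteration round loop with a single step inside is the loop that a 1-level zloop and parallel K-table loads would target.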

Tsec

Accelerator for the Kyber key encapsulation mechanism (post-quantum cryptography) by extension of a RISC-V scalar core (Trv32p5x)

Features on top of Trv32p5x:

Support for SHA3 hashing
- 25x64-bit dedicated state register file
- 64-bit load/store instructions
- Keccak hash unit with dedicated instruction
Dedicated instructions for Kyber modulo-based operations
- 2-way packed SIMD
- Montgomery and Barrett reduction
- NTT butterfly
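Kyber computes on 16-bit coefficients modulo q = 3329. Montgomery reduction (applied after multiplications) and Barrett reduction (applied to bring sums back into range) both avoid a hardware division, and the NTT butterfly combines one Montgomery multiplication with an add/subtract pair. The Python sketch below follows the reduction formulas of the Kyber reference C implementation (R = 2^16, QINV = -3327), emulating signed 16/32-bit C arithmetic with explicit wrap-around. How Tsec maps these onto its 2-way packed SIMD instructions is not described, and the function names are hypothetical.

```python
KYBER_Q = 3329
QINV = -3327  # q^-1 mod 2^16, as a signed 16-bit value


def to_int16(x: int) -> int:
    """Wrap x to a signed 16-bit value (two's complement)."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x


def montgomery_reduce(a: int) -> int:
    """For |a| < q*2^15, return r with r = a * 2^-16 (mod q) and |r| < q."""
    t = to_int16(a * QINV)
    return (a - t * KYBER_Q) >> 16  # exact: a - t*q is divisible by 2^16


def barrett_reduce(a: int) -> int:
    """For signed 16-bit a, return the representative of a mod q in [-(q-1)/2, (q-1)/2]."""
    v = ((1 << 26) + KYBER_Q // 2) // KYBER_Q  # rounded 2^26 / q
    t = (v * a + (1 << 25)) >> 26  # rounded estimate of a / q
    return a - t * KYBER_Q


def fqmul(a: int, b: int) -> int:
    """Montgomery multiplication: a * b * 2^-16 (mod q)."""
    return montgomery_reduce(a * b)


def ntt_butterfly(lo: int, hi: int, zeta: int):
    """Cooley-Tukey NTT butterfly; zeta is given in Montgomery form (zeta * 2^16 mod q)."""
    t = fqmul(zeta, hi)
    return to_int16(lo + t), to_int16(lo - t)
```

Because the twiddle factor is stored in Montgomery form, the 2^-16 introduced by `montgomery_reduce` cancels, so the butterfly outputs are congruent to lo + zeta*hi and lo - zeta*hi modulo q.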

Tmoby

AI accelerator for MobileNet Convolutional Neural Network, with Trv32p3 RISC-V processor (RV32IM ISA) as scalar base

Features on top of Trv32p3:

Additional 4th pipeline stage, used only for vector memory loads
64-lane SIMD vector processing unit (64x8-bit data path)
- Including 64-lane vector MAC unit (8x8->32-bit)
Many distributed register files, including:
- 4x64-bit vector register file
- 3x512-bit matrix register file
- 3x2048-bit vector accumulator register file
Separate vector memories:
- Vector feature memory with vector addressing
- Vector weight memory with scalar addressing
90-bit instruction width
4-way ILP:
- (scalar || vector arithmetic || vector feature memory || vector weight memory)
Zero-overhead loop support:
- 3-level do-loop
- 1-level zloop

smarT

Accelerator for medium-throughput CNN applications, based on RISC-V scalar core (Trv32p5x)

Features on top of Trv32p5x:

Dual convolutional unit with 16-lane SIMD 8-bit multipliers per unit
4x32-bit vector data path including:
- Vector shift-round-sat unit
Additional registers:
- Quad-access (4x32-bit) to existing central register file
- Two accumulator register files (4x32-bit)
- Vector address register file (4x32-bit)
128-bit memory interface with four 32-bit banks and vector addressing
Small local memory
Low-overhead DMA
Proof-of-concept support for TensorFlow Lite for Microcontrollers (TFLM)
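In 8-bit quantized CNN inference, 32-bit accumulator values are scaled back to 8-bit outputs, and the final part of that requantization is typically a rounding right shift followed by saturation to the int8 range. The Python sketch below shows what a shift-round-sat operation on a 4x32-bit vector could compute. The rounding mode (round to nearest, ties away from zero), the int8 output range, and the function names are assumptions; the smarT description does not specify the unit's exact behavior.

```python
INT8_MIN, INT8_MAX = -128, 127


def shift_round_sat(x: int, shift: int) -> int:
    """Arithmetic right shift with round-half-away-from-zero, saturated to int8."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    if shift > 0:
        half = 1 << (shift - 1)
        mag = (abs(x) + half) >> shift  # round the magnitude, ties away from zero
        x = mag if x >= 0 else -mag
    return max(INT8_MIN, min(INT8_MAX, x))


def vshift_round_sat(lanes, shift):
    """Apply shift_round_sat to each 32-bit lane of a 4x32-bit vector."""
    if len(lanes) != 4:
        raise ValueError("expected a 4x32-bit vector")
    return [shift_round_sat(v, shift) for v in lanes]
```

Rounding the magnitude and reapplying the sign keeps the rounding symmetric around zero, so positive and negative accumulators with the same magnitude map to outputs of equal magnitude.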
