Synopsys’ solution to efficiently design and implement your own application-specific instruction-set processor (ASIP) when you can’t find suitable processor IP, or when hardware implementations require more flexibility.
This bi-annual newsletter provides you with easy access to ASIP-related resources.
When developing an ASIP architecture, engineering teams typically do not start from a blank sheet of paper. Often, ASIP Designer™ customers start from one of the example processor models that are shipped with the tool installation. This library contains models that are based either on publicly known ISAs, such as Hennessy and Patterson’s DLX or the more recent RISC-V ISA, or on other ad-hoc ISAs. The examples demonstrate how to model specific architecture features such as SIMD, VLIW, floating point, multi-threading and many others. Synopsys is continuously extending its library of ASIP example models. For example, a wide range of RISC-V ISA based models has been created, and these are frequently used by customers as a starting point for designing proprietary ASIP accelerators. Using a RISC-V ISA baseline facilitates compatibility with and reuse of existing processor ecosystem elements.
In ASIP Designer, the Trv family of processors implements the RISC-V ISA. These models implement the base integer ISA and various ISA extensions. In this section we will elaborate on the members of the Trv family listed in Figure 1. All these variants are verified against reference implementations of the RISC-V ISA. They are fully supported by all components of the ASIP Designer tool suite, including C/C++ compilation, the generation of both cycle- and instruction-accurate simulation models (that can also be integrated in a virtual platform), RTL generation, and on-chip debugging.
Figure 1: Trv family of processor models
The integer models implement the RV32IM or RV64IM base integer instructions and multiplication extension. They come in versions with a 32- or a 64-bit wide data path and with a three or five stage protected pipeline. Multiplications are executed on a hardwired multiplier; divisions are executed on an iterative divider unit.
These models are optimized for area and clock frequency. Depending on the configuration and on the clock frequency, the gate count ranges from 28k to 40k gates for the 32-bit variants. For a 28nm technology, clock frequencies (Fmax) as high as 1.4 GHz can be achieved. Using the ASIP Designer compiler, a performance of 3.3 CoreMark/MHz is reached.
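To illustrate what an iterative divider unit does, the following is a minimal Python sketch of unsigned restoring division that produces one quotient bit per iteration. The algorithm choice and the function name divu_iterative are assumptions made for illustration; the text does not state which algorithm the Trv divider uses. The divide-by-zero result follows the RISC-V M extension (quotient all ones, remainder equal to the dividend).

```python
def divu_iterative(dividend, divisor, xlen=32):
    """Unsigned restoring division, one quotient bit per loop iteration.

    Hypothetical model for illustration only; not taken from the Trv sources.
    """
    mask = (1 << xlen) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        # RISC-V M extension: DIVU by zero returns all ones, REMU the dividend.
        return mask, dividend
    remainder = 0
    quotient = 0
    for bit in range(xlen - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            # Subtraction succeeds: keep it and set the quotient bit.
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

A hardware unit of this kind would typically spend one cycle per iteration, which is why it trades latency for area compared to the hardwired multiplier.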
The floating-point models add the F extension instructions to the 32-bit wide data path integer models, implementing the RV32IMF ISA. Additive and multiplicative instructions are executed on a hardwired fused multiply-add unit with a throughput of one operation per clock cycle. For the 5-stage version, this unit is pipelined. Other hardwired units implement compare, min/max and conversion instructions. The floating-point division and square-root instructions are executed on a shared iterative unit.
The gate counts for these models are in the range of 55k to 90k gates. The Fmax of the 5-stage variant is 1.2 GHz (28nm).
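The numerical benefit of a fused multiply-add unit is that the product is not rounded before the addition. The sketch below shows this effect in Python. It is an illustration under stated assumptions: it uses double precision (Python floats) rather than the single precision of the F extension, and it emulates the single rounding with exact rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction

def fma_single_rounding(a, b, c):
    """Compute a*b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def mul_then_add(a, b, c):
    """Unfused reference: the product is rounded before the addition."""
    return a * b + c

# Operands chosen so that the exact product 1 - 2**-54 is not representable.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

fused = fma_single_rounding(a, b, c)   # keeps the low-order term
unfused = mul_then_add(a, b, c)        # product rounds to 1.0 first
```

With these operands the unfused sequence loses the result entirely, while the fused computation preserves it.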
As mentioned, the purpose of the Trv models is to have a solid starting point for the development of an application-specific processor. ASIPs often target compute-intensive applications where arrays of data that are stored in memory must be processed. For these types of applications, we observe that the RISC-V ISA lacks the following important features:
To provide an improved starting point for compute-intensive applications, we have developed models that extend the standard RISC-V ISA with these features. These models have an <x> suffix to their name (see Figure 1). Figure 2 shows the ISA that is supported by these models. It contains:
Figure 2: Instruction formats supported by Trv<x> processor models (visualization by ASIP Designer's nMLView tool)
The following code is generated by the ASIP Designer compiler for the inner loop of the CoreMark matrix multiplication code. Note that it contains an instruction that executes an addition and a load in parallel. The second lh instruction uses post-modify addressing, written 2(x9!).
5680 00071781 00370732
add x14, x14, x3 | lh x15, 0(x14)
lh x16, 2(x9!)
mul x17, x16, x15
add x8, x8, x17
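The behavior of this inner loop can be sketched as a small register-level model in Python. This is a reading of the listing under stated assumptions, not a description of the actual Trv<x> semantics: the parallel load is assumed to see the old value of x14, the notation 2(x9!) is assumed to mean "load from x9, then add 2 to x9", the loop count n is a hypothetical parameter (the loop control is not part of the listing), and memory is modeled as a dictionary from byte address to a 16-bit value.

```python
def matmul_inner_loop(mem, x14, x9, x3, x8, n):
    """Model n iterations of the four-instruction loop body.

    mem maps byte addresses to signed 16-bit values (hypothetical layout).
    Returns the accumulator x8 as an unsigned 32-bit value.
    """
    mask32 = 0xFFFFFFFF
    for _ in range(n):
        # add x14, x14, x3 | lh x15, 0(x14): load uses the old x14 (assumed).
        x15 = mem[x14]
        x14 = (x14 + x3) & mask32
        # lh x16, 2(x9!): load from x9, then post-increment x9 by 2 (assumed).
        x16 = mem[x9]
        x9 = (x9 + 2) & mask32
        # mul x17, x16, x15: low 32 bits of the product.
        x17 = (x16 * x15) & mask32
        # add x8, x8, x17: accumulate modulo 2**32.
        x8 = (x8 + x17) & mask32
    return x8

# Row elements at consecutive halfwords, column elements with a stride of 8 bytes.
mem = {0: 1, 2: 2, 4: 3, 100: 4, 108: 5, 116: 6}
dot = matmul_inner_loop(mem, x14=100, x9=0, x3=8, x8=0, n=3)
```

In this model, x9 walks along a row with a fixed halfword step while x14 walks down a column with the stride held in x3, which matches the access pattern of a matrix multiplication.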
The Trv-SDX is an example processor model that implements the RISC-V ISA, and additionally contains templates for extension instructions. These templates are encoded using the RISC-V custom-2 opcode space, which has been reserved in the standard to enable custom ISA extensions. The Trv-SDX model was covered in detail in the October 2020 ASIP eUpdate.
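As an illustration of what encoding in the custom-2 opcode space means, the sketch below packs an R-type instruction word in Python. The custom-2 major opcode value (binary 1011011) and the R-type field layout come from the RISC-V specification. The use of the R-type format and the funct3/funct7 values are assumptions for illustration; the actual Trv-SDX templates may use different formats.

```python
CUSTOM_2 = 0b1011011  # RISC-V major opcode reserved for custom-2 extensions

def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack the fields of a 32-bit RISC-V R-type instruction word."""
    for value, width in ((opcode, 7), (rd, 5), (funct3, 3),
                         (rs1, 5), (rs2, 5), (funct7, 7)):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit in its bit width")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
            (funct3 << 12) | (rd << 7) | opcode)

def is_custom_2(word):
    """Check whether an instruction word lies in the custom-2 opcode space."""
    return (word & 0x7F) == CUSTOM_2

# Hypothetical extension instruction: rd = x15, rs1 = x14, rs2 = x3.
word = encode_r_type(CUSTOM_2, rd=15, funct3=0, rs1=14, rs2=3, funct7=0)
```

Because the major opcode alone identifies the custom-2 space, standard RISC-V instructions are never mistaken for extension instructions.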
Tmoby is an example of an application-specific processor that was designed starting from the Trv32p5 model (Figure 3). The objective was to design an accelerator for convolutional neural networks like MobileNet. We targeted medium throughput applications and decided upfront to allocate a vector data path with 64 MAC units, each capable of executing an 8x8-bit multiplication and 18-bit accumulation. The vector data path contains multiple vector register files: VEC stores a vector of 8 features, and MAT stores 64 weights. The feature vector is replicated eight times and multiplied with the weights. The resulting product is added to the accumulator ACC. To sustain a throughput of one vector MAC per cycle, we need to load a new feature vector and a new weight vector each cycle. This is achieved by allocating two vector memories, VM and WM. In the ISA, we provide two loads as parallel operations to the vector MAC. The VLIW structure has a fourth slot, which hosts the RISC-V scalar instructions. With this architecture we can accelerate MobileNet V3 by a factor of 360 compared to a scalar RISC-V core.
Figure 3: Tmoby ASIP architecture, with RISC-V scalar data-path (far left) and vector data-path extensions
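The vector MAC operation described for Tmoby can be sketched in Python as follows. This is a minimal model under several assumptions that the text does not confirm: operands are treated as signed 8-bit values, weight i is paired with feature i mod 8 (one possible reading of "replicated eight times"), ACC holds 64 independent accumulators, and overflow of the 18-bit accumulators wraps around in two's complement. The function names are hypothetical.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    modulus = 1 << bits
    value &= modulus - 1
    return value - modulus if value >= (modulus >> 1) else value

def vmac(acc, vec, mat, acc_bits=18):
    """One vector MAC step: acc[i] += vec[i % 8] * mat[i] for 64 lanes."""
    if len(vec) != 8 or len(mat) != 64 or len(acc) != 64:
        raise ValueError("expected 8 features, 64 weights, 64 accumulators")
    # Replicate the 8-element feature vector eight times to cover 64 lanes.
    features = vec * 8
    return [wrap_signed(a + f * w, acc_bits)
            for a, f, w in zip(acc, features, mat)]

# One step with features 1..8 and all weights equal to 2.
acc = vmac([0] * 64, [1, 2, 3, 4, 5, 6, 7, 8], [2] * 64)
```

In this model each call corresponds to one cycle of the vector slot, which is why a new feature vector and a new set of weights are needed for every step.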
In December 2021, we launched the latest feature release of ASIP Designer, providing various enhancements and extensions. The following is an extract, sorted by categories (customers can refer to the official Release Notes for a comprehensive list).
Note that the release schedule of ASIP Designer and ASIP Programmer™ has been modified. The current feature release is the first one under the new release schedule.
Designers can choose from an extensive library of example processor models provided as nML source code. In combination with ASIP Designer, these models can be used as a starting point for architectural exploration and customer-specific production designs. In the 2021.12 release there are two important updates for the Trv models:
ASIP Designer comes with a unique and patented compiler solution, with the compiler automatically retargeting itself to the processor architecture. This eliminates any need for compiler backend customization by the user. Release 2021.12 offers:
The efficient way to design, implement, program and verify your custom processor.
Extending RISC Processors into Flexible Accelerators for AI & Image Signal Processing
Access conference proceedings from leading university teams
Deep dive into the concepts, languages, and files used to capture a processor design.
"We were on a tight schedule to develop five complex custom processor models for our multicore data flow processor. By using ASIP Designer and the RISC-V processor models provided with the tool as a starting point, we were able to meet functionality and performance requirements while reducing development time by 50%."
"To meet our customer-specific requirements, we are developing specialized processors and programmable accelerators that are fully optimized for performance, power, area, and code size, while offering the required flexibility. Using ASIP Designer as our tool of choice gives us a significant competitive advantage, because it enables us to quickly develop complex and highly differentiated application-specific processors, while maximizing our design team’s efficiency through design automation and architecture exploration."
"We were confident that Synopsys’ ASIP Designer tool would enable us to implement our specialized architecture within our aggressive project schedule. It allowed us to tune the instruction-set to run our specific algorithms 30 times faster than existing processors, which significantly reduces the calculation time needed to simulate important biomolecular interactions from a year to just a few weeks."
Dr. Makoto Taiji