Softening Hardware: Using Application-Specific Processors to Optimize Modern SoC Designs

Markus Willems, Sr. Manager Product Marketing, Synopsys

System-on-chip (SoC) designers are implementing an increasing amount of functionality in software, to gain flexibility, mitigate against the uncertainty of supporting evolving standards, and to make it possible for a single chip to serve many end products.

Moving functionality from hardware to software running on a processor comes at a potential cost of lower performance and higher power consumption than optimized hardware. One way to tackle this issue is to use a set of small, task-specific processors instead of one, large general-purpose processor. This approach is used in applications such as image recognition, database queries, and machine learning. Figure 1 shows how mobile SoCs deploy a variety of specialized processors to achieve high system performance while controlling system power.

Figure 1: Specialized processors working together in a mobile SoC 

Many SoC developers turn to IP houses for such specialized processors. However, if the IP providers do not yet offer such processors for an emerging application, SoC designers have traditionally had to choose between using an existing processor that is “close enough” to the required functionality or reverting to the inflexibility of fixed-function hardware. A third option is to build an application-specific instruction-set processor (ASIP), whose instruction set (ISA) and micro-architecture is tailored to the needs of the target application. The decision to develop an ASIP in-house or use off-the-shelf processor IP is a trade-off between the differentiating advantage the ASIP would bring, and the engineering effort required to design, optimize, verify and program it for the end application. This, in turn, is predicated upon the effort needed to build a reasonably capable toolchain for programming the ASIP.

Architectural Options

An ASIP is a combination of a software-programmable processor and application-specific functional units, optimized for a set of functions. ASIP designers use parallelism and specialization to achieve this optimization, while trying to retain full C programmability.

Parallelism enables designs to run multiple functions at once, and its three main forms can be applied individually or in combination to boost performance. The options are laid out in Figure 2 and described here.

Instruction-level parallelism uses an orthogonal instruction set, as in very long instruction word (VLIW) architectures, or an encoded instruction set (which delivers the operational parallelism needed without the overhead associated with VLIW architectures). 

Data-level parallelism implements vector processing, which involves applying one instruction to multiple data items.

Task-level parallelism, as in multicore/multi-threading implementations, enables multiple cooperating algorithms that have different control flows to run alongside each other.

Figure 2: Design Options—Parallelism

Specialization enables designers to perform functions with one or a few instructions, by customizing their processor’s pipelines, internal/external memory, register architectures, and connectivity. Designers can also define application-specific data types and interfaces. Figure 3 depicts various forms of specialization.

Figure 3: Design Options—Specialization

Factors to Consider when Developing ASIPs

An ASIP is only worth developing if it can bring useful differentiating advantage within that design’s market window. Designers therefore need to be able to rapidly explore the impact of architectural choices upon their ASIP, by doing three things:

1.      Define benchmarks representative of the end application, to enable a quantitative comparison of the architectures being considered. A benchmark must have:

  • A functional specification, describing the application kernels that need to be implemented. Benchmarks are usually represented in C for ease of implementation and architectural independence.
  • An environment describing the stimuli that exercise the benchmark.
  • Performance metrics such as power, performance, and target frequency.

2.      Describe a candidate architecture

Designers need a quick and easy way to define a candidate architecture, ideally using a modelling approach that avoids specifying deep implementation details early in the design process.

Designers also need software tools to map benchmark code onto the candidate architectures, and, since it is impractical to develop a new toolchain for each candidate architecture manually, this needs to be automated.

3.      Explore the design space

Design exploration involves evaluating each candidate architecture against the defined benchmarks. There are two main concepts to be applied to this.

Compiler-in-the-loop: Designers need to use a C/C++ compiler to run benchmarks onto each candidate architecture, rather than trying to use time-consuming and error-prone assembly language. It’s also mandatory to have a cycle-accurate instruction set simulator (ISS) and a profiler to analyze results. The C/C++ compiler, ISS and profiler can be combined with a debugger, assembler, and linker to form a full software development toolkit (SDK). 

The SDK should be available early in the design process and quickly retargetable to the various architectural alternatives, to enable efficient design-space exploration.

Synthesis-in-the-loop: It can be useful to quickly analyze the hardware cost and characteristics of a candidate architecture in terms of its operating frequency, area, and power efficiency. To do this, there should be a way to automatically generate synthesizable RTL, and then use synthesis tools to analyze the hardware characteristics of each candidate architecture.

ASIP Designer

Synopsys’ ASIP Designer accelerates the creation of ASIPs. It offers retargetable compilation and architectural exploration technology, fast simulation, and integration with implementation flows.

Figure 4 shows how ASIP development is supported within ASIP Designer, and the way in which it integrates into the Synopsys design and verification flows. 

Figure 4: ASIP Designer Tool Flow

Processor Modeling

The ASIP is described using nML, a structured architecture description language that efficiently and concisely describes processor architectures at the same level of abstraction as a programmer’s manual. The language is used to define the structural characteristics of the design (registers, functional units, signals, etc.) and the instruction-set architecture. nML also enables users to describe the cycle- and bit-accurate behavior of the datapaths and I/O interfaces.

SDK Generation

In ASIP Designer, an ASIP’s nML description is used as an input to the retargetable SDK (step 1 in Figure 4), which automatically adapts to the defined processor architecture. The SDK includes an optimized C/C++ compiler, assembler/disassembler, linker, cycle-accurate as well as instruction-accurate instruction-set simulator, and the graphical debugger shown in Figure 5.

Figure 5: Components of the software development kit

It’s possible for the compiler to adapt to the detail of each candidate’s architecture because its compiler optimizations are implemented in a generic way. Other compiler frameworks, such as GNU or LLVM, need an architecture-specific compiler backend for each candidate. The immediate availability of a compiler enables rapid architectural exploration and iteration (see step 2 in Figure 4).

Having a compiler in the loop also means software authors can provide feedback to the ASIP designer, and that the processor’s dynamic performance can be studied and optimized. Making these kinds of adaptations and trade-offs at this level of abstraction is much more efficient than trying to do it once an RTL description has been generated.

The SDK also makes it possible for end users to program the ASIP once it is implemented in an SoC. The architecture-specific SDK can be made available as a standalone package for such customers.

Hardware Generation and Verification

Once the design meets its functional requirements, ASIP Designer integrates with Synopsys implementation and verification tools to take the design from its RTL description to tape-out.

ASIP Designer will translate the nML model into fully synthesizable Verilog or VHDL (see step 3 in Figure 4), with full cycle-accurate and bit-accurate control of the hardware. Design and verification tools, such as Synopsys’ Design Compiler and VCS, can then be used to implement the ASIP. For example, Design Compiler can be used to generate a gate-level description that can be used to predict the circuit’s power requirement and area or use place-and route tools such as IC Compiler to explore the risk of routing congestion.

This “synthesis-in-the-loop” approach enables educated decisions and avoids surprises later in the design process. Should the design face problems during implementation, developers can go back to the nML description and adjust it to address the issue. Because of the single-source entry in nML, the SDK and RTL will remain in sync.

Verifying the ASIP

There are two aspects to verifying the ASIP.

The first is to verify that the processor model, as described in nML, acts as intended. ASIP Designer helps with this by supporting: confirmation of correct test-case execution as compared to native execution on the designer's workstation, automatic consistency checks, diagnostic reports analyzing connectivity, hardware conflicts, unused instructions, pipeline hazards, automatic generation of processor specific “one-liner” C programs that check if all units necessary for a compiler are present, and other diagnostics.

The second key aspect is verification of the RTL model, ensuring that the generated RTL implements the processor model correctly. ASIP Designer supports: automatic generation of directed random instruction sequences as assembly code, templates of instruction sequences by generating random values for all required fields, automatic generation of coverage points, and many more. These are provided in SystemVerilog and can be integrated into an overall testbench.

The effort spent on verifying an ASIP depends on how it will be used. The more narrowly defined the ASIP is, the closer its functional verification will resemble that of a fixed-function RTL implementation. The more generic the ASIP, the closer functional verification will become to the effort that a processor IP provider has to make to ensure proper function of its IP in almost any use case.

Conclusion

ASIPs provide a useful mix of hardware efficiency and software flexibility – if they can be designed, verified and programmed quickly enough to meet project requirements.

ASIP Designer can help with this by enabling detailed architectural exploration, compiler- and synthesis-in-the-loop analysis, and straightforward programming through the automated generation of an optimized SDK. For more information visit Application Specific Instruction-Set Processors page.