

# Optimize High Performance Processor Implementation with AI-enabled Fusion QuickStart Kit

John Moors and Frank Gover, Synopsys

ARC<sup>®</sup> Processor Summit 2022

# **CONFIDENTIAL INFORMATION**

The information contained in this presentation is the confidential and proprietary information of Synopsys. You are not permitted to disseminate or use any of the information provided to you in this presentation outside of Synopsys without prior written authorization.

# **IMPORTANT NOTICE**

In the event information in this presentation reflects Synopsys' future plans, such plans are as of the date of this presentation and are subject to change. Synopsys is not obligated to update this presentation or develop the products with the features and functionality discussed in this presentation. Additionally, Synopsys' services and products may only be offered and purchased pursuant to an authorized quote and purchase order or a mutually agreed upon written contract with Synopsys.



# Agenda

- Time To Market (TTM) Challenges
- ARC processor family
- HS68 Processor configuration and objectives
- Overall flow
- Fusion QuickStart Implementation Kit (QIK)
- AI-driven implementation with DSO.ai
- Results
- Summary



# Time To Market (TTM) Challenges Addressed

- **RDF** – Reference Design Flow
- **QIK** – QuickStart Implementation Kit
- **DSO** – Design Space Optimization



## **One Solution**

- Combining the Best in Class
  - IP
  - Libraries
  - Tools
  - Methodologies
- Services
- Support





# DesignWare ARC Processor IP

Unrivaled Efficiency for Embedded Applications



- Integrated hardware safety features for ARC EM, SEM, HS, VPX, EV and NPX processor families
  - Accelerates ISO 26262 certification for safety-critical automotive SoCs



### ARC HS6x 64-bit Processor IP

Implements the ARCv3-ARC64 ISA: Optimized for High-end Embedded



# HS68 Design Overview

| Feature           | Value                                                                   |
|-------------------|-------------------------------------------------------------------------|
| RTL Configuration | Single core 64-bit ARCv3 CPU with MMU                                   |
| Frequency Goal    | 2.8 GHz                                                                 |
| Power Goal        | Leakage: 10 mW<br>Total: 87 mW/GHz                                      |
| Area              | 0.13 mm<sup>2</sup>                                                     |
| Technology        | TSMC-5FF                                                                |
| Cell Height       | H280                                                                    |
| Metal Stack       | Process: 1P15M<br>Metal Option: 1X1Xb1Xe1Ya1Yb5Y2Yy2Z                   |
| PVTs              | Setup: SSGNP0P675VN40C<br>Hold: FFGNP0P825V125C<br>Power: TT0P75V25C    |
| Library           | Synopsys_TSMC_5nm_FF_HS_SVT_LVT_ELVT_hpc                                |
| Multivoltage      | Single voltage, single supply                                           |
| SAIF              | dhry_pwr                                                                |

#### Main Objectives:

1) Frequency improvement.

2) Power (dynamic and leakage) vs frequency trade-off.

3) Floorplan and area with frequency and power in mind.





### **Overall Flow**

RDF – QIK – DSO.ai



- RDF – Reference Design Flow
- QIK – QuickStart Implementation Kit
- DSO – Design Space Optimization



# ARC Reference Design Flow (RDF)

Features

- Complete RTL to GDS implementation flow
- Supporting all ARC CPU families (EM/HS/VPX/EV/NPX)
- Configures tailored makefiles and flow scripts during IP configuration
- Delivered for free with ARC CPU IP
   libraries



# ARC Reference Design Flow (RDF)

Use cases

- Used during early customer engagements (even on
- Used to help customers in hardening the IP and debugging
- Used during pre-sales PPA benchmarking
- Used intensively during IP development to regress RTL implementation feasibility and PPA
- Used during product release to capture off-the-shelf PPA across many IP templates and technologies



# Fusion QuickStart Implementation Kit (QIK)

Customized Implementation Flow to Speed-Up TTM

- Implementation kit to quickly achieve high PPA objectives.
- Starts with a proven Reference Methodology (RM) flow validated for tool versions and technologies.
- A flat or hierarchical flow is selected based on the implementation objectives.
- Library and technology customizations are added to the flow (e.g., layer assignment, NDRs, cells).
- Design customizations are added to the flow (e.g., bounds, placement attractions, path margins).
- High Performance Core (HPC) switch: highly customized flow features for the IP family (e.g., new tool features, technology-dependent settings).





It's a journey: Evolving and enhancing core optimization technologies



#### **Recent Fusion Compiler Collaboration Technologies**

Key Technologies for Achieving Timing/Power Targets on Processors



## QIK for HS68 – Implementation, Analysis And Signoff



# DSO.ai: AI-driven Design Space Optimization

Uses machine learning to navigate the combined design-technology solution space



- Breakthrough reinforcement learning engine
  - Capable of exploring trillions of design recipes
- Multi-objective design space optimization
  - High-quality results from feasibility to closure
- Fully integrated with Fusion Design Platform
  - Fast ramp-up through industry's richest technology foundation
- Cloud-ready for fast deployment
  - Supports on-prem, public, and hybrid clouds



# DSO.ai: Design Space Optimization Loop

Uses reinforcement learning to navigate the design-technology solution space




# AI-driven Design Space Optimization

Example: Exploring library cell parameters for better power

Figure: TNS vs. Leakage plot.


#### **Problem Statement:**

Achieve the lowest power while keeping TNS no worse than -100 ns

#### **DSO Parameter Space**

- Design, tool, flow parameters
- Library cell parameters

#### **Objectives (prioritized)**

- Leakage
- TNS
- Secondary (DRC, etc.)
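A minimal sketch of how the prioritized objectives above could be applied to a set of finished runs: drop any run whose TNS violates the constraint, then prefer lower leakage, then better TNS, then fewer DRC violations. This is not DSO.ai's API or its learning engine. The `Candidate` type, the `pick_best` helper, the reading of the constraint as "TNS no worse than -100 ns", and every number below are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One finished implementation run (hypothetical record layout)."""
    name: str
    leakage_mw: float   # leakage power, lower is better
    tns_ns: float       # total negative slack, closer to 0 is better
    drc_count: int      # secondary objective, lower is better


def pick_best(candidates, tns_floor_ns=-100.0):
    """Return the lowest-leakage run whose TNS is no worse than the floor."""
    feasible = [c for c in candidates if c.tns_ns >= tns_floor_ns]
    # Lexicographic priority: leakage first, then TNS, then DRC count.
    return min(
        feasible,
        key=lambda c: (c.leakage_mw, -c.tns_ns, c.drc_count),
        default=None,
    )


# Made-up runs, for illustration only.
runs = [
    Candidate("recipe_a", leakage_mw=9.1, tns_ns=-140.0, drc_count=3),
    Candidate("recipe_b", leakage_mw=9.4, tns_ns=-60.0, drc_count=5),
    Candidate("recipe_c", leakage_mw=10.2, tns_ns=-5.0, drc_count=0),
]
best = pick_best(runs)
```

Here `recipe_a` has the lowest leakage but is rejected on TNS, so the helper falls back to the next-lowest leakage among the feasible runs.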

# DSO.ai – Applying Cross-Design Learning for Project Reuse

Boost Design Team Productivity, Improve Compute Efficiency



# QIK + DSO.ai – Improving Time To Market

- After the QIK flow was established, DSO.ai was used to search the solution space for optimal results
- Cold start
  - Seed (default and user permutons) provided
  - 50 workers used. Best 10 results, chosen by DSO.ai (ADES), passed to next step
- Warm start
  - Warm start database used. No seed required. DSO.ai engine selects permutons
  - 30 workers used. Best 10 results, chosen by DSO.ai (ADES), passed to next step
- Aggregate Design Score (ADES): combines WNS, TNS, and power
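The slide says only that ADES combines WNS, TNS, and power; it gives no formula. The sketch below therefore assumes a simple weighted sum, purely to illustrate the "run N workers, keep the best 10" step. The `RunResult` fields, the `ades` weights, and the synthetic results are all hypothetical and do not reflect how DSO.ai actually scores runs.

```python
import heapq
from typing import NamedTuple


class RunResult(NamedTuple):
    """Metrics reported by one worker (hypothetical layout)."""
    run_id: int
    wns_ns: float     # worst negative slack, <= 0 when violating
    tns_ns: float     # total negative slack, <= 0 when violating
    power_mw: float   # total power


def ades(run, w_wns=1.0, w_tns=0.1, w_power=0.01):
    """Assumed aggregate score: weighted timing violation plus power (lower is better)."""
    wns_violation = max(0.0, -run.wns_ns)
    tns_violation = max(0.0, -run.tns_ns)
    return w_wns * wns_violation + w_tns * tns_violation + w_power * run.power_mw


def best_runs(results, keep=10):
    """Keep the `keep` best-scoring runs, best first, for the next step."""
    return heapq.nsmallest(keep, results, key=ades)


# 50 synthetic worker results standing in for a cold-start batch.
results = [
    RunResult(i, -0.001 * (i % 7), -0.1 * (i % 5), 120.0 + (i % 11))
    for i in range(50)
]
top10 = best_runs(results)
```

A warm start would differ in where the candidate recipes come from (a prior database instead of a seed), not in this selection step.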



### Final Results - Frequency

23.0% frequency improvement with the combination of QIK and DSO.ai



### Baseline Flow – 2.30 GHz

Setup timing

| Stage                | WNS    | TNS   | NVE  | r2rWNS | r2rTNS | r2rNVE |
|----------------------|--------|-------|------|--------|--------|--------|
| Run: Run1 (BASELINE) |        |       |      |        |        |        |
| compile              | -0.079 |       | 3087 |        |        |        |
| clock_opt_opto       | -0.138 | -78.3 | 5434 | -0.083 | -54.8  | 5022   |
| route_opt            | -0.106 | -33.7 | 2125 | -0.078 | -18.5  | 1822   |

### QIK Flow – 2.69 GHz

Setup timing

| Stage                | WNS    | TNS  | NVE  | r2rWNS | r2rTNS | r2rNVE |
|----------------------|--------|------|------|--------|--------|--------|
| Run: Run1 (BASELINE) |        |      |      |        |        |        |
| compile              | -0.010 | -0.2 | 143  | -0.010 | -0.2   | 134    |
| clock_opt_opto       | -0.025 | -4.2 | 1076 | -0.025 | -4.0   | 966    |
| route_opt            | -0.024 | -3.3 | 1047 | -0.024 | -3.1   | 954    |
| endpoint_opt         | -0.015 | -0.5 | 134  | -0.015 | -0.5   | 125    |

### DSO.ai – 2.83 GHz

Setup timing

| Stage                | WNS    | TNS  | NVE | r2rWNS | r2rTNS | r2rNVE |
|----------------------|--------|------|-----|--------|--------|--------|
| Run: Run1 (BASELINE) |        |      |     |        |        |        |
| compile              | -0.003 | -0.0 | 69  | -0.003 | -0.0   | 64     |
| clock_opt_opto       | -0.021 | -1.7 | 347 | -0.021 | -1.5   | 312    |
| route_opt            | -0.018 | -2.6 | 810 | -0.018 | -2.6   | 783    |
| endpoint_opt         | -0.009 | -0.4 | 158 | -0.009 | -0.4   | 152    |
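As a quick check, the headline frequency figure can be reproduced from the three clock targets quoted on this slide (2.30, 2.69, and 2.83 GHz). The helper name `pct_gain` is illustrative.

```python
def pct_gain(base, new):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


BASELINE_GHZ = 2.30
QIK_GHZ = 2.69
DSO_GHZ = 2.83

# Gain of each flow over the baseline, in percent and in MHz.
qik_pct = pct_gain(BASELINE_GHZ, QIK_GHZ)
dso_pct = pct_gain(BASELINE_GHZ, DSO_GHZ)
qik_mhz = round((QIK_GHZ - BASELINE_GHZ) * 1000)
dso_mhz = round((DSO_GHZ - BASELINE_GHZ) * 1000)

print(f"QIK:          +{qik_mhz} MHz ({qik_pct:.1f}%)")
print(f"QIK + DSO.ai: +{dso_mhz} MHz ({dso_pct:.1f}%)")
```

The QIK + DSO.ai line works out to the 23.0% quoted above. QIK alone accounts for 390 MHz of the 530 MHz total.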

### **Final Results - Power**

44.1% total power improvement with the combination of QIK and DSO.ai



### Baseline Flow – 2.30 GHz

| Stage                | TotalPwr  | LeakPwr  | Gated% | Bits/Flop |
|----------------------|-----------|----------|--------|-----------|
| Run: Run1 (BASELINE) |           |          |        |           |
| compile              | 238460000 | 10360000 | 99.7   | 1.00      |
| clock_opt_opto       | 246700000 | 10040000 | 99.7   | 1.00      |
| route_opt            | 244155000 | 9965000  | 99.7   | 1.00      |

### QIK Flow – 2.69 GHz

| Stage                | TotalPwr  | LeakPwr | LVt% | Gated% | Bits/Flop |
|----------------------|-----------|---------|------|--------|-----------|
| Run: Run1 (BASELINE) |           |         |      |        |           |
| compile              | 91151000  | 8341000 | 56.6 | 99.7   | 1.00      |
| clock_opt_opto       | 129647000 | 9457000 | 59.8 | 99.7   | 1.00      |
| route_opt            | 124574000 | 9084000 | 51.9 | 99.7   | 1.00      |
| endpoint_opt         | 124585000 | 9085000 |      | 99.7   | 1.00      |

### DSO.ai – 2.83 GHz

| Stage                | TotalPwr  | LeakPwr  | LVt% | Gated% | Bits/Flop |
|----------------------|-----------|----------|------|--------|-----------|
| Run: Run1 (BASELINE) |           |          |      |        |           |
| compile              | 135828000 | 9608000  | 64.8 | 99.7   | 1.00      |
| clock_opt_opto       | 141340000 | 10270000 | 65.0 | 99.7   | 1.00      |
| route_opt            | 136400000 | 10500000 | 64.3 | 99.7   | 1.00      |
| endpoint_opt         | 136470000 | 10510000 |      | 99.7   | 1.00      |
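The headline power figure can be reproduced from the `TotalPwr` columns above. The slide does not say which rows it compared, so this sketch assumes the last stage reported for each flow (`route_opt` for the baseline, `endpoint_opt` for QIK and DSO.ai). The tables do not state units, and none are needed for a ratio.

```python
def pct_reduction(base, new):
    """Relative reduction of `new` versus `base`, in percent."""
    return (base - new) / base * 100.0


# TotalPwr at the last reported stage of each flow (units as in the tables).
BASELINE_TOTAL = 244_155_000   # baseline, route_opt
QIK_TOTAL = 124_585_000        # QIK, endpoint_opt
DSO_TOTAL = 136_470_000        # DSO.ai, endpoint_opt

qik_reduction = pct_reduction(BASELINE_TOTAL, QIK_TOTAL)
dso_reduction = pct_reduction(BASELINE_TOTAL, DSO_TOTAL)

print(f"QIK vs. baseline:    {qik_reduction:.1f}% lower total power")
print(f"DSO.ai vs. baseline: {dso_reduction:.1f}% lower total power")
```

Under that row choice the DSO.ai figure matches the quoted 44.1%. The QIK run shows a somewhat larger reduction, but at 2.69 GHz rather than 2.83 GHz.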



# QIK + DSO.ai – Power vs. Frequency

Easy trade-off consideration: 90 MHz gain with an increase of 0.6 mW

- Design teams may have stopped at 2.77 GHz (30 MHz gain) when trying to make the trade-off between power and frequency.
- DSO.ai can easily explore a larger solution space, and it found a 90 MHz gain for the same increase in power.
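The trade-off above can be put in numbers. This sketch assumes both gains are measured from the same starting point and that both cost the same 0.6 mW, as the bullets suggest. The variable names are illustrative.

```python
POWER_INCREASE_MW = 0.6   # power cost quoted for both options

MANUAL_STOP_MHZ = 2770    # where a design team may have stopped
MANUAL_GAIN_MHZ = 30
DSO_GAIN_MHZ = 90

# Starting frequency implied by "2.77 GHz (30 MHz gain)".
start_mhz = MANUAL_STOP_MHZ - MANUAL_GAIN_MHZ
dso_freq_mhz = start_mhz + DSO_GAIN_MHZ


def mhz_per_mw(gain_mhz, power_mw):
    """Frequency gained per milliwatt of additional power."""
    return gain_mhz / power_mw


manual_efficiency = mhz_per_mw(MANUAL_GAIN_MHZ, POWER_INCREASE_MW)
dso_efficiency = mhz_per_mw(DSO_GAIN_MHZ, POWER_INCREASE_MW)

print(f"start: {start_mhz} MHz, DSO.ai point: {dso_freq_mhz} MHz")
print(f"manual: {manual_efficiency:.0f} MHz/mW, DSO.ai: {dso_efficiency:.0f} MHz/mW")
```

The implied DSO.ai point lands on 2.83 GHz, consistent with the final frequency reported for the DSO.ai run.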





### **QIK Area Improvements**

3.9% area reduction

Baseline Floorplan (375.009 x 343.392) Area = 0.129 mm<sup>2</sup>

QIK Floorplan (377.859 x 329.392) Area = 0.124 mm<sup>2</sup>
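The floorplan areas can be recomputed from the quoted dimensions, assuming they are in micrometers. The slide does not state the unit, but micrometers reproduce the quoted mm² values. The helper name is illustrative.

```python
def area_mm2(width_um, height_um):
    """Rectangle area in mm^2 from dimensions in micrometers (assumed unit)."""
    return width_um * height_um / 1e6


baseline_area = area_mm2(375.009, 343.392)
qik_area = area_mm2(377.859, 329.392)

# Reduction computed two ways: from the areas as quoted on the slide
# (rounded to three decimals) and from the unrounded products.
rounded_reduction = (round(baseline_area, 3) - round(qik_area, 3)) / round(baseline_area, 3) * 100.0
exact_reduction = (baseline_area - qik_area) / baseline_area * 100.0

print(f"baseline: {baseline_area:.3f} mm^2, QIK: {qik_area:.3f} mm^2")
print(f"reduction from rounded areas:   {rounded_reduction:.1f}%")
print(f"reduction from unrounded areas: {exact_reduction:.1f}%")
```

Under this assumption the quoted 3.9% appears to follow from the rounded areas (0.129 and 0.124 mm²). The unrounded dimensions give a slightly smaller reduction of roughly 3.3%.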





# Summary

Improving Time To Market (TTM)

- HS68 frequency improved from 2.30 GHz to 2.83 GHz (+390 MHz with QIK, +530 MHz with QIK + DSO.ai), while reducing total power, maintaining leakage power, and reducing design area.
- Time To Market improvements with QIK + DSO.ai
- Combining the Best in Class IP, Libraries, Tools and Methodologies
- HS5x/HS6x QIK (flow, cookbook and reference guide) is ready to be downloaded from the Synopsys external website.









# Q & A



# Thank You