Issue 3, 2013
DesignWare ARC nSIM: Speed, Accuracy and Visibility – Instruction Set Simulation without Compromise!
Instruction set simulators (ISS) are vital tools for compiler, operating system and application development as well as processor architecture design space exploration and verification. Because the demands are so different, designing an ISS that caters to all of these application scenarios is a significant challenge. Hardware verification demands absolute precision with respect to architectural behavior, even for corner case, randomly generated scenarios that are unlikely to occur in reality. Conversely, compiler developers require functional correctness, performance and rich profiling feedback to create an optimizing compiler before the actual hardware is ready.
In this context, it is easy to settle for a compromise, trading architectural accuracy for performance. However, Synopsys provides one product that can do it all without compromise: ARC nSIM. In this article, Igor Böhm, R&D engineer, Synopsys, discusses the simulation performance and architectural accuracy that are achievable using ARC nSIM. Just-In-Time (JIT) compilation technology makes it possible to achieve the highest possible simulation performance, where simulation speed can, in some cases, exceed the speed of final silicon. Additionally, the way in which nSIM integrates into the RTL verification methodology not only speeds up the internal verification process, but also guarantees architectural correctness at the simulation level.
nSIM Pro – Turbocharging Simulation Performance
Dynamic compilation, also referred to as Just-In-Time (JIT) compilation, is the key technology to speed up program simulation at runtime. The main idea behind dynamic compilation is to defer machine-specific code generation and optimization until runtime, when additional profiling information is available. Some optimizations that are critical for high-performance simulation speeds are virtually impossible to apply without dynamic runtime information. As a rule of thumb, a dynamic compiler yields up to roughly a 10x speedup over interpretive simulation (Figure 1).
Figure 1: Speed-ups for BioPerf benchmark suite comparing (a) nSIM base interpreted-only simulation mode, (b) nSIM Pro simulation using a single concurrent dynamic compiler, and (c) simulation using nSIM Pro’s novel, concurrent and parallel dynamic compiler.
Dynamic compilation occurs at runtime, so it inevitably incurs an overhead that contributes to the total execution time of a program. There is a trade-off between the time spent on dynamic compilation and total execution time. On the one hand, if a lot of effort is spent on aggressive dynamic compilation to generate highly efficient native code, compilation takes up too large a share of the total simulation time. On the other hand, if too little time is spent on optimizing code during dynamic compilation, the runtime performance of the simulated program is likely to be suboptimal. Three key innovative techniques behind nSIM Pro’s dynamic compilation infrastructure aim to reduce dynamic compilation latency, thereby speeding up simulation:
- Adaptively prioritizing the most recently executed program regions
- Compiling these regions in parallel
- Compiling concurrently with the simulation of the target program
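The trade-off described before the list can be made concrete with a simple break-even calculation. The following is a minimal sketch, not a description of nSIM's internal cost model: the function name and the cost figures are hypothetical, and it assumes a region has a fixed cost per execution in both interpreted and native form.

```python
import math

def break_even_executions(compile_cost, interp_cost, native_cost):
    """Executions of a region needed before compiling it pays off.

    Finds the smallest integer n that satisfies
        compile_cost + n * native_cost <= n * interp_cost.
    Returns None when native code is no faster than interpretation.
    All costs are hypothetical and use the same arbitrary time unit.
    """
    saving_per_execution = interp_cost - native_cost
    if saving_per_execution <= 0:
        return None
    return math.ceil(compile_cost / saving_per_execution)

# Cheap compilation with a 10x faster result pays off quickly ...
print(break_even_executions(compile_cost=500, interp_cost=10, native_cost=1))
# ... while compilation that yields no speedup never pays off.
print(break_even_executions(compile_cost=500, interp_cost=10, native_cost=10))
```

Under these assumptions, a region that runs fewer times than the break-even count is cheaper to interpret, which is why compilation effort is best spent on frequently executed regions.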
During simulation, we only want to invest dynamic compilation effort in program regions that are executed frequently, but what “executed frequently” means depends on the application. Traditionally, dynamic compilation systems used an empirically determined threshold or left it up to the user to select a threshold. The problem is that we do not know which programs users are going to run and we can’t expect users to waste time figuring out what the right threshold would be for their application. Therefore, nSIM removes this burden by automatically adapting its program hotspot selection strategy. That means it is guaranteed to yield the best performance for small embedded benchmarks, such as EEMBC CoreMark, as well as large benchmarks, such as the simulation of the GCC C compiler included in the SPEC CPU 2006 benchmarks.
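As an illustration of a threshold that adapts to the workload, the sketch below selects hot regions relative to the mean execution count of the current profiling interval instead of a fixed number. This is an assumed heuristic chosen for illustration only; the article does not describe the selection strategy nSIM actually uses, and the names `select_hotspots` and `budget` are hypothetical.

```python
from collections import Counter

def select_hotspots(exec_counts, budget):
    """Pick up to `budget` regions from one profiling interval, hottest first.

    A region qualifies when its execution count is at least the interval's
    mean count, so the cut-off scales with the workload instead of being a
    fixed, hand-tuned constant. (Illustrative heuristic, not nSIM's.)
    """
    if not exec_counts:
        return []
    mean = sum(exec_counts.values()) / len(exec_counts)
    hot = [(count, region) for region, count in exec_counts.items() if count >= mean]
    hot.sort(key=lambda item: (-item[0], item[1]))  # hottest first, ties by name
    return [region for _, region in hot[:budget]]

# A small benchmark and a large one yield sensible picks without retuning.
small = Counter({"loop_a": 40, "init": 1, "exit": 1})
large = Counter({"parse": 90_000, "fold": 70_000, "emit": 500, "init": 3})
print(select_hotspots(small, budget=2))
print(select_hotspots(large, budget=2))
```

A fixed threshold of, say, 1,000 executions would never compile anything in the small workload, whereas the relative cut-off still identifies its one dominant loop.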
To effectively reduce dynamic compilation latency, a dynamic compilation system must improve its workload throughput (i.e., compile more application hotspots per unit of time). To do this, nSIM analyzes profiled code and JIT-compiles independent translation units in parallel.
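The idea of compiling independent translation units in parallel can be sketched with a worker pool. Everything here is a stand-in: `translate` generates a Python function from a toy list of operations instead of native code, and the unit format is hypothetical. Note also that CPython threads do not execute bytecode in parallel, so this shows the structure of the technique and not its speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def translate(unit):
    """Stand-in for JIT code generation.

    Turns a (name, [(operator, operand), ...]) translation unit into a
    Python function that applies the operations to an accumulator.
    """
    name, ops = unit
    body = "\n".join(f"    acc = acc {op} {operand}" for op, operand in ops)
    source = f"def {name}(acc):\n{body}\n    return acc\n"
    namespace = {}
    exec(compile(source, f"<jit:{name}>", "exec"), namespace)
    return name, namespace[name]

def compile_in_parallel(units, workers=4):
    """Translate independent units on a worker pool; return a code cache."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(translate, units))

units = [
    ("region_0", [("+", 3), ("*", 2)]),
    ("region_1", [("-", 1)]),
]
code_cache = compile_in_parallel(units)
print(code_cache["region_0"](5))  # computes (5 + 3) * 2
print(code_cache["region_1"](5))  # computes 5 - 1
```

The units can be handed to separate workers only because they are independent: no unit needs the compiled output of another.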
Concurrent Dynamic Compilation
Furthermore, nSIM does not pause simulation to wait until the JIT compiler has finished generating code for a particular program hotspot. Instead, nSIM continues interpretive simulation concurrently with dynamic compilation, further reducing dynamic compilation overheads. For the user, this means there are no unpleasant pause times, and simulated applications are quick and responsive. This is very important when simulating applications that require real-time simulation performance or that have user interaction, as is necessary with full system OS simulation.
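A minimal sketch of this behavior, under the assumption of a single hot region and a toy instruction format: the simulation loop submits a compilation job to a background worker, keeps interpreting, and switches to the compiled function only once it is ready. The names `interpret`, `jit_compile` and `simulate` are hypothetical and do not correspond to nSIM interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

MODULUS = 1_000_003  # keeps the toy accumulator small

def interpret(ops, acc):
    """Slow path: decode and execute every operation on each call."""
    for op, operand in ops:
        acc = acc + operand if op == "add" else acc * operand
    return acc

def jit_compile(ops):
    """Stand-in for the dynamic compiler: build a specialised function."""
    lines = ["def native(acc):"]
    for op, operand in ops:
        lines.append(f"    acc = acc {'+' if op == 'add' else '*'} {operand}")
    lines.append("    return acc")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["native"]

def simulate(ops, iterations, hot_after=10):
    """Run the region `iterations` times without ever blocking on the JIT."""
    acc, pending, native, native_runs = 0, None, None, 0
    with ThreadPoolExecutor(max_workers=1) as jit:
        for i in range(iterations):
            if pending is None and i == hot_after:
                pending = jit.submit(jit_compile, ops)  # returns immediately
            if native is None and pending is not None and pending.done():
                native = pending.result()  # compiled code is now available
            if native is not None:
                acc = native(acc) % MODULUS
                native_runs += 1
            else:
                acc = interpret(ops, acc) % MODULUS  # keep interpreting
    return acc, native_runs

ops = [("add", 1), ("mul", 3)]
result, native_runs = simulate(ops, iterations=100_000)
print(result, native_runs)
```

The number of natively executed iterations depends on when the background job finishes, but the computed result does not, because both paths implement the same semantics.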
A novel way to dynamically discover and select program regions coupled with parallel and concurrent dynamic compilation makes nSIM a truly scalable simulator, capable of automatically adapting to changing workloads, efficiently exploiting the parallelism and concurrency available on contemporary multi-core simulation hosts. Figure 1 clearly demonstrates the speedups achievable by the concurrent and parallel dynamic compiler built into nSIM Pro when compared to nSIM with only an interpretive simulator.
nSIM Behaves Like Real Hardware
It is extremely important that a processor simulator behaves like the final hardware product to avoid unpleasant and costly surprises late in the development cycle. nSIM relies on a verification methodology that deeply integrates it into the hardware verification process as “The Golden Master Model” to ensure that it precisely matches the RTL.
Every new instruction set architecture (ISA) feature is implemented in both the RTL and nSIM, based on the description in the programmer’s reference manual. Then, during verification, the RTL and nSIM are executed in lock-step, comparing detailed architectural and micro-architectural states of the two models after each step. This verification methodology is commonly referred to as online or co-simulation verification.
Every second, nSIM runs thousands of randomly generated and directed tests using co-simulation to make sure it is in sync with the RTL and behaving correctly. Compared to an offline verification strategy, online verification significantly speeds up the verification process because it can pinpoint errors instantly and precisely. There is no need to perform time-consuming, post-mortem analysis of trace files that can never carry as much state information as is available at runtime. In addition, because offline verification relies on the presence of instruction traces, it is bound by very real limits such as file size and file storage. Co-simulation does not suffer from this problem and can easily stress test the RTL and nSIM by simulating billions of instructions, yielding better test coverage for both.
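The lock-step comparison can be illustrated with two instances of a toy processor model, one standing in for the reference simulator and one for the design under test. The instruction format, the `Model` class and the injected fault are all hypothetical; a real co-simulation setup compares far more state and drives an RTL simulator instead of a second Python object.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    """Toy processor model: four 32-bit registers and a program counter."""
    regs: list = field(default_factory=lambda: [0] * 4)
    pc: int = 0
    buggy: bool = False  # inject a fault to demonstrate detection

    def step(self, program):
        op, dst, src, imm = program[self.pc]
        if op == "addi":
            self.regs[dst] = (self.regs[src] + imm) & 0xFFFFFFFF
        elif op == "sub":
            result = self.regs[dst] - self.regs[src]
            if self.buggy:
                result = abs(result)  # wrong: loses 32-bit wrap-around
            self.regs[dst] = result & 0xFFFFFFFF
        self.pc += 1

    def state(self):
        return (self.pc, tuple(self.regs))

def cosimulate(reference, dut, program):
    """Step both models together; report the first diverging instruction.

    Returns None when the states match after every step, otherwise a tuple
    (pc of the failing instruction, reference state, dut state).
    """
    for _ in range(len(program)):
        failing_pc = reference.pc
        reference.step(program)
        dut.step(program)
        if reference.state() != dut.state():
            return failing_pc, reference.state(), dut.state()
    return None

# r0 = 1; r1 = 2; r0 = r0 - r1 (wraps around to 0xFFFFFFFF)
program = [("addi", 0, 0, 1), ("addi", 1, 1, 2), ("sub", 0, 1, 0)]
print(cosimulate(Model(), Model(), program))
print(cosimulate(Model(), Model(buggy=True), program))
```

Because the comparison happens after every step, the mismatch is reported at the exact instruction that caused it, with the full state of both models still available for inspection.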
Because of the deep integration of nSIM into the RTL verification process, nSIM users can rely on the fact that the programs they develop using the nSIM simulator will behave the same on the final hardware. This is true even for the most obscure corner cases that most users will likely never run into, but which must be tested and verified during RTL development.
About the Author
Igor Böhm is an R&D engineer at Synopsys and a technical lead for the ARC nSIM simulator product. As a participant in the Processor Automated Synthesis by iTerative Analysis (PASTA) Project, a research group at Edinburgh University, Igor pioneered and developed a dynamic compilation infrastructure for an instruction set simulator that can simulate at faster-than-silicon speeds on commodity hardware. This technology has been licensed by Synopsys and is the basis for the ARC simulators. Igor holds a PhD degree from the Institute for Computing Systems Architecture at the University of Edinburgh in the UK, and an MS degree from the Institute for Systems Software at the Kepler University of Linz, Austria.