The explosive growth of AI and high-performance computing (HPC) workloads—driven by generative AI, real-time analytics, and advanced cloud services—has redefined the requirements for storage infrastructure. Today’s data-centric applications demand ultra-fast, low-latency, and highly scalable storage solutions. NVMe (Non-Volatile Memory Express) continues to evolve, with the NVMe 2.1 specification introducing advanced command sets such as Computational Programs and Subsystem Local Memory. These enhancements transform NVMe from a traditional storage protocol into a compute-enabled platform, unlocking new performance tiers for AI and HPC. However, with this evolution comes increased complexity in the verification of NVMe 2.1 designs, making robust validation essential for delivering reliable, high-throughput systems.
Meeting the demands of AI and HPC means ensuring that every layer of the storage stack, from host to controller, can handle parallel processing and enormous data flows with minimal latency. NVMe 2.1’s Computational Programs and Subsystem Local Memory command sets are central to this transformation.
Subsystem Local Memory (SLM) introduces a new type of namespace, the Memory Namespace, which is byte-addressable. Traditional namespaces are addressed differently: NVM (Non-Volatile Memory) and ZNS (Zoned Namespaces) by logical block address (LBA), and KV (Key Value) by key. Byte addressability allows the Host to directly access memory within the NVMe subsystem, unlocking new possibilities for data-intensive applications.
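The addressing difference can be made concrete with a minimal Python sketch. It is a toy model, not the NVMe command interface: the class names, the 512-byte block size, and the `patch_block` helper are assumptions made for illustration. It shows that changing four bytes in a block-addressed namespace takes a read-modify-write of a whole block, while a byte-addressed namespace accepts the four bytes directly.

```python
BLOCK_SIZE = 512  # assumed logical block size in bytes (illustrative only)


class BlockNamespace:
    """Toy LBA-addressed namespace: all I/O is in whole logical blocks."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)

    def _check(self, start, length):
        if start < 0 or start + length > len(self.data):
            raise ValueError("access outside namespace")

    def read(self, slba, nlb):
        start = slba * BLOCK_SIZE
        self._check(start, nlb * BLOCK_SIZE)
        return bytes(self.data[start:start + nlb * BLOCK_SIZE])

    def write(self, slba, payload):
        if len(payload) % BLOCK_SIZE != 0:
            raise ValueError("payload must be a whole number of blocks")
        start = slba * BLOCK_SIZE
        self._check(start, len(payload))
        self.data[start:start + len(payload)] = payload


class MemoryNamespace:
    """Toy byte-addressed namespace: I/O at any byte offset and length."""

    def __init__(self, size_bytes):
        self.data = bytearray(size_bytes)

    def _check(self, offset, length):
        if offset < 0 or offset + length > len(self.data):
            raise ValueError("access outside namespace")

    def read(self, offset, length):
        self._check(offset, length)
        return bytes(self.data[offset:offset + length])

    def write(self, offset, payload):
        self._check(offset, len(payload))
        self.data[offset:offset + len(payload)] = payload


def patch_block(ns, byte_offset, payload):
    """Hypothetical helper: update a few bytes via read-modify-write.

    Assumes the patch fits inside one block. Returns bytes transferred.
    """
    lba, within = divmod(byte_offset, BLOCK_SIZE)
    if within + len(payload) > BLOCK_SIZE:
        raise ValueError("patch crosses a block boundary")
    block = bytearray(ns.read(lba, 1))                # read one full block
    block[within:within + len(payload)] = payload     # modify in place
    ns.write(lba, bytes(block))                       # write the block back
    return 2 * BLOCK_SIZE
```

In this model, `patch_block(BlockNamespace(4), 1000, b"\xde\xad\xbe\xef")` transfers two full blocks to change four bytes, whereas `MemoryNamespace(2048).write(1000, b"\xde\xad\xbe\xef")` transfers only the four bytes.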
Additionally, the SLM command set provides memory space within Memory Namespaces that the NVMe Controller can use to perform computations as part of Computational Programs command set operations.
The Computational Programs command set introduces a new type of namespace, the Compute Namespace, which does not hold any memory of its own but uses the memory space provided by Memory Namespaces through memory range sets. Computational commands can also use NVM namespaces for computation. Together, these new command sets enable systems with higher performance and lower latency.
In this model, the Computational Programs command set uses compute namespaces to execute tasks on memory ranges defined in Subsystem Local Memory. This dependency means that memory access must be established before computation can occur. Additionally, other command sets facilitate data movement between input/output namespaces, ensuring efficient execution within the NVMe subsystem.
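The dependency between memory ranges and program execution can be sketched as a small Python model. This is an illustrative toy under stated assumptions, not the NVMe 2.1 command encoding and not the Synopsys VIP API: `Subsystem`, `MemoryRange`, the `rsid` and `index` parameters, and the `checksum` program are all hypothetical names. The point it demonstrates is ordering: a program can only run once a memory range set that points into a memory namespace exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRange:
    nsid: int    # memory namespace that backs this range
    offset: int  # byte offset within that namespace
    length: int  # length in bytes


class Subsystem:
    """Toy subsystem: memory namespaces hold bytes, programs only compute."""

    def __init__(self):
        self.memory_namespaces = {}  # nsid -> bytearray
        self.range_sets = {}         # rsid -> list of MemoryRange
        self.programs = {}           # program index -> callable

    def add_memory_namespace(self, nsid, size):
        self.memory_namespaces[nsid] = bytearray(size)

    def create_range_set(self, rsid, ranges):
        # Every range must fall inside an existing memory namespace.
        for r in ranges:
            mem = self.memory_namespaces.get(r.nsid)
            if mem is None or r.offset < 0 or r.offset + r.length > len(mem):
                raise ValueError(f"invalid memory range: {r}")
        self.range_sets[rsid] = list(ranges)

    def load_program(self, index, fn):
        self.programs[index] = fn

    def execute(self, index, rsid):
        # Memory access must be established before computation can occur.
        if rsid not in self.range_sets:
            raise RuntimeError("memory range set must exist before execution")
        views = [
            memoryview(self.memory_namespaces[r.nsid])[r.offset:r.offset + r.length]
            for r in self.range_sets[rsid]
        ]
        return self.programs[index](views)


def checksum(views):
    """Hypothetical program: sum the first range into the second range."""
    src, dst = views
    dst[0] = sum(src) % 256
    return 0  # status code chosen for this sketch
```

A typical sequence in this model is `add_memory_namespace`, then `create_range_set`, then `load_program`, then `execute`. Calling `execute` with an unknown `rsid` raises an error, which mirrors the ordering constraint described above.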
By bringing compute closer to storage, these capabilities reduce data movement, lower latency, and boost system efficiency. Together, these command sets empower NVMe devices to offload compute tasks from Hosts, enabling faster software execution and supporting use cases like real-time analytics and security operations.
Figure 1: NVMe Computational Programs and Subsystem Local Memory Command Set
These advancements introduce intricate verification challenges:
Synopsys VIP provides easy-to-adapt solutions with comprehensive features specifically designed to address these advanced NVMe command set verification challenges, enabling customers to achieve thorough design validation within stipulated timelines.
Synopsys NVMe VIP addresses these challenges with:
Figure 2: NVMe VIP Architecture
Detailed simulation logs and debug transcripts that provide clear visibility across Host, Adapter, and Controller layers, enabling faster issue resolution and strong verification coverage for complex NVMe command sets.
Figure 3: Simulation Generated Transcript
As NVMe 2.1 brings computational capabilities closer to the data, design verification becomes a mission-critical differentiator. Synopsys NVMe VIP equips engineering teams with the tools to confidently validate complex command sets, accelerate time-to-market, and deliver the high-performance, reliable storage demanded by AI and HPC applications.
Synopsys is partnering with early customers and collaborators to enhance the standard architecture for their next-generation designs, incorporating new features now available with the latest specifications.
Synopsys VIP is natively integrated with the Synopsys Verdi® Protocol Analyzer debug solution as well as Synopsys Verdi® Performance Analyzer. Running system-level payloads on SoCs requires a faster, hardware-based pre-silicon solution. Synopsys transactors, memory models, and hybrid and virtual solutions based on Synopsys IP enable various verification and validation use cases on the industry’s fastest verification hardware, Synopsys ZeBu® emulation and Synopsys HAPS® prototyping systems.