High-Performance Solutions for Next-Generation SSD Designs

Michael Thompson, Product Marketing Manager, Synopsys

We are entering the fifth wave of a period of digital innovation that started 40 years ago. It began with the personal computer in the 1980s and was followed by the Internet in the 1990s. In the 2000s, we entered the third wave with the rise of mobile computing devices, which was then followed by the growth of the cloud in 2011.

We now find ourselves entering the fifth wave with the rise of artificial intelligence (AI). Each wave built on and was enabled by the previous ones, and underlying all of them is a foundation of digital storage. None of the waves would have been possible without it, and each depended on advances in storage. Digital storage tends to operate out of sight and is taken for granted, but its advances have been nothing less than astounding. Today there is more storage capacity in the devices in our pockets than existed in mainframe computers 30 years ago.

Storage Markets

The fifth wave is also known as the data era, and the amount of data that consumers are generating is staggering. The amount of data generated and consumed by AI, as it moves into the mainstream over the next few years, will be significantly larger than what we are generating today. While much of this data is in motion, much of it is being stored. The growth in data is being driven by social networking, e-commerce, search, content delivery, analytics, and big data. Every minute of every day, 60 hours of video are uploaded to YouTube. This is 180 GB of data that must be stored every minute, and this is only one application.
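To put the YouTube figures in perspective, the short Python sketch below scales the quoted upload rate to a day and a year. It uses only the two numbers stated above (60 hours of video and 180 GB per minute). The constant rate and the decimal units (1 TB = 1,000 GB) are assumptions made for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope scaling of the upload figures quoted in the text.
# Assumptions (not from the article): the rate is constant and units are
# decimal (1 TB = 1,000 GB, 1 PB = 1,000 TB).

GB_PER_MINUTE = 180          # from the text
VIDEO_HOURS_PER_MINUTE = 60  # from the text

MINUTES_PER_DAY = 24 * 60
DAYS_PER_YEAR = 365


def gb_per_video_hour() -> float:
    """Average stored size of one hour of uploaded video, in GB."""
    return GB_PER_MINUTE / VIDEO_HOURS_PER_MINUTE


def tb_per_day() -> float:
    """Data stored per day at the quoted rate, in TB."""
    return GB_PER_MINUTE * MINUTES_PER_DAY / 1_000


def pb_per_year() -> float:
    """Data stored per year at the quoted rate, in PB."""
    return tb_per_day() * DAYS_PER_YEAR / 1_000


if __name__ == "__main__":
    print(f"{gb_per_video_hour():.1f} GB per hour of video")
    print(f"{tb_per_day():.1f} TB per day")
    print(f"{pb_per_year():.1f} PB per year")
```

Under these assumptions the quoted rate works out to about 3 GB per hour of video, roughly 259 TB per day, and close to 95 PB per year for this one application.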

In data centers and PCs, hard disk drives (HDDs) have been the dominant storage media, but this is changing as the cost of solid-state drives (SSDs) based on NAND flash memory declines. SSDs still cost three to four times more per GB than HDDs, but the benefits of faster access and lower power consumption are enough to offset the cost difference. Also, the storage density of SSDs is increasing faster than that of spinning media, enabling much higher drive capacities. This is leading to a transition away from HDDs to SSDs, and it is forecast that spinning media will account for only 5% of the total disk storage market by 2025, as shown in Figure 1. This is a huge opportunity for companies that are poised to take advantage of this transition to flash storage and the growth in storage demand over the next few years.

Figure 1: Transition from HDD to SSD in enterprise storage

 

Storage Controller Trends

The design of storage controllers has changed significantly over the past ten years as drive sizes have increased. Ten years ago, most controllers had a single CPU, and the main design concerns were data flow, the flash file system, wear leveling, garbage collection, error correction, and communication. These functions must still be performed, but they are becoming more challenging as drives grow and new tasks are added.

For example, to increase the capacity of flash devices, designs moved to higher numbers of bits per flash cell, but this has caused the reliability and the number of program/erase (PE) cycles to drop. This is a problem because the reliability and endurance expectations for the drives have not changed. At the same time, the throughput and IOPS requirements for flash drives have increased significantly in just the past couple of years, putting pressure on the storage controller that manages the data flow. Storage controller developers are responding with multicore processor designs that support very high levels of performance and can be scaled to even higher levels by adding cores and clusters.

There are several new trends in SSDs that should be considered in a storage controller design. The challenges of moving big data and the unacceptable latency of devices operating at the edge are leading to an increasing use of compute in storage solutions. There is also a move to use AI in drives, both to perform specialized tasks such as object detection and classification and to increase drive endurance and reliability.

Compute in Storage

Compute is moving to storage to deal with the long latencies between where data is stored and where it is processed, and with the large amounts of data that must be moved. Traditionally, data has been moved from drives to the compute engine. This requires moving the data across backplanes, interfaces, and often across protocols. This not only takes time and increases latency, but it also burns power. In addition, it often results in the data being copied and existing in multiple locations.

Figure 2: Traditional versus In Storage Compute

By moving the computation capabilities into the drive, data movement outside of the drive is minimized (Figure 2). This reduces latency (often by orders of magnitude) and power consumption. It also increases security because the data is not moved outside of the drive. Another benefit is that the processing can be optimized for the workload. Unlike a general-purpose host, a storage compute processor can be specifically configured to support the needed functionality, in turn increasing processing efficiency.  
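The difference in data movement can be illustrated with a toy model. The Python sketch below is not a real drive interface: the Drive class, its methods, and the record layout are all hypothetical, and it only counts the bytes that would cross the host interface when a filter runs on the host versus inside the drive.

```python
# Toy model of host-side versus in-drive filtering.
# Everything here (class name, methods, record layout) is hypothetical and
# exists only to count bytes crossing the host interface.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Drive:
    records: List[bytes]        # data resident on the drive
    bytes_transferred: int = 0  # bytes moved across the host interface

    def read_all(self) -> List[bytes]:
        """Traditional path: ship every record to the host."""
        self.bytes_transferred += sum(len(r) for r in self.records)
        return list(self.records)

    def query(self, predicate: Callable[[bytes], bool]) -> List[bytes]:
        """In-storage compute: filter on the drive, ship only the matches."""
        matches = [r for r in self.records if predicate(r)]
        self.bytes_transferred += sum(len(r) for r in matches)
        return matches


def is_error(record: bytes) -> bool:
    """Example workload: select records tagged as errors."""
    return record.startswith(b"ERR")


def make_records() -> List[bytes]:
    """1,000 made-up 64-byte records; one in every 100 is tagged ERR."""
    return [
        (b"ERR" if i % 100 == 0 else b"OK ").ljust(64, b".")
        for i in range(1000)
    ]


if __name__ == "__main__":
    host_side = Drive(make_records())
    host_hits = [r for r in host_side.read_all() if is_error(r)]

    in_drive = Drive(make_records())
    drive_hits = in_drive.query(is_error)

    assert host_hits == drive_hits  # same answer either way
    print("host-side filtering moved", host_side.bytes_transferred, "bytes")
    print("in-drive filtering moved", in_drive.bytes_transferred, "bytes")
```

In this toy case both paths return the same ten records, but the in-drive path moves 640 bytes across the interface instead of 64,000. The ratio is a property of the made-up data, not a measurement of any real drive.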

Artificial Intelligence

The use of AI in storage brings the potential to substantially increase endurance and reliability, while also supporting object detection, tracking, classification, and much more. The use of machine learning in flash drives is in the early stages and will be developed and implemented in many applications over the next three to five years. The potential benefits of using machine learning are significant, but they come with their own set of challenges.

Using machine learning is not simple (Figure 3). The implementation of a machine learning framework requires a lot of preparation. For example, to improve drive endurance you need to determine the aspects of the NAND flash memory that affect endurance. Then, through brute-force testing, which often takes months, you need to create a graph that can be used to drive the target algorithm. The target algorithm must be implemented in hardware with the necessary firmware framework, so that the created graph can be programmed into it and run.

Figure 3: Development and implementation of machine learning framework

AI is evolving rapidly with many new algorithms and offers huge potential, and the benefits in storage are just starting to be realized.

ARC Processors for Storage

ARC processors are widely used in flash storage controllers. They offer the highest performance-efficiency with a broad range of processor performance and capability. They support easy customization and the addition of proprietary hardware through APEX custom instructions. ARC processors reduce system latency and power consumption, and offer best-in-class code density, which is very important in many storage controller designs.

The ARC HS family comprises the highest-performance ARC processors, delivering as much as 7500 DMIPS per core in 16nm processes. The HS family offers single-, dual-, and quad-core configurations with L1 cache coherency and L2 cache, a 40-bit physical address space, and support for Linux and SMP Linux. The HS family was designed in close consultation with our storage customers and has features, such as 64-bit-per-clock loads and stores, that accelerate data movement.

The ARC EM family has the same programmer’s model and instruction set as the HS cores, making it easier for designers to use both cores in a design, and for the firmware team to partition their code across the processors. The family is designed for low power and small size and offers a broad range of capabilities while consuming as little as 2 µW/MHz and taking less than one hundredth of a square millimeter of silicon. Even so, the ARC EM cores offer excellent performance, delivering up to 1.8 DMIPS/MHz.
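Because both EM figures are quoted per MHz, they scale with clock frequency. The Python sketch below applies them to a clock rate chosen purely for illustration: the 200 MHz value is an assumption, not a figure from this article. The two inputs are also best-case numbers ("as little as" and "up to") that may not apply to the same core configuration, so the results should be read as bounds rather than as a specification.

```python
# Scaling the quoted ARC EM per-MHz figures to a clock frequency.
# CLOCK_MHZ is a hypothetical value chosen for illustration only.

EM_MIN_POWER_UW_PER_MHZ = 2.0  # "as little as 2 uW/MHz" (from the text)
EM_MAX_DMIPS_PER_MHZ = 1.8     # "up to 1.8 DMIPS/MHz" (from the text)
CLOCK_MHZ = 200                # assumption, not from the text


def power_uw(clock_mhz: float) -> float:
    """Best-case power in microwatts at the given clock frequency."""
    return EM_MIN_POWER_UW_PER_MHZ * clock_mhz


def dmips(clock_mhz: float) -> float:
    """Best-case Dhrystone throughput at the given clock frequency."""
    return EM_MAX_DMIPS_PER_MHZ * clock_mhz


if __name__ == "__main__":
    print(f"At {CLOCK_MHZ} MHz: {power_uw(CLOCK_MHZ):.0f} uW, "
          f"{dmips(CLOCK_MHZ):.0f} DMIPS")
```

At the assumed 200 MHz, the best-case figures give 400 µW and 360 DMIPS.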

All ARC HS and EM cores support APEX custom instruction extensions that enable users to add their own proprietary hardware to the processor to increase performance, reduce power consumption, or add functionality. Synopsys offers cryptographic options for the EM cores that include common crypto algorithms. These are area-optimized and offer high performance for storage applications that require encryption.

The ARC EV processors offer a fully programmable and scalable solution for AI. They are implemented with a high-performance scalar core coupled with a 512-bit wide vector DSP. A broad range of machine learning algorithms and software models can be programmed and supported by the EV family processors. They are available in single-, dual-, and quad-core implementations supporting extremely high levels of machine learning performance. If implementation of object detection and classification is desired in the drive, an optional Convolutional Neural Network (CNN) engine supports HD video streams.

Synopsys offers a complete development environment for the ARC processors. This includes the MetaWare Development Toolkit with a compiler, debugger, and simulator optimized specifically for ARC processors. There are also several other simulators that support fast simulation and cycle-accurate simulation. Many development boards are available, as well as support for the Linux and MQX operating systems.

Summary

The flash storage market is going to grow at a rapid rate over the next 5 to 10 years, and offers a big opportunity for companies that are poised to take advantage of it. Using the best processor for the storage controller is critical in the design of flash storage products. ARC processors offer an excellent solution for flash storage and are widely used in storage applications. They offer the industry’s highest performance-efficiency and features that accelerate data movement, reducing latency and increasing IOPS and throughput. The DesignWare ARC processors enable engineers to create state-of-the-art SSD designs that are adaptable, scalable, and customizable to deliver optimal performance across the full range of flash storage application requirements.

 
