Michael Thompson, Sr. Product Marketing Manager, Synopsys
We are in the midst of a data explosion. Autonomous vehicles, augmented reality, machine vision, the internet, and personal assistants are all increasing rapidly in capability. The common link among these capabilities is the large amount of data that they generate. Ten years ago, the world generated 2 ZB (zettabytes), or 2 × 10^21 bytes, of data. This year (2020) we will generate 32 ZB, and by 2025 this will grow to more than 160 ZB. Most of this data is being generated outside of data centers, which creates challenges in how and where we process and store it. Moving data requires energy, costs money, and takes bandwidth, and for many applications latency limits prevent data from being moved very far. The sheer volume of the data means that we will not be able to move large portions of it for processing or storage, so it will have to be handled where it is generated.
The current era of technology development started with the personal computer in 1983 and moved to the internet era in 1995. In 2007 we saw the rise of the mobile era, which led to the cloud era in 2011. Each new technology era has been enabled by the previous eras. This is also true of the fifth era, the data era, that we are now entering. The internet, mobile, and the cloud are all enabling the rapid growth of data. In 2007, at the start of the mobile era, we generated 0.5 ZB of data worldwide. By 2011, at the start of the cloud era, this had increased 4x to 2 ZB. In the nine years since, data generation has increased more than 15x worldwide.
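The growth figures quoted so far can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the yearly volumes cited in this article (the 2025 value is a projection), and the names `DATA_ZB`, `growth_factor`, and `annual_growth_rate` are hypothetical helpers introduced here.

```python
# Worldwide data generated per year, in zettabytes (ZB), as quoted in this article.
# The 2025 entry is a projection, not a measurement.
DATA_ZB = {2007: 0.5, 2011: 2.0, 2020: 32.0, 2025: 160.0}


def growth_factor(start_year: int, end_year: int) -> float:
    """Return how many times larger the end-year volume is than the start-year volume."""
    return DATA_ZB[end_year] / DATA_ZB[start_year]


def annual_growth_rate(start_year: int, end_year: int) -> float:
    """Return the compound annual growth rate implied by the two data points."""
    years = end_year - start_year
    return growth_factor(start_year, end_year) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    for start, end in [(2007, 2011), (2011, 2020), (2020, 2025)]:
        print(f"{start}-{end}: {growth_factor(start, end):.0f}x overall, "
              f"{annual_growth_rate(start, end):.0%} per year")
```

Under these figures, the implied compound growth works out to roughly 41% per year for 2007 to 2011, 36% per year for 2011 to 2020, and 38% per year for the projected 2020 to 2025 period.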
The rate at which we generate data is increasing exponentially, with 90% of existing data created in the past two years. Currently seven billion people and businesses and 30 billion devices are connected to the internet. Every 60 seconds, one million of us log in to Facebook, and we send 18 million text messages, watch 4.3 million videos on YouTube, and send close to 200 million emails. This is happening every minute of every day, and it is just a small fraction of the data that we are generating.
There are several trends that are driving data growth. On the business side there is big data and real-time analytics, cloud computing, ecommerce, real-time inventory, and workforce automation. In the home, data growth is being driven by surveillance systems, home automation, wearables, streaming media, social media, personal assistants, and games. Of course, we are using many of these capabilities outside of the home, which is generating even more data. Being connected brings benefits in productivity, security, convenience, and communication, so the number of connections that we have keeps increasing, which in turn drives data growth.
A second challenge is that most of the data increase is being generated by the Internet of Things (IoT) at the endpoints of the internet farthest from the cloud. This isn’t surprising because this is where we live and how we connect to things, but it creates challenges in terms of dealing with the data and making decisions based on the data. It is expensive in terms of cost and power to move data from where it is generated to the cloud, so much of the processing and storage will have to happen either where data is generated or at the network edge (Figure 1).
Another challenge is that some of the data must be processed in real time. For many applications, the latency associated with moving data from an IoT device to the cloud for processing is too long. For example, autonomous vehicles will generate a lot of data from the sensors and cameras that they employ. Moving the data to the cloud to determine if there is a pedestrian in the path of the vehicle would result in disaster, so the data will have to be processed in the vehicle.
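To make the latency argument concrete, the short calculation below estimates how far a vehicle travels while it waits for a decision. It is a sketch under stated assumptions: the speed (100 km/h) and the two latency values (100 ms for a cloud round trip, 10 ms for in-vehicle processing) are illustrative numbers chosen here, not figures from this article, and `distance_during_latency_m` is a hypothetical helper.

```python
def distance_during_latency_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance in metres a vehicle covers while waiting latency_ms for a decision."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0
    return speed_m_per_s * (latency_ms / 1000.0)


if __name__ == "__main__":
    speed_kmh = 100.0          # assumed vehicle speed
    cloud_latency_ms = 100.0   # assumed cloud round-trip latency
    local_latency_ms = 10.0    # assumed in-vehicle processing latency

    for label, latency in [("cloud", cloud_latency_ms), ("in-vehicle", local_latency_ms)]:
        metres = distance_during_latency_m(speed_kmh, latency)
        print(f"{label}: {metres:.2f} m travelled before a decision is available")
```

With these assumed values the vehicle covers close to three metres before a cloud-based answer could arrive, compared with well under half a metre for local processing.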
Many of the new capabilities of computational storage are being implemented with artificial intelligence (AI). AI has been around for many years, but it has only been in the past 10 years that we have been able to use this capability in embedded applications. Advancements in memory density, processor performance, and AI algorithms have all contributed to increases in performance and reductions in power consumption that have allowed AI to move from mainframes to embedded applications.
AI is an important solution that will enable us to store data more wisely based on how soon it will be needed and how often it will be updated. AI can be used to predict hot and cold data, to determine where to store data, to manage the lifecycle of data, and to uncover insights in stored data. In storage drives it can be used for object detection and classification or to create metadata (data about the data) to enable search.
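As a rough illustration of hot and cold data prediction, the sketch below scores each stored object with a recency-weighted access count and assigns it to a tier. This is a simple heuristic standing in for a learned model, not the method used in any particular product; the class `AccessTracker`, the one-hour half-life, and the threshold of 2.0 are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class AccessTracker:
    """Recency-weighted access score for one stored object (hypothetical helper)."""
    half_life_s: float = 3600.0  # assumed: an access counts half as much after one hour
    score: float = 0.0
    last_access_s: float = 0.0

    def _decayed(self, now_s: float) -> float:
        # Exponentially decay the stored score by the time elapsed since the last access.
        elapsed = max(0.0, now_s - self.last_access_s)
        return self.score * math.pow(0.5, elapsed / self.half_life_s)

    def record_access(self, now_s: float) -> None:
        # Decay the old score to the current time, then count this access.
        self.score = self._decayed(now_s) + 1.0
        self.last_access_s = now_s

    def tier(self, now_s: float, hot_threshold: float = 2.0) -> str:
        # "hot" data would stay on fast storage; "cold" data could move to a cheaper tier.
        return "hot" if self._decayed(now_s) >= hot_threshold else "cold"


if __name__ == "__main__":
    tracker = AccessTracker()
    for t in (0.0, 10.0, 20.0):       # three accesses within 20 seconds
        tracker.record_access(t)
    print(tracker.tier(now_s=20.0))            # recently and frequently accessed
    print(tracker.tier(now_s=20.0 + 4 * 3600)) # four hours later, with no new accesses
```

A trained model could replace the fixed half-life and threshold with parameters learned from access traces; the tiering decision itself would look much the same.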
AI does not need to stand alone at the edge. It can be used as a first level of processing for data at the edge, with the resulting information then moved to the cloud for any heavy lifting that needs to be done. This would reduce the amount of data transmitted, improve latencies, and offload processing from the cloud.
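The split between edge and cloud processing can be sketched as follows. The example reduces a window of raw sensor readings to a compact summary at the edge and compares the size of the two payloads. Everything here is an assumption for illustration: the readings are synthetic, the summary fields and the function name `summarize_at_edge` are hypothetical, and a simple threshold check stands in for an AI model.

```python
import json
from statistics import mean


def summarize_at_edge(readings: list[float], alert_threshold: float) -> dict:
    """Reduce a window of raw sensor readings to a small summary for upload."""
    return {
        "count": len(readings),
        "mean": round(mean(readings), 3),
        "max": max(readings),
        # Only readings above the threshold are forwarded individually.
        "alerts": [r for r in readings if r > alert_threshold],
    }


if __name__ == "__main__":
    raw = [20.0 + 0.01 * i for i in range(1000)]  # synthetic data, not measurements
    raw[500] = 95.0                               # one injected anomaly
    summary = summarize_at_edge(raw, alert_threshold=90.0)

    raw_bytes = len(json.dumps(raw).encode())
    summary_bytes = len(json.dumps(summary).encode())
    print(f"raw payload: {raw_bytes} bytes, edge summary: {summary_bytes} bytes")
```

Only the summary needs to cross the network; the cloud can still request the raw window when an alert warrants deeper analysis.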
While processors like the ARC HS and EM can be used for AI tasks, specialized processors are available for specific AI tasks, and these offer the highest performance for embedded AI applications. For example, GPUs have been used for machine vision applications, but these are being replaced by newer specialized embedded vision (EV) processors, like Synopsys’ DesignWare ARC EV7x Processors. The ARC EV7x can be configured with a programmable neural network engine to perform AI operations at very high performance levels. Not only are processors increasing in performance, but the AI algorithms that run on them are also being improved, increasing accuracy and reducing memory requirements.