Using Artificial Intelligence to Harness the Coming Data Explosion

Michael Thompson, Sr. Product Marketing Manager, Synopsys

Introduction

We are in the midst of a data explosion. Autonomous vehicles, augmented reality, machine vision, the internet, and personal assistants are all increasing rapidly in capability. The common link among these capabilities is the large amount of data they generate. Ten years ago, the world generated 2 ZB (zettabytes), or 2 × 10²¹ bytes, of data. This year (2020) we will generate 32 ZB, and by 2025 this will grow to more than 160 ZB. Most of this data is being generated outside of data centers, which creates challenges in how and where we process and store it. Moving data requires energy, costs money, and takes bandwidth, and for many applications latency limitations prevent data from being moved very far. The sheer volume of the data means that we will not be able to move large portions of it for processing or storage, so it will have to be handled where it is generated.

The Coming Data Explosion

The current era of technology development started with the personal computer in 1983 and moved to the internet era in 1995. In 2007 we saw the rise of the mobile era, which led to the cloud era in 2011. Each new technology era has been enabled by the previous eras. This is also true of the fifth era, the data era, that we are now entering. The internet, mobile, and the cloud are all enabling the rapid growth of data. In 2007, at the start of the mobile era, we generated 0.5 ZB of data worldwide. By 2011, at the start of the cloud era, this had increased 4x to 2 ZB. In the nine years since, worldwide data generation has increased more than 15x.

The rate at which we generate data is increasing exponentially, with 90% of existing data created in the past two years. Currently, seven billion people and businesses and 30 billion devices are connected to the internet. Every 60 seconds, one million of us log in to Facebook, we send 18 million text messages, we watch 4.3 million videos on YouTube, and we send close to 200 million emails. This happens every minute of every day, and it is just a small fraction of the data that we are generating.

There are several trends driving data growth. On the business side there are big data and real-time analytics, cloud computing, e-commerce, real-time inventory, and workforce automation. In the home, data growth is being driven by surveillance systems, home automation, wearables, streaming media, social media, personal assistants, and games. Of course, we use many of these capabilities outside of the home as well, which generates even more data. Being connected has benefits in terms of productivity, security, convenience, and communication, and the number of connections we have is increasing, further driving data growth.

Data Challenges

All this data has its benefits, but it also presents us with several challenges. First, data is growing exponentially, meaning the rate of increase is itself increasing. This is a challenge because what we have today isn’t enough, and it takes years to develop significantly higher-bandwidth solutions. Between 2019 and 2025, our data usage will increase by 5x; we will generate roughly 130 ZB more data in 2025 than in 2019. To put this in perspective, 130 ZB is more than all of the data we generated from the invention of the semiconductor through 2019.

Figure 1. Connected devices, edge computing, and the Internet of Things

A second challenge is that most of the data increase is being generated by the Internet of Things (IoT) at the endpoints of the internet, farthest from the cloud. This isn’t surprising, because this is where we live and how we connect to things, but it creates challenges in how we handle that data and make decisions based on it. Moving data from where it is generated to the cloud is expensive in terms of cost and power, so much of the processing and storage will have to happen either where the data is generated or at the network edge (Figure 1).

Another challenge is that some of the data must be processed in real-time. For many applications, the latency associated with moving data from an IoT device to the cloud for processing is too long. For example, autonomous vehicles will generate a lot of data from the sensors and cameras that they employ. Moving the data to the cloud to determine if there is a pedestrian in the path of the vehicle would result in disaster, so the data will have to be processed in the vehicle.

Re-Architecting for the Data Era

The limitations on moving large amounts of data will necessitate moving data processing to the edge and to connected IoT devices. Limits on network bandwidth, power consumption, and real-time application requirements are forcing this to happen. People and devices connect at the edge and that isn’t going to change, so analytics and storage will have to move out from the cloud. Fortunately, semiconductor technology and firmware capabilities are advancing, and the ability to perform advanced processing outside of the cloud is increasing rapidly, making it possible to do the required processing for many applications at the edge or in the IoT devices themselves.

There are additional benefits to processing data at the edge beyond reducing power and improving latency. Processing at the edge and in IoT devices is scalable: if an application requires more processing, more resources can be applied when the device is designed. Processing at the edge is also more secure and reliable because the data doesn’t have to be moved across the internet and in many cases never leaves the endpoint.

One aspect of processing at the edge is the increasing use of computational storage, that is, processing data inside the storage drive itself. Data lives in storage, so it makes sense to process it there (Figure 2). This increases security and throughput, reduces the cost and power of moving data, and supports offline processing. Computational storage will also grow in use in IoT devices.

Figure 2. Usage of computational storage is growing in SSDs
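To make the computational storage concept concrete, here is a minimal sketch, under assumed names and a hypothetical host/drive interface, of the kind of filter a drive could run in place so that only matching records ever cross the host interface:

/* Illustration of the computational storage idea: instead of moving a
 * whole data set off the drive, the host asks the drive to run a simple
 * filter and return only the matching records. All names below are
 * hypothetical; real drives expose this through vendor-specific or
 * emerging standard interfaces. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t sensor_id;
    float    value;
} record_t;

/* Runs inside the drive: scan records in place and copy out only those
 * above a threshold. Only n_out records ever cross the host interface. */
size_t filter_on_drive(const record_t *records, size_t n,
                       float threshold, record_t *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n; i++) {
        if (records[i].value > threshold) {
            out[n_out++] = records[i];
        }
    }
    return n_out;   /* typically far smaller than n */
}

The bandwidth and power savings come from the fact that the full data set is scanned next to where it is stored, while the host only ever sees the (usually much smaller) result.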

Many of the new capabilities of computational storage are being implemented with artificial intelligence (AI). AI has been around for many years, but it has only been in the past 10 years that we have been able to use this capability in embedded applications. Advancements in memory density, processor performance, and AI algorithms have all contributed to increases in performance and reductions in power consumption that have allowed AI to move from mainframes to embedded applications.

AI is an important solution that will enable us to store data more wisely based on how soon it will be needed and how often it will be updated. AI can be used to predict hot and cold data, to determine where to store data, to manage the lifecycle of data, and to uncover insights in stored data. In storage drives it can be used for object detection and classification or to create metadata (data about the data) to enable search.
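As a rough illustration of the hot/cold prediction idea, the sketch below scores blocks by access recency and frequency. The features, weights, and threshold are illustrative assumptions, not taken from any shipping drive firmware; a real implementation might use a trained model instead of fixed weights.

/* Minimal sketch of "hot vs. cold" data prediction for deciding where to
 * keep a block of data. Weights and threshold are assumed for illustration. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t access_count;   /* accesses observed in the current window  */
    uint32_t last_access;    /* timestamp (seconds) of the last access   */
} block_stats_t;

/* Higher score means "hotter" data that should stay in fast media. */
static float hotness_score(const block_stats_t *b, uint32_t now)
{
    float recency   = 1.0f / (1.0f + (float)(now - b->last_access));
    float frequency = (float)b->access_count;
    return 100.0f * 0.7f * recency + 0.3f * frequency;   /* assumed weights */
}

/* Placement decision: keep hot blocks in the fast tier, demote cold ones. */
static bool keep_in_fast_tier(const block_stats_t *b, uint32_t now)
{
    const float threshold = 10.0f;                        /* assumed cutoff */
    return hotness_score(b, now) > threshold;
}

int main(void)
{
    block_stats_t recent = { .access_count = 40, .last_access = 995 };
    block_stats_t stale  = { .access_count = 2,  .last_access = 100 };
    uint32_t now = 1000;

    printf("recent block hot? %d\n", keep_in_fast_tier(&recent, now)); /* 1 */
    printf("stale block hot?  %d\n", keep_in_fast_tier(&stale,  now)); /* 0 */
    return 0;
}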

AI does not need to stand alone at the edge. It can be used as a first level of processing at the edge, with the results then moved to the cloud for any heavy lifting that needs to be done. This would reduce the amount of data transmitted, improve latency, and offload processing from the cloud.
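A minimal sketch of this edge-first pattern, assuming hypothetical hooks for on-device inference and the cloud uplink, might look like the following; only frames the local model is unsure about are forwarded:

/* Sketch of an edge-first pipeline: a small on-device model screens each
 * frame, and only uncertain or interesting frames are sent to the cloud.
 * run_local_inference() and send_to_cloud() are hypothetical stand-ins
 * for an embedded inference call and a network uplink. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool  object_detected;   /* did the local model find anything?  */
    float confidence;        /* model confidence for that detection */
} inference_result_t;

/* Stand-in for an on-device model running on a small, low-power CPU. */
static inference_result_t run_local_inference(const uint8_t *frame, size_t len)
{
    (void)frame; (void)len;
    inference_result_t r = { .object_detected = true, .confidence = 0.65f };
    return r;                /* placeholder result */
}

/* Stand-in for the comparatively expensive uplink to the cloud. */
static void send_to_cloud(const uint8_t *frame, size_t len)
{
    (void)frame; (void)len;  /* a real device would queue and transmit here */
}

/* Only escalate frames the local model is unsure about; everything else
 * is handled or discarded at the edge, so most raw data never leaves
 * the device. */
void process_frame(const uint8_t *frame, size_t len)
{
    inference_result_t r = run_local_inference(frame, len);

    if (r.object_detected && r.confidence < 0.8f) {
        send_to_cloud(frame, len);   /* let the cloud do the heavy lifting */
    }
}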

AI is Advancing Rapidly

The implementation of AI in embedded applications is being facilitated by advances in microprocessor capabilities combined with current process technologies, enabling processors that offer very small size at performance levels that were unattainable a few years ago. In IoT devices, small, low-power CPUs like the DesignWare® ARC® EM processors are being used for AI (Figure 3). This is supported by specialized libraries like the embARC Machine Learning Inference (MLI) library, which supports the ARC EM and HS processor families. The library significantly boosts AI performance, increasing it by as much as 16x for a 2D convolution and up to 5x for a wide range of recurrent neural network (RNN) topologies. With the MLI library, ARC EM processors can support a broad range of AI applications and do so at very low power consumption levels.

Figure 3. Processing steps using ARC EM for AI in IoT applications
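To give a sense of the work being accelerated, below is a plain reference loop for the kind of fixed-point (Q7) 2D convolution that a library such as embARC MLI optimizes. This is a generic illustration, not the MLI API; the function name and parameters are hypothetical.

/* Generic fixed-point (Q7) 2D convolution of the kind an embedded ML
 * library accelerates with processor-specific instructions. Reference
 * loop for illustration only. */
#include <stdint.h>

/* Convolve one Q7 input channel with a k x k Q7 kernel (valid padding).
 * out_shift rescales the 32-bit accumulator back to Q7. */
void conv2d_q7_ref(const int8_t *in, int in_h, int in_w,
                   const int8_t *kernel, int k,
                   int8_t *out, int out_shift)
{
    int out_h = in_h - k + 1;
    int out_w = in_w - k + 1;

    for (int y = 0; y < out_h; y++) {
        for (int x = 0; x < out_w; x++) {
            int32_t acc = 0;                       /* wide accumulator */
            for (int ky = 0; ky < k; ky++) {
                for (int kx = 0; kx < k; kx++) {
                    acc += (int32_t)in[(y + ky) * in_w + (x + kx)]
                         * (int32_t)kernel[ky * k + kx];
                }
            }
            acc >>= out_shift;                     /* requantize to Q7 */
            if (acc > 127)  acc = 127;             /* saturate         */
            if (acc < -128) acc = -128;
            out[y * out_w + x] = (int8_t)acc;
        }
    }
}

A library like MLI replaces this scalar inner multiply-accumulate loop with code tuned to the processor’s DSP and SIMD capabilities, which is largely where speedups of the magnitude quoted above come from.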

While general-purpose processors like the ARC HS and EM can be used for AI tasks, specialized processors are available for specific AI workloads, and these offer the highest performance for embedded AI applications. For example, GPUs have been used for machine vision applications, but they are being replaced by newer specialized embedded vision (EV) processors, like Synopsys’ DesignWare ARC EV7x processors. The ARC EV7x can be configured with a programmable neural network engine to perform AI operations at very high performance levels. Not only are processors increasing in performance, but the AI algorithms that run on them are also being improved, increasing accuracy and reducing memory requirements.

Summary

The amount of data that we generate is growing exponentially and is forecast to increase by 5x to 160 ZB over the next five years. In 2025 we will generate 130 ZB more data than we will in 2020, which is more data than we generated from the invention of the semiconductor through 2019. To say that it will be challenging to deal with this increase in data is an understatement.

There are several trends driving data growth, both on the business side and in the home. For example, autonomous vehicles, augmented reality, machine vision, the internet, and personal assistants are all growing rapidly in use because of the convenience they offer.

With most of this data being generated outside of the cloud, we will have to focus our efforts on dealing with it where it is created. Transporting this amount of data from the edge and endpoints of the internet to the cloud will not be possible; it would take too much power, bandwidth, and time. To deal with the growth in data, three things will happen. The processing of data will move from the cloud to the edge, and even into the endpoints of the internet. The use of computational storage will increase to deal with the growth of storage capacity and the need to process data at the edge. And AI will be an integral part of the solution, growing in usage at the edge, in the endpoints of the internet, and in storage drives to intelligently manage data and how and where it is processed. The coming data explosion will be a challenge, but one that we can deal with if we intelligently manage the flow of data and where it is processed and stored.

 

For more information visit: Synopsys ARC Processor IP Solutions