How AI in Edge Computing Drives 5G and the IoT

Ron Lowman, Product Marketing Manager, Synopsys

Edge computing, which is the concept of processing and analyzing data in servers closer to the applications they serve, is growing in popularity and opening new markets for established telecom providers, semiconductor startups, and new software ecosystems. It’s brilliant how technology has come together over the last several decades to enable this new space, starting with Big Data and the idea that with lots of information, now stored in mega-sized data centers, we can analyze the chaos in the world to provide new value to consumers. Combine this concept with the IoT and connected everything, from coffee cups to pill dispensers, oil refineries to paper mills, smart goggles to watches, and the value to the consumer could be infinite.

However, many argue the market didn’t experience the hockey stick growth curves expected for the Internet of Things. The connectivity of the IoT simply didn’t bring enough consumer value, except for specific niches. Over the past five years, however, technology advancements such as artificial intelligence (AI) have begun to revolutionize industries and to change expectations of how much value connectivity can provide to consumers. It’s a very exciting time as the market can see unlimited potential in the combination of big data, IoT, and AI, but we are only at the beginning of a long road. One of the initial developments that helps harness the combination is the concept of edge computing and its impact on future technology roadmaps.

The concept of edge computing may not be revolutionary, but the implementations will be. These implementations will solve many growing issues including reducing energy use by large data centers, improving security of private data, enabling failsafe solutions, reducing information storage and communication costs, and creating new applications via lower latency capabilities.

But what is edge computing? How is it used, and what benefits can it provide to a network? To understand edge computing, we need to understand what is driving its development, the types of edge computing applications, and how companies are building and deploying edge computing SoCs today.

Edge Computing, Edge Cloud, Fog Computing, Enterprise

There are many terms for edge computing, including “edge cloud computing” and “fog computing”. Edge computing is typically described as the concept of an application running on a local server in an effort to move cloud processes closer to the end device.

“Enterprise computing” has traditionally been used in a similar way to edge computing but more accurately describes the networking capabilities and not necessarily the location of the computing. Fog computing, coined by Cisco, is basically the same as edge computing, although many delineate the fog as sitting either above or below the edge computing space, or even as a subset of edge computing.

For reference, end point devices and end points are often referred to as “edge devices,” not to be confused with edge computing, and this demarcation is important for our discussion. Edge computing can take many forms, including small aggregators, local on-premise servers, or micro data centers. Micro data centers can be regionally distributed in permanent or even movable storage containers that strap onto 18-wheel trucks.

Value of Edge Computing

Traditionally, sensors, cameras, microphones, and an array of different IoT and mobile devices collect data from their locations and send the data to a centralized data center or cloud.

By 2020, more than 50 billion [1] smart devices will be connected worldwide. These devices will generate zettabytes (ZB) of data annually, growing to more than 150 ZB by 2025.

The backbone of the Internet was built to reliably connect devices to each other and to the cloud, helping ensure that the packets get to their destination.

However, sending all this data to the cloud poses several immense problems. First, the 150 ZB of data will create capacity issues. Second, it is costly to transmit that much data from its location of origin to centralized data centers in terms of energy, bandwidth, and compute power. Estimates project that only 12% of current data is even analyzed by the companies that own it and only 3% of that data contributes to any meaningful outcomes (that’s 97% of data that was collected and transmitted, wasted, for us “environmental mathematicians”). This clearly outlines operational efficiency issues that need to be addressed. Third, the power consumption of storing, transmitting, and analyzing data is enormous, and finding an effective way to reduce that cost and waste is clearly needed. Introducing edge computing to store data locally reduces transmission costs; however, efficiency techniques are also required to remove data waste, and the predominant method today is to look to AI capabilities. Therefore, most local servers across all applications are adding AI capabilities, and the predominant infrastructure now being installed is new, low-power edge computing server CPUs with connectivity to AI acceleration SoCs, in the form of GPUs and ASICs or an array of these chips.

In addition to addressing capacity, energy, and cost problems, edge computing also enables network reliability as applications can continue to function during widespread network outages. And security is potentially improved by eliminating some threat profiles such as global data center denial of service (DoS) attacks.

Finally, one of the most important aspects of edge computing is the ability to provide low latency for real-time use cases such as virtual reality arcades and mobile device video caching. Cutting latency will generate new services, enabling devices to provide many innovative applications in autonomous vehicles, gaming platforms, or challenging, fast-paced manufacturing environments.

“By processing incoming data at the edge, less information needs to be sent to the cloud and back. This also significantly reduces processing latency. A good analogy would be a popular pizza restaurant that opens smaller branches in more neighborhoods, since a pie baked at the main location would get cold on its way to a distant customer.”

Michael Clegg | Vice President and General Manager of IoT and Embedded | Supermicro

Applications Driving Edge Computing

One of the most vocal drivers of edge computing is 5G infrastructure. 5G telecom providers see an opportunity to provide services on top of their infrastructure. In addition to traditional data and voice connectivity, 5G telecom providers are building the ecosystem to host unique, local applications. By putting servers next to all of their base stations, cellular providers can open up their networks to host third-party applications, thereby improving both bandwidth and latency.

Streaming services like Netflix, through their Netflix Open Connect program [2], have worked for years with local ISPs to host high-traffic content closer to users. With 5G’s Multi-Access Edge Compute (MEC) initiatives, telecom providers see an opportunity to deliver similar services for streaming content, gaming, and future new applications. The telecom providers believe they can open this capability to everyone as a paid service, enabling anyone who needs lower latency to pay a premium for locating applications at the edge rather than in the cloud.

Credence Research believes that by 2026 the overall edge computing market will be around $9.6B. By comparison, the Research and Markets analysis sees the Mobile Edge Computing market growing from a few hundred million dollars today to over $2.77B by 2026. Although telecoms are the most vocal and likely the fastest growth engines, they are estimated to make up only about one-third of the total market for edge computing. This is because web-scale, industrial, and enterprise conglomerates will also provide edge computing hardware, software, and services for their traditional markets, and they expect edge computing to open opportunities for new applications as well.

Popular fast food restaurants are moving towards more automated kitchens to ensure food quality, reduce employee training, increase operational efficiencies, and ensure customer experiences meet expectations. Chick-fil-A is a fast food chain that successfully uses on-premise servers to aggregate hundreds of sensors and controls with relatively inexpensive equipment that runs locally to protect against any network outages. This was outlined in a 2018 Chick-fil-A blog [3] claiming that “By making smarter kitchen equipment we can collect more data. By applying data to our restaurant, we can build more intelligent systems. By building more intelligent systems, we can better scale our business.” The blog went on to outline that many restaurants can now handle 3x the amount of business that was originally planned, thanks to edge computing.

Overall, a successful edge computing infrastructure requires a combination of local server compute capabilities, AI compute capabilities, and connectivity to mobile/automotive/IoT computing systems (Figure 1).

Figure 1: Edge computing moves cloud processes closer to end devices by using micro data centers to analyze and process data.

“As the Internet of Things (IoT) connects more and more devices, networks are transitioning from being primarily highways to and from a central location to something akin to a spider’s web of interconnected, intermediate storage and processing devices. Edge computing is the practice of capturing, storing, processing and analyzing data near the client, where the data is generated, instead of in a centralized data-processing warehouse. Hence, the data is stored at intermediate points at the ‘edge’ of the network, rather than always at the central server or data center.”

Dr. James Stanger | Chief Technology Evangelist | CompTIA

Use Case for Edge Computing – Microsoft HoloLens

To understand the latency benefits of using edge computing, Rutgers University and Inria analyzed the scalability and performance of edge computing (or, as they call it, “edge cloud”) using the Microsoft HoloLens [4].

In the use case, the HoloLens scanned a barcode and then used scene segmentation in a building to navigate the user to a specific room with arrows displayed on the HoloLens. The process used both small data packets of mapping coordinates and larger packets of continuous video to verify the latency improvements of edge computing vs traditional cloud computing. The HoloLens initially read a QR code and sent the mapping coordinates, 4 bytes plus the header, to the edge server, which took 1.2 milliseconds (ms). The server found the coordinates and notified the user what the location was, for a total of 16.22 ms. Sending the same packet of data to the cloud would take approximately 80 ms (Figure 2).

Figure 2: Comparing latency for edge device to cloud server vs edge device to edge cloud server.

Similarly, they tested the latency when using OpenCV to do scene segmentation to navigate the user of the HoloLens to an appropriate location. The HoloLens streamed video at 30 fps, with the image processed in the edge compute server on an Intel i7 CPU at 3.33 GHz with 15 GB RAM. Streaming the data to the edge compute server took 4.9 ms. Processing OpenCV images took an additional 37 ms, for a total of 47.7 ms. The same process on a cloud server took closer to 115 ms, showing a clear benefit of edge computing for reduced latency.
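To put the two measurements side by side, the short Python sketch below computes the speedup and the time saved for each task. It uses only the round-trip figures quoted above; the cloud values are the approximate ones given in the text ("approximately 80 ms" and "closer to 115 ms"), and the function and variable names are illustrative, not taken from the study.

```python
# Round-trip latencies quoted in the HoloLens case study, in milliseconds.
# The cloud figures are approximate values from the text.
MEASUREMENTS_MS = {
    "QR code lookup": {"edge": 16.22, "cloud": 80.0},
    "OpenCV scene segmentation": {"edge": 47.7, "cloud": 115.0},
}


def speedup(edge_ms: float, cloud_ms: float) -> float:
    """Return how many times faster the edge path is than the cloud path."""
    return cloud_ms / edge_ms


def saved_ms(edge_ms: float, cloud_ms: float) -> float:
    """Return the round-trip time saved by using the edge server."""
    return cloud_ms - edge_ms


for task, m in MEASUREMENTS_MS.items():
    print(
        f"{task}: edge {m['edge']} ms vs cloud {m['cloud']} ms, "
        f"{speedup(m['edge'], m['cloud']):.1f}x faster, "
        f"{saved_ms(m['edge'], m['cloud']):.1f} ms saved"
    )
```

With these inputs the small-packet lookup comes out roughly five times faster at the edge, while the video workload gains a smaller factor because processing time, not transport, dominates its total.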

This case study shows the significant benefit in latency for edge computing, but there is so much new technology that will better enable low latency in the future.

5G outlines use cases with less than 1 ms latency today (Figure 3), and 6G is already discussing reducing that to tens of microseconds (µs). 5G and Wi-Fi 6 are increasing the bandwidth for connectivity: 5G intends to increase up to 10 Gbps, and Wi-Fi 6 already supports 2 Gbps. AI accelerators claim scene segmentation in less than 20 µs, which is a significant improvement over the quoted Intel i7 CPU processing each frame in about 20 ms in the example technical paper described above.
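A rough way to see why both link bandwidth and accelerator speed matter is to compare them with the time available per frame at 30 fps. The sketch below is an illustration only: the 100 kB frame size is a hypothetical assumption (the article does not state one), the link rates are the nominal 100 Mbps and 10 Gbps figures, and protocol overhead is ignored.

```python
FPS = 30
FRAME_BUDGET_MS = 1000 / FPS  # time available per frame, about 33.3 ms

# Hypothetical compressed frame size; not a figure from the case study.
ASSUMED_FRAME_BYTES = 100_000


def transfer_ms(payload_bytes: int, link_bits_per_s: float) -> float:
    """Serialization time for a payload on a link, ignoring protocol overhead."""
    return payload_bytes * 8 / link_bits_per_s * 1000


def budget_share(duration_ms: float) -> float:
    """Fraction of the per-frame budget consumed by one step."""
    return duration_ms / FRAME_BUDGET_MS


for label, rate in (("100 Mbps", 100e6), ("10 Gbps", 10e9)):
    t = transfer_ms(ASSUMED_FRAME_BYTES, rate)
    print(f"transfer at {label}: {t:.3f} ms ({budget_share(t):.2%} of frame budget)")

for label, t in (("CPU, ~20 ms per frame", 20.0), ("AI accelerator, ~20 us", 0.020)):
    print(f"processing on {label}: {budget_share(t):.2%} of frame budget")
```

Under these assumptions, CPU processing alone consumes most of the frame budget, while the accelerator and the faster link each shrink their share to a fraction of a percent.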

Figure 3: Bandwidth improvements up to 10 Gbps, compared to 10s and 100s of Mbps in Figure 2, from HoloLens to router and router to edge server, combined with AI processing improvements (20 ms to 20 µs), enable roundtrip latency <1 ms.

If edge computing shows benefits over cloud computing, wouldn’t moving computing all the way into the edge devices be the optimal solution? Unfortunately, not for all applications today (Figure 4). In the HoloLens case study, the data uses an SQL database that would be too large to store in the headset. Today’s edge devices, especially devices that are physically worn, don’t have enough compute power to process large datasets. In addition, software in the cloud or on edge servers is less expensive to develop than software for edge devices, because cloud/edge software does not need to be squeezed into smaller memory and compute resources.

Figure 4: Comparing cloud and edge computing with endpoint devices.

Because different applications run best on the compute, storage, memory, and latency capabilities found at different locations in our infrastructure, be it in the cloud, in an edge server, or in an edge device, there is a trend to support future hybrid computing capabilities (Figure 5). Edge computing is the initial establishment of a hybrid computing infrastructure throughout the world.

Figure 5: AI installed at the HoloLens, at the edge server, and in the cloud enables hybrid computing architectures that optimize compute, memory, and storage resources based on application needs.
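The placement decision behind hybrid computing can be sketched as a simple rule: run each workload at the location closest to the user that still meets its latency and memory needs. The Python example below is a minimal illustration under stated assumptions. The edge and cloud round-trip times echo the case study figures, the edge memory matches the 15 GB server described earlier, and the device figures, cloud capacity, and workloads are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    round_trip_ms: float  # typical latency seen by the end user
    memory_gb: float      # memory available to one application


@dataclass(frozen=True)
class Workload:
    name: str
    max_latency_ms: float
    memory_gb: float


# Ordered from closest to the user to farthest away.
# Device numbers and cloud capacity are hypothetical placeholders.
TIERS = (
    Tier("device", round_trip_ms=1.0, memory_gb=4.0),
    Tier("edge server", round_trip_ms=16.0, memory_gb=15.0),
    Tier("cloud", round_trip_ms=80.0, memory_gb=1_000_000.0),
)


def place(workload: Workload, tiers: Sequence[Tier] = TIERS) -> Optional[Tier]:
    """Return the closest tier that satisfies both constraints, or None."""
    for tier in tiers:
        meets_latency = tier.round_trip_ms <= workload.max_latency_ms
        meets_memory = tier.memory_gb >= workload.memory_gb
        if meets_latency and meets_memory:
            return tier
    return None


for w in (
    Workload("gesture input", max_latency_ms=5, memory_gb=1),
    Workload("indoor navigation", max_latency_ms=50, memory_gb=10),
    Workload("batch analytics", max_latency_ms=1000, memory_gb=500),
):
    tier = place(w)
    print(f"{w.name}: {tier.name if tier else 'no tier fits'}")
```

A real scheduler would also weigh power, storage, and cost, but the same idea applies: the constraints of the application, not a fixed policy, decide where the work runs.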

Understanding Edge Computing Segments

Edge computing is about computing locations closer to the application than the cloud. However, is that 300 miles, 3 miles or 300 feet? In the world of computing, the cloud theoretically has infinite memory and infinite compute power. At the device, there is theoretically just enough compute and memory resources to capture and send data to the cloud. Both theoreticals are a bit beyond reality but let’s use this as a method to describe the different levels of edge compute. As the cloud computing resources get closer to the end point device or application, theoretically, the storage, memory and computing resources become less and less. The power that is consumed by these resources is also lowered. The benefits of moving closer not only lower the power but lower the latency and increase the efficiency.

Three basic edge computing architectures are starting to emerge within the space (Figure 6). First and closest to traditional data centers are regional data centers that are miniature versions of cloud compute farms placed strategically to reduce latency but maintain as much of the compute, storage and memory needed. Many companies and startups address this space but SoCs designed specifically to address regional data centers do little to differentiate from classic cloud computing solutions today, which focus on high-performance computing (HPC).

Local servers and on-premise servers, the second edge computing segment, are where many SoC solutions address the power consumption and connectivity needs of edge computing specifically. There is also a large amount of commercial software development today, in particular with the adoption of more flexible platforms that enable containers, such as Docker and Kubernetes. Kubernetes is used in the Chick-fil-A example described earlier. The most interesting piece of the on-premise server segment for semiconductor vendors is the introduction of a chipset adjacent to the server SoC to handle the AI acceleration needed. Clearly an AI accelerator is located in the compute farms in the cloud, but a slightly different class of AI accelerator is built for the edge servers because this is where the market is expected to grow and there is opportunity to capture a foothold in this promising space.

A third segment for edge computing includes aggregators and gateways that are intended to perform limited functions, maybe only running one or a few applications with the lowest latency possible and with minimal power consumption.

Each of these three segments has been defined to support real-world applications. For instance, McKinsey has identified over 107 use cases in their analysis of edge computing [4]. ETSI, via their Group Specification MEC 002 v2.1.1, has defined over 35 use cases for 5G MEC, including gaming, service level agreements, video caching, virtual reality, traffic deduplication, and much more. Each of these applications has some predefined latency requirements based on where in the infrastructure the edge servers may exist. The OpenStack Foundation is another organization that has incorporated edge computing into its efforts, with Central Office ReArchitected as a Data Center (CORD) latency expectations in which traditional telecom offices distributed throughout networks now host edge cloud servers.

The 5G market expects use cases as low as 1 ms latency roundtrip, from the edge device, to the edge server, back to the edge device. The only way to achieve this is through a local gateway or aggregator, as going all the way to the cloud typically takes 100 ms. The 6G initiative, which was introduced in the fall of 2019, announced the goal of tens of µs latency.
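Distance alone puts a floor under round-trip latency, which is one way to answer the earlier question of whether the edge is 300 miles, 3 miles, or 300 feet away. The sketch below assumes signals travel through optical fiber at roughly 200,000 km/s (about two-thirds of the speed of light in a vacuum) and ignores switching, queuing, and processing time, so real latencies will be higher.

```python
# Approximate signal speed in optical fiber: ~200,000 km/s = 200 km per ms.
FIBER_KM_PER_MS = 200.0
KM_PER_MILE = 1.609344
KM_PER_FOOT = 0.0003048


def round_trip_propagation_ms(one_way_km: float) -> float:
    """Propagation delay there and back, ignoring all processing time."""
    return 2 * one_way_km / FIBER_KM_PER_MS


def max_one_way_km(budget_ms: float) -> float:
    """Farthest server distance whose propagation delay alone fits the budget."""
    return budget_ms * FIBER_KM_PER_MS / 2


for label, km in (
    ("300 miles", 300 * KM_PER_MILE),
    ("3 miles", 3 * KM_PER_MILE),
    ("300 feet", 300 * KM_PER_FOOT),
):
    print(f"{label}: {round_trip_propagation_ms(km) * 1000:.1f} us round trip")

print(f"1 ms budget allows at most {max_one_way_km(1.0):.0f} km one way")
```

Under this assumption a server 300 miles away already exceeds a 1 ms round trip before any computation happens, which is why the lowest-latency use cases point to a local gateway or aggregator.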

Each of the edge computing systems supports a similar architecture of SoCs that includes a networking SoC, some storage, a server SoC, and now an AI accelerator or array of AI accelerators. Each type of system offers its own levels of latency, power consumption, and performance. General guidelines for these systems are described in Figure 6. The market is changing, and these numbers will likely move quickly as the technology advances.

Figure 6: Comparing the three main SoC architectures for edge computing: Regional data centers/edge cloud; on-premise servers/local servers; and aggregators/gateways/access.

How is Edge Computing Impacting Server System SoCs?

The primary goal of many edge computing applications is to enable new services through lower latency. To support lower latency, many new systems are adopting some of the latest industry interface standards, including PCIe 5.0, LPDDR5, DDR5, HBM2e, USB 3.2, CXL, PCIe-based NVMe, and other next-generation standards-based technologies. Each of these technologies provides lower latency via bandwidth improvements when compared to previous generations.
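As one concrete illustration of how a bandwidth improvement translates into lower transfer latency, the sketch below compares recent PCIe generations. It uses the published per-lane signaling rates (8, 16, and 32 GT/s for PCIe 3.0, 4.0, and 5.0) and 128b/130b line encoding; packet and protocol overhead are ignored, and the 64 MB payload is a hypothetical example, so the results are upper bounds rather than measured throughput.

```python
# Per-lane signaling rate in gigatransfers per second.
TRANSFER_RATE_GT_S = {"PCIe 3.0": 8.0, "PCIe 4.0": 16.0, "PCIe 5.0": 32.0}

# PCIe 3.0 and later carry 128 payload bits in every 130 transmitted bits.
ENCODING_EFFICIENCY = 128 / 130


def link_gbytes_per_s(generation: str, lanes: int = 16) -> float:
    """Peak one-direction bandwidth in GB/s, before packet overhead."""
    return TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY * lanes / 8


def pcie_transfer_ms(payload_mb: float, generation: str, lanes: int = 16) -> float:
    """Time to move a payload across the link (1 GB/s equals 1 MB per ms)."""
    return payload_mb / link_gbytes_per_s(generation, lanes)


ASSUMED_PAYLOAD_MB = 64  # hypothetical batch of input data

for gen in TRANSFER_RATE_GT_S:
    print(
        f"{gen} x16: {link_gbytes_per_s(gen):.1f} GB/s, "
        f"{pcie_transfer_ms(ASSUMED_PAYLOAD_MB, gen):.2f} ms for {ASSUMED_PAYLOAD_MB} MB"
    )
```

Each generation doubles the per-lane rate, so the same payload crosses a PCIe 5.0 link in half the time it takes on PCIe 4.0.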

Even more pronounced than the drive to reduce latency is the addition of AI acceleration to all of these edge computing systems. AI acceleration is provided by some server chips with new instructions such as the x86 extension AVX-512 Vector Neural Network Instructions (AVX512 VNNI). Many times, this additional instruction set is not enough to provide the low latency and low power implementations needed for anticipated tasks, so custom AI accelerators are added to most new systems. These chips commonly adopt the highest-bandwidth host-to-accelerator connectivity possible. For example, use of PCIe 5.0 is rapidly expanding today due to these bandwidth requirements, which directly impact latency, most commonly in some sort of switching configuration with multiple AI accelerators.
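To make the VNNI idea concrete, the sketch below models in plain Python the arithmetic of its core 8-bit operation: four unsigned 8-bit values are multiplied by four signed 8-bit values, and the sum of the products is added to a 32-bit accumulator. This is only a model of the math for a single 32-bit lane, with illustrative function names; it is not how the instruction is invoked, and production code would rely on compiler intrinsics or an optimized library.

```python
from typing import Sequence

INT32_MIN = -(2**31)
INT32_SPAN = 2**32


def wrap_int32(value: int) -> int:
    """Wrap a Python int into the signed 32-bit range (no saturation)."""
    return (value - INT32_MIN) % INT32_SPAN + INT32_MIN


def dot4_accumulate(acc: int, activations: Sequence[int], weights: Sequence[int]) -> int:
    """Model one 32-bit lane: acc += sum of four (unsigned 8-bit x signed 8-bit) products."""
    if len(activations) != 4 or len(weights) != 4:
        raise ValueError("each lane consumes exactly four value pairs")
    if any(not 0 <= a <= 255 for a in activations):
        raise ValueError("activations must fit in unsigned 8 bits")
    if any(not -128 <= w <= 127 for w in weights):
        raise ValueError("weights must fit in signed 8 bits")
    return wrap_int32(acc + sum(a * w for a, w in zip(activations, weights)))


def int8_dot(activations: Sequence[int], weights: Sequence[int]) -> int:
    """Dot product of quantized vectors, processed four elements at a time."""
    if len(activations) != len(weights) or len(activations) % 4 != 0:
        raise ValueError("vectors must have equal length divisible by four")
    acc = 0
    for i in range(0, len(activations), 4):
        acc = dot4_accumulate(acc, activations[i:i + 4], weights[i:i + 4])
    return acc


print(int8_dot([1, 2, 3, 4, 5, 6, 7, 8], [1, -1, 1, -1, 2, -2, 2, -2]))
```

Fusing the multiply, the four-way sum, and the accumulate into one step is what lets a general-purpose CPU speed up quantized neural network layers, although, as noted above, this is often still not enough for the latency and power targets at the edge.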

CXL is another interface that is gaining momentum as it was built specifically to lower latency and provide cache coherency. Cache coherency can be important due to the heterogeneous compute needs and extensive memory requirements of AI algorithms.

Beyond the local gateways and aggregator server systems, a single AI accelerator typically does not provide enough performance, so scaling these accelerators is required with very high bandwidth chip-to-chip SerDes PHYs. The latest released PHYs support 56G and 112G connections. Chip-to-chip requirements to support scaling of AI have led to many different implementations. Ethernet may be one option to scale in a standards-based implementation, and a few solutions are offered today with this concept. However, many implementations today leverage the highest bandwidth SerDes possible with proprietary controllers. The differing architectures may change future SoC architectures of server systems to incorporate the networking, the server, the AI, and the storage components in more integrated SoCs rather than the four distinct SoCs that are being implemented today.

Figure 7: Common server SoC found at the edge with variability of number of processors, Ethernet throughput and storage capability based on number of tasks, power, latency and other needs.

The AI algorithms are pushing the limits with respect to memory bandwidth requirements. To give an example, the latest BERT and GPT-2 models require 345M and 1.5B parameters respectively. Clearly, high-capacity memory is needed to host these as well as the many complex applications that are intended to run in the edge cloud. To support this capacity, designers are adopting DDR5 for new chipsets. In addition to the capacity challenges, the AI algorithms’ coefficients need to be accessed for the massive number of multiply-accumulate calculations done in parallel in non-linear sequences. Therefore, HBM2e is one of the latest technologies that is seeing rapid adoption, with many instantiations per die.
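The capacity and bandwidth pressure can be estimated with simple arithmetic. The sketch below converts the parameter counts quoted above into weight storage at a few common numeric precisions, then estimates the memory bandwidth needed if every weight is read once per inference. That one-read-per-inference model and the rate of 100 inferences per second are simplifying assumptions for illustration; activations, caching, and batching are ignored.

```python
BYTES_PER_PARAMETER = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts as quoted in the text.
MODEL_PARAMETERS = {"BERT": 345_000_000, "GPT-2": 1_500_000_000}


def weight_gigabytes(parameters: int, precision: str) -> float:
    """Storage needed for the weights alone, in GB (10^9 bytes)."""
    return parameters * BYTES_PER_PARAMETER[precision] / 1e9


def streaming_bandwidth_gb_s(parameters: int, precision: str, inferences_per_s: float) -> float:
    """Bandwidth needed if all weights are read once per inference (an assumption)."""
    return weight_gigabytes(parameters, precision) * inferences_per_s


ASSUMED_INFERENCES_PER_S = 100  # hypothetical load

for model, count in MODEL_PARAMETERS.items():
    for precision in BYTES_PER_PARAMETER:
        size = weight_gigabytes(count, precision)
        bandwidth = streaming_bandwidth_gb_s(count, precision, ASSUMED_INFERENCES_PER_S)
        print(f"{model} {precision}: {size:.2f} GB of weights, ~{bandwidth:.0f} GB/s")
```

Even at reduced precision, the larger model needs gigabytes of capacity and, under these assumptions, hundreds of gigabytes per second of bandwidth, which is consistent with the move toward DDR5 for capacity and HBM2e for bandwidth.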

Figure 8: Common AI SoC with high-speed, high-bandwidth memory, host-to-accelerator, and high-speed die-to-die interfaces for scaling multiple AI accelerators.

The Moving Targets and the Segmentation of Edge Computing

If we take a closer look at the different types of needs for edge computing, we will see that regional data centers, local servers, and aggregation gateways have different compute, latency, and power needs. Future requirements are clearly focused on lowering the latency of the round-trip response, lowering the power of the specific edge application, and ensuring there is enough processing capability to handle the specific tasks.

Power consumed by the server SoCs differs based on the latency and processing requirements. Next-generation solutions will not only lower latency and lower power, but also include AI capabilities, in particular AI accelerators. The performance of these AI accelerators also changes based on the scaling of these needs.

It is evident, however, that AI and edge computing requirements are rapidly changing and many of the solutions we see today have progressed multiple times over the past 2 years and will continue to do so. Today’s performance can be categorized but the numbers will continue to move, increasing performance, decreasing power, and lowering overall latency.

Figure 9: The next generation of server SoCs and the addition of AI accelerators will make edge computing even faster.

Conclusion

Edge computing is a very important aspect of enabling faster connectivity. It will bring cloud services closer to the edge devices. It will lower latency and provide new applications and services to consumers. It will proliferate AI capabilities, moving them out of the cloud. And it will be the basic technology that enables future hybrid computing where computing decisions can be made real time locally, in the cloud or at the device based on latency needs, power needs and overall storage and performance needs.